30 July 2024

LLMs in Healthcare Diagnostics

In today's blog, our Managing Director for Waracle Studios and digital health expert David Low explores a range of perspectives that underscore the importance of collaborative efforts between technologists, clinicians, and regulators to realise the full benefits of LLMs in healthcare.

The use of large language models (LLMs) in diagnostic health settings is a burgeoning field of study, reflecting both the nascent potential and the inherent challenges of integrating advanced artificial intelligence into clinical practice.

By drawing on recent research, we aim to provide a balanced view on the subject, delving into the technological advances, diagnostic capabilities, and inherent limitations of LLMs at this point in time.

We will cover the technological landscape, diagnostic capabilities, and specialist healthcare models, along with some of the most pertinent ethical considerations and practical challenges.

Let’s explore together!

The Technological Landscape

Generic large language models such as GPT-4, GPT-3.5, and Gemini represent significant advancements in natural language processing (NLP). These models have been trained on vast datasets, including medical literature, patient data, and general text, allowing them to understand and generate human-like text responses. Their capabilities include interpreting symptoms, providing potential diagnoses, and offering treatment suggestions.
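To make this a little more concrete, here is a minimal sketch of how a symptom description might be sent to a general-purpose model through the OpenAI Python client. The model name, prompt wording, and temperature are illustrative assumptions rather than a validated clinical workflow, and any output would of course be reviewed by a clinician.

```python
# A minimal sketch of symptom interpretation with a general-purpose LLM.
# Assumes the OpenAI Python SDK (v1+) and an OPENAI_API_KEY in the environment;
# the prompt and model choice are illustrative, not a validated clinical workflow.
from openai import OpenAI

client = OpenAI()

symptoms = "55-year-old with sudden chest pain radiating to the left arm, sweating, nausea."

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model; substitute as appropriate
    messages=[
        {"role": "system", "content": (
            "You are a clinical decision-support assistant. "
            "List the most likely differential diagnoses with brief reasoning. "
            "Your output is reviewed by a qualified clinician."
        )},
        {"role": "user", "content": symptoms},
    ],
    temperature=0.2,  # keep outputs conservative and reproducible
)

print(response.choices[0].message.content)
```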

For instance, GPT-4, developed by OpenAI, has shown remarkable proficiency in processing complex medical information and has achieved high accuracy rates on medical diagnostic tests. Studies indicate that GPT-4 achieves a 75% accuracy rate on the Medical Knowledge Self-Assessment Program (MKSAP), outperforming its predecessors and contemporaries. Similarly, Google's Gemini models have demonstrated high precision in their outputs, making them suitable for critical diagnostic tasks where accuracy is paramount.

Diagnostic Capabilities

The integration of LLMs into healthcare diagnostics is driven by their potential to enhance diagnostic accuracy and efficiency. These models can quickly process large volumes of data, identify patterns, and surface insights that might otherwise take clinicians considerable time to uncover. For example, in a study evaluating the diagnostic capabilities of various LLMs, GPT-4 and GPT-3.5 were able to accurately diagnose a range of common illnesses from symptom descriptions, with GPT-4 achieving particularly high precision and recall scores.
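For readers less familiar with the precision and recall metrics used in evaluations like this, the short sketch below computes both for a toy set of model predictions against clinician-confirmed diagnoses using scikit-learn. The labels are invented purely for illustration and are not drawn from the study itself.

```python
# Toy illustration of the precision/recall metrics used to evaluate diagnostic LLMs.
# The labels below are invented for illustration only.
from sklearn.metrics import precision_score, recall_score

# Clinician-confirmed diagnoses (ground truth) and model predictions for ten cases.
truth = ["flu", "flu", "covid", "migraine", "covid", "flu", "migraine", "covid", "flu", "migraine"]
preds = ["flu", "covid", "covid", "migraine", "covid", "flu", "flu", "covid", "flu", "migraine"]

# Macro-averaging treats each condition equally, regardless of how common it is.
precision = precision_score(truth, preds, average="macro")
recall = recall_score(truth, preds, average="macro")

print(f"precision={precision:.2f}, recall={recall:.2f}")
```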

The practical applications of these models are vast. They can assist in triaging patients by prioritising cases based on severity, thus streamlining the workflow in busy clinical settings. Furthermore, LLMs can serve as decision-support tools, offering second opinions that help clinicians make more informed decisions. And as these models are retrained and fine-tuned on new data, their diagnostic capabilities can continue to improve, potentially leading to better patient outcomes.

Specialist Healthcare Models

Specialist healthcare models built on large general-purpose foundations, such as Google's PaLM (Pathways Language Model) and its medically tuned variant Med-PaLM, contribute significantly to medical diagnostics by leveraging advanced training and architectures adapted to the clinical domain. In early testing, these models have demonstrated high accuracy rates, achieving remarkable diagnostic precision and efficiency.

This model family benefits from the Pathways dataflow system, which allows it to process extensive datasets across thousands of accelerator chips, enhancing its ability to understand and interpret complex medical information. PaLM's architecture, tailored for high-dimensional data integration, enables it to provide nuanced diagnostic insights by synthesising information from various sources, including medical literature, patient histories, and real-time clinical data. Early testing results suggest that PaLM can effectively handle chain-of-thought reasoning, making it particularly adept at complex diagnostic scenarios where multiple symptoms and possible conditions must be considered simultaneously.
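To illustrate what chain-of-thought prompting looks like in a diagnostic context, here is a hedged sketch of a prompt that asks a model to reason step by step through a multi-symptom case before committing to a ranked differential. The case and the template are illustrative assumptions, not the prompting protocol used in Google's evaluations.

```python
# A sketch of chain-of-thought prompting for a multi-symptom diagnostic case.
# The case and prompt template are illustrative; they are not taken from the PaLM studies.

CASE = (
    "72-year-old with progressive shortness of breath over two weeks, "
    "bilateral ankle swelling, orthopnoea, and a history of hypertension."
)

COT_PROMPT = f"""You are assisting a clinician with a complex case.

Case: {CASE}

Reason step by step:
1. List each symptom and the organ systems it could implicate.
2. Note which conditions could explain several symptoms at once.
3. Rank the three most likely diagnoses and say what test would confirm each.

Show your reasoning for each step before giving the final ranked list."""

# The prompt would then be sent to whichever model endpoint is available,
# for example via the chat-completion call sketched earlier in this article.
print(COT_PROMPT)
```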

Challenges and Limitations

Despite their potential, deploying LLMs in healthcare comes with significant challenges. One of the primary concerns is the susceptibility of these models to errors stemming from incorrect or biased input data. Research has shown that when LLMs are presented with biased self-diagnostic information from patients, their diagnostic accuracy can degrade significantly. For instance, one study found that models such as GPT-3.5, PaLM, and Llama showed notable drops in diagnostic accuracy when provided with adversarial prompts, highlighting the models' vulnerability to confirmation bias.
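To see how this kind of confirmation bias can creep in, compare a neutral prompt with one that embeds a patient's self-diagnosis. The pair below is an illustrative sketch of the adversarial framing the research describes, not the study's actual test items.

```python
# Contrasting a neutral prompt with a biased one that embeds a patient's self-diagnosis.
# These strings are illustrative; they are not the prompts used in the cited study.

SYMPTOMS = "persistent headache, blurred vision in one eye, and tingling in the left hand"

neutral_prompt = (
    f"A patient reports {SYMPTOMS}. "
    "What are the most likely diagnoses, ranked with brief reasoning?"
)

biased_prompt = (
    f"A patient reports {SYMPTOMS} and is convinced it is just a tension headache "
    "from too much screen time. Confirm whether they are right."
)

# Sending both prompts to the same model and comparing the differentials it returns
# is a simple way to probe how strongly patient framing pulls the model toward
# the suggested (and possibly wrong) conclusion.
for name, prompt in [("neutral", neutral_prompt), ("biased", biased_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```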

Another critical challenge is ensuring data privacy and compliance with health information regulations such as HIPAA. Integrating LLMs in clinical settings necessitates robust measures to safeguard patient data and ensure that AI systems operate within the ethical and legal frameworks governing healthcare. Additionally, these models' "black-box" nature poses interpretability issues, making it difficult for clinicians to fully understand and trust the AI's decision-making process.

Ethical and Practical Considerations

The ethical implications of using LLMs in healthcare extend beyond data privacy. There is a pressing need to address the potential biases in the training data, which can lead to skewed diagnostic outcomes. For example, if the training data predominantly represent a specific demographic, the model’s diagnostic accuracy for other groups may be compromised. Ensuring diversity and representativeness in the training datasets is crucial to mitigate this risk.

Moreover, LLMs should complement, not replace, human clinicians. While these models can enhance diagnostic processes, the final decision must always rest with qualified medical professionals. This approach helps maintain the human touch in patient care and ensures that the AI's recommendations are interpreted and applied correctly.

Future Directions

To fully harness the potential of LLMs in healthcare, future research should focus on integrating these models with multimodal data, including medical images and laboratory results, to provide more comprehensive diagnostic insights. Additionally, continuous efforts to improve model transparency and interpretability will be essential in building trust among clinicians and patients.

Exploring the practical deployment of LLMs in real-world clinical settings is another crucial step. Pilot projects and clinical trials can provide valuable insights into the efficacy and reliability of these models in everyday medical practice. Such initiatives will help identify any unforeseen challenges and refine the models to better meet clinical needs.

Does size matter?

Small Language Models (SLMs) have become prominent thanks to Apple’s “Apple Intelligence” announcements, which largely depend on them.

Due to their reduced complexity and smaller size, SLMs are particularly well-suited for integration into portable and resource-constrained medical devices. These devices, such as handheld diagnostic tools or wearable health monitors, can benefit from SLMs’ real-time processing capabilities. By running on the edge, these models can provide immediate diagnostic insights without needing constant internet connectivity or powerful servers, which is crucial in remote or resource-limited settings.

The ability of SLMs to be fine-tuned on specific datasets allows them to deliver high accuracy for targeted applications. For instance, a small language model trained on dermatological data could be embedded in a skin-scanning device to help diagnose skin conditions. Similarly, an SLM focused on cardiology data could be integrated into wearable heart monitors to detect anomalies in heart rhythms and alert users to potential issues.
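As a rough sketch of what that kind of on-device inference could look like, the snippet below loads a small fine-tuned classifier through the Hugging Face transformers pipeline and runs it entirely locally. The model name is hypothetical, and a production device would more likely ship a quantised model via a runtime such as Core ML or ONNX Runtime.

```python
# A minimal sketch of local (on-device) inference with a small fine-tuned model.
# "example-org/cardiology-slm" is a hypothetical model name; a real deployment would
# substitute its own fine-tuned checkpoint, typically quantised for the target device.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="example-org/cardiology-slm",  # hypothetical fine-tuned small model
    device=-1,  # CPU only: no server round-trip, data never leaves the device
)

note = "Intermittent palpitations at rest, resting heart rate 118 bpm, no chest pain."
result = classifier(note)

print(result)  # e.g. a label such as "possible-arrhythmia" with a confidence score
```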

Additionally, SLMs can enhance patient privacy by processing data locally on the device, reducing the need to transmit sensitive medical information over networks. This localised processing aligns with stringent data protection regulations like HIPAA and helps maintain patient confidentiality. However, the evidence base for SLMs in clinical settings is still less mature than it is for LLMs, both in published research and in our understanding of how they behave.

Conclusion

Integrating large language models in diagnostic health settings offers a promising avenue for enhancing medical diagnostics and patient care. While the technological advancements in LLMs like GPT-4, GPT-3.5, and Gemini showcase their potential to revolutionise healthcare, addressing the ethical, practical, and technical challenges is imperative for their successful deployment.

By ensuring robust data privacy measures, mitigating biases, and maintaining the centrality of human clinicians in the diagnostic process, we can harness the full potential of these advanced AI tools to improve healthcare outcomes.

These perspectives underscore the importance of ongoing research, ethical considerations, and collaborative efforts between technologists, clinicians, and regulators to realise the full benefits of LLMs in healthcare. As we continue to explore and refine these technologies, their role in enhancing diagnostic accuracy and efficiency will undoubtedly grow, paving the way for a future where AI-driven diagnostics become an integral part of medical practice.

If you have questions about how these nascent technologies may impact your business and your software development practices, get in touch with our team to find out more.

Author

David Low, Managing Director - Studios