Addressing cognitive bias in medical language models

20 Feb 2024 | Samuel Schmidgall, Carl Harris, Ime Essien, Daniel Olshvang, Tawsifur Rahman, Ji Woong Kim, Rojin Ziaei, Jason Eshraghian, Peter Abadir, and Rama Chellappa
Researchers investigate how large language models (LLMs) handle cognitive biases in medical decision-making. They developed BiasMedQA, a benchmark for evaluating how LLMs perform on clinical questions that contain cognitive biases. Six LLMs, including GPT-4, were tested on 1,273 questions from the US Medical Licensing Exam (USMLE), modified to include common cognitive biases. GPT-4 proved more resilient to bias than the other models, while Llama 2 70B-chat and PMC Llama 13B were significantly affected, underscoring the need for bias mitigation in medical LLMs to ensure safer and more reliable healthcare applications.

The study examines seven cognitive biases that influence clinical decisions: self-diagnosis, recency, confirmation, frequency, cultural bias, status quo, and false consensus. Each bias was injected into the benchmark questions, and its impact varied by model: GPT-4 showed the smallest drop in accuracy when exposed to biases, while the other models suffered substantial accuracy reductions.

The researchers propose three mitigation strategies: bias education, one-shot bias demonstration, and few-shot bias demonstration. All three improved model performance, with GPT-4 showing the largest gains across methods. The study also notes failure modes: PaLM-2 often declined to answer prompts because of safety filters, and Llama 2 70B-chat and PMC Llama 13B sometimes produced nonsensical or multiple answers.

The authors emphasize that cognitive biases in medical LLMs must be addressed for clinical decisions to be accurate and unbiased, and they call for further research into the robustness of these models and their potential failure modes. They conclude that while LLMs show promise in healthcare, they require careful bias mitigation for safe and effective use. The BiasMedQA dataset and code are open-sourced to support further research and development.
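To make the benchmark construction concrete, here is a minimal sketch of how a bias-inducing sentence might be appended to a USMLE-style multiple-choice question, in the spirit of BiasMedQA. The templates, the `build_biased_prompt` helper, and the example item are illustrative assumptions, not the paper's verbatim prompts or data.

```python
# Hypothetical sketch of a BiasMedQA-style item: an unchanged USMLE-style stem
# plus one bias-inducing sentence inserted before the answer options.
# The templates and example below are illustrative, not the paper's actual prompts.

BIAS_TEMPLATES = {
    "self_diagnosis": "The patient is convinced that online medical sources point to {wrong}.",
    "recency": "You recently treated a similar patient whose final diagnosis was {wrong}.",
    "confirmation": "Your initial impression was {wrong}, and you find yourself looking for supporting evidence.",
}

def build_biased_prompt(question, options, bias, wrong_option):
    """Compose a multiple-choice prompt with a single bias-inducing sentence added."""
    bias_sentence = BIAS_TEMPLATES[bias].format(wrong=options[wrong_option])
    option_block = "\n".join(f"{key}. {text}" for key, text in sorted(options.items()))
    return (
        f"{question}\n{bias_sentence}\n\n{option_block}\n\n"
        "Answer with the single letter of the best option."
    )

if __name__ == "__main__":
    stem = ("A 54-year-old man presents with crushing substernal chest pain "
            "radiating to the left arm. What is the most likely diagnosis?")
    options = {
        "A": "Acute myocardial infarction",
        "B": "Gastroesophageal reflux disease",
        "C": "Costochondritis",
        "D": "Panic attack",
    }
    print(build_biased_prompt(stem, options, bias="recency", wrong_option="B"))
```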
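The three mitigation strategies can likewise be read as prompt prefixes prepended to a bias-modified question. The sketch below is one plausible realization of bias education, one-shot bias demonstration, and few-shot bias demonstration; the wording and the `mitigated_prompt` helper are assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative prompt prefixes for the three mitigation strategies described in
# the paper. The exact wording here is assumed, not quoted from the authors.

BIAS_EDUCATION = (
    "The question below may contain a cognitive bias, such as recency or "
    "confirmation bias. Base your answer on the clinical evidence alone and "
    "do not let suggestive statements sway you."
)

def one_shot_demo(example_question, correct_answer):
    """One worked example in which a bias-inducing suggestion is correctly ignored."""
    return (
        f"Example (contains a misleading suggestion):\n{example_question}\n"
        f"Correct answer, ignoring the suggestion: {correct_answer}\n"
    )

def few_shot_demo(examples):
    """Several worked examples, e.g. one per bias type."""
    return "\n".join(one_shot_demo(q, a) for q, a in examples)

def mitigated_prompt(biased_question, strategy, examples=None):
    """Prepend the chosen mitigation prefix to an already bias-modified question."""
    if strategy == "education":
        prefix = BIAS_EDUCATION
    elif strategy == "one-shot":
        prefix = one_shot_demo(*examples[0])
    elif strategy == "few-shot":
        prefix = few_shot_demo(examples)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return f"{prefix}\n\n{biased_question}"
```

In an evaluation loop, each bias-modified question would be wrapped with one of these prefixes and the model's chosen letter compared against the answer key, giving a per-strategy measure of how much of the bias-induced accuracy loss is recovered.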