Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

2024 | Minbyul Jeong, Jiwon Sohn, Mujeen Sung, Jaewoo Kang
The paper introduces Self-BioRAG, a framework designed to enhance medical reasoning with retrieval-augmented large language models (LLMs). Self-BioRAG is tailored to biomedical text and combines three capabilities: generating explanations, retrieving domain-specific documents, and self-reflecting on candidate responses. Using 84K filtered documents and 30K instruction sets, the authors train a Self-BioRAG model that can assess its own generated explanations with customized reflective tokens. The study demonstrates that domain-specific components, namely a retriever, a domain-related document corpus, and instruction sets, are necessary for following domain-related instructions. Experimental results on three major medical question-answering benchmarks show that Self-BioRAG achieves significant performance gains, with an average absolute improvement of 7.2% over state-of-the-art open-foundation models of 7B parameters or fewer. On two long-form question-answering benchmarks, Self-BioRAG also outperforms RAG by 8% in Rouge-1 score, producing more proficient answers. The paper further provides detailed analyses of the effectiveness of each domain-adaptation component and of the role retrieved evidence plays in solving open-domain question-answering benchmarks. Overall, Self-BioRAG proves effective at finding clues in questions, retrieving relevant documents, and deciding how to answer using both retrieved documents and its encoded knowledge, much as medical experts approach problems.
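To make the retrieve-then-self-reflect loop concrete, here is a minimal sketch of a Self-BioRAG-style inference procedure. This is not the authors' code: `generate`, `retrieve_biomedical`, and `reflective_scores` are hypothetical stand-ins for the paper's instruction-tuned generator, domain-specific retriever, and the critic that emits reflective tokens (in the Self-RAG lineage these judge retrieval need, evidence relevance, answer support, and utility).

```python
# Sketch of a Self-BioRAG-style inference loop. All functions below are
# placeholders (assumptions), not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    evidence: str
    score: float  # combined reflective-token score


def generate(prompt: str) -> str:
    """Placeholder for the instruction-tuned biomedical LLM."""
    return f"<explanation and answer for: {prompt[:40]}...>"


def retrieve_biomedical(query: str, k: int = 3) -> list[str]:
    """Placeholder for the domain-specific retriever over a biomedical corpus."""
    return [f"<document {i} for: {query[:30]}>" for i in range(k)]


def reflective_scores(question: str, evidence: str, answer: str) -> float:
    """Placeholder critic. A trained model would emit reflective tokens
    scoring evidence relevance, answer support, and answer utility; here
    the three scores are stubbed as constants."""
    is_rel, is_sup, is_use = 1.0, 1.0, 1.0
    return is_rel + is_sup + is_use


def answer_with_self_reflection(question: str) -> str:
    # 1. Decide whether retrieval is needed (a retrieval token in the
    #    Self-RAG lineage; a trivial constant stands in here).
    needs_retrieval = True

    candidates: list[Candidate] = []
    if needs_retrieval:
        # 2. Retrieve domain-specific documents and generate one candidate
        #    explanation/answer per document.
        for doc in retrieve_biomedical(question):
            prompt = f"Evidence: {doc}\nQuestion: {question}\nExplain, then answer:"
            ans = generate(prompt)
            candidates.append(Candidate(ans, doc, reflective_scores(question, doc, ans)))
    else:
        # 3. Otherwise, answer from the model's encoded (parametric) knowledge.
        ans = generate(f"Question: {question}\nExplain, then answer:")
        candidates.append(Candidate(ans, evidence="", score=0.0))

    # 4. Self-reflection: keep the candidate the critic scores highest.
    return max(candidates, key=lambda c: c.score).answer


print(answer_with_self_reflection("Which enzyme is deficient in phenylketonuria?"))
```

The key design point the sketch illustrates is that self-reflection turns generation into a scored search: each retrieved document yields a candidate answer, and the reflective-token scores select among them rather than trusting a single pass.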