Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

27 Jan 2024 | Yuxin Liang, Zhuoyang Song, Hao Wang, Jiaxing Zhang
This paper investigates the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a capability that is crucial for mitigating fact-conflict hallucination, where a model generates content that is fluent and plausible but conflicts with real-world facts. The study finds that LLMs exhibit strong self-awareness of their internal knowledge state, reaching over 85% accuracy in knowledge probing, yet they often fail to express that knowledge during generation, which leads to factual hallucinations.

To address this gap, the authors develop DreamCatcher, an automated hallucination annotation tool that combines knowledge probing with consistency checking to rank factual preference data. Using this knowledge preference as a reward signal, they propose Reinforcement Learning from Knowledge Feedback (RLKF), a training framework designed to improve the factuality and honesty of LLMs.

Experiments across multiple models show that RLKF training improves the models' ability to utilize their internal knowledge state, boosting performance on a range of knowledge-based and honesty-related tasks. The results indicate that RLKF not only enhances the honesty and factuality of LLMs but also improves their general capabilities, suggesting that RLKF is a promising approach to the hallucination problem and, when combined with RLHF, offers significant potential for strengthening models overall.
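To make the consistency-checking idea concrete, below is a minimal Python sketch of how sampled-answer agreement could be used to build factual preference pairs. This is not the authors' DreamCatcher implementation: the generate callable, the agreement threshold, and the answer normalization are illustrative assumptions. The intuition is that high agreement across samples suggests the model holds the knowledge (prefer its factual answer), while low agreement suggests it does not (prefer an honest refusal).

    # Hypothetical sketch of consistency checking for preference labeling.
    # `generate(question, temperature)` stands in for any LLM sampling call.
    from collections import Counter

    def normalize(answer: str) -> str:
        """Crude normalization so trivially different phrasings still match."""
        return " ".join(answer.lower().strip().rstrip(".").split())

    def consistency_label(question: str, generate, k: int = 5, threshold: float = 0.6) -> dict:
        """Sample k answers and measure agreement on the most common one."""
        answers = [normalize(generate(question, temperature=1.0)) for _ in range(k)]
        top_answer, count = Counter(answers).most_common(1)[0]
        agreement = count / k

        if agreement >= threshold:
            # Consistent: treat the model's answer as the preferred response.
            return {"chosen": top_answer, "rejected": "I don't know.", "known": True}
        # Inconsistent: prefer an honest refusal over a likely hallucination.
        return {"chosen": "I don't know.", "rejected": top_answer, "known": False}

In the paper's framework, ranked pairs of this kind provide the knowledge feedback that drives the reinforcement-learning reward; the sketch above only illustrates the labeling step, under the stated assumptions.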