Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

27 Jan 2024 | Yuxin Liang*, Zhuoyang Song*, Hao Wang, Jiaxing Zhang
This paper evaluates the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a critical factor in countering factual hallucinations and ensuring reliable application. The authors observe that while LLMs have a robust self-awareness of their internal knowledge state, they often fail to express this knowledge during generation, leading to factual hallucinations. To address this issue, they develop DreamCatcher, an automated hallucination annotation tool that merges knowledge probing and consistency checking methods to rank factual preference data. Using this data, they propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, which leverages reinforcement learning to enhance the factuality and honesty of LLMs. Experiments across multiple models show that RLKF training effectively improves the models' ability to utilize their internal knowledge state, boosting performance in various knowledge-based and honesty-related tasks. The primary contributions of the paper include extensive experiments on LLMs' capacity to discern their internal knowledge, the development of DreamCatcher, and the introduction of the RLKF training framework.
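To make the pipeline concrete, below is a minimal Python sketch of the consistency-checking half of a DreamCatcher-style annotator and of building preference pairs for RLKF-style reward modeling. It is illustrative only: the paper's actual tool also uses knowledge probing, and the `sample_answers` callable, the `agree_threshold` value, the refusal string, and the `make_preference_pair` helper are all assumptions introduced here, not details from the paper.

```python
# Hypothetical sketch of a consistency-based hallucination check: sample
# several answers per question and treat high agreement as evidence the model
# "knows" the fact; disagreement suggests a likely hallucination.
# `sample_answers` is a placeholder for any LLM sampling call (temperature > 0).
from collections import Counter
from typing import Callable, Dict, List, Tuple

def consistency_label(
    question: str,
    sample_answers: Callable[[str, int], List[str]],  # placeholder LLM call
    n_samples: int = 8,
    agree_threshold: float = 0.75,  # illustrative cutoff, not from the paper
) -> Tuple[str, float]:
    """Label a question as 'known' or 'unknown' by answer consistency."""
    answers = sample_answers(question, n_samples)
    # Normalize lightly so trivial formatting differences don't split votes.
    votes = Counter(a.strip().lower() for a in answers)
    _, top_count = votes.most_common(1)[0]
    agreement = top_count / len(answers)
    label = "known" if agreement >= agree_threshold else "unknown"
    return label, agreement

def make_preference_pair(question: str, label: str, answer: str) -> Dict[str, str]:
    """Build a (chosen, rejected) pair for reward-model training."""
    refusal = "I'm not sure; I don't have reliable knowledge of that."
    if label == "known":
        # Prefer the factual answer over an unnecessary refusal.
        return {"prompt": question, "chosen": answer, "rejected": refusal}
    # Prefer an honest refusal over a likely hallucinated answer.
    return {"prompt": question, "chosen": refusal, "rejected": answer}
```

In an RLKF-style setup, pairs like these would presumably train a reward model whose signal then drives reinforcement-learning fine-tuning of the base LLM (e.g., with PPO), rewarding answers when the knowledge is present and honest abstention when it is not.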