IN-CONTEXT SHARPNESS AS ALERTS: AN INNER REPRESENTATION PERSPECTIVE FOR HALLUCINATION MITIGATION

12 Mar 2024 | Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
This paper investigates the mechanisms behind hallucinations in large language models (LLMs) from the perspective of inner representations. The authors propose an entropy-based metric to quantify the "sharpness" of in-context hidden states and use it to improve the factuality of LLM outputs. They find that correct generations tend to exhibit sharper context activations in the hidden states of in-context tokens than incorrect ones. By incorporating this metric into the decoding process, they develop a constrained decoding approach called Activation Decoding. Experiments on multiple benchmarks show that Activation Decoding significantly improves factual accuracy, achieving up to an 8.6-point improvement on TruthfulQA. The method is effective across different model sizes and outperforms other approaches in reducing factual errors in generation. The study provides a practical solution for hallucination mitigation and enhances the reliability of text generation. The code is publicly available at https://github.com/hkust-nlp/Activation_Decoding.
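To make the idea concrete, below is a minimal sketch (not the authors' released implementation) of how an entropy-based sharpness score over in-context activations could be folded into next-token decoding. The tensor shapes, the activation scores, and the mixing weight `alpha` are assumptions made purely for illustration; the actual formulation follows the paper and the linked repository.

```python
# Illustrative sketch only: an entropy-based "sharpness" score over in-context
# activations used to rescale next-token logits. Shapes and `alpha` are assumed.
import torch
import torch.nn.functional as F


def context_entropy(context_activations: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Entropy of each candidate token's activation distribution over in-context tokens.

    context_activations: (vocab_size, num_context_tokens) non-negative scores, e.g.
    derived from how strongly each candidate token activates on the in-context
    hidden states. Lower entropy = sharper context activation.
    """
    probs = context_activations / (context_activations.sum(dim=-1, keepdim=True) + eps)
    return -(probs * (probs + eps).log()).sum(dim=-1)  # (vocab_size,)


def activation_adjusted_logits(logits: torch.Tensor,
                               context_activations: torch.Tensor,
                               alpha: float = 1.0) -> torch.Tensor:
    """Shift next-token logits so candidates with sharper (lower-entropy)
    in-context activations are favored; `alpha` controls the constraint strength."""
    entropy = context_entropy(context_activations)  # (vocab_size,)
    sharpness = -entropy                            # sharper -> larger score
    return logits + alpha * sharpness


if __name__ == "__main__":
    vocab_size, num_ctx = 8, 5
    logits = torch.randn(vocab_size)
    # Fake non-negative activation scores of each candidate over the context tokens.
    context_activations = torch.rand(vocab_size, num_ctx)
    adjusted = activation_adjusted_logits(logits, context_activations, alpha=2.0)
    next_token = int(torch.argmax(F.log_softmax(adjusted, dim=-1)))
    print("greedy next token id:", next_token)
```

The key design choice this sketch captures is that decoding is constrained rather than replaced: the model's original logits are only shifted by a sharpness term, so candidates the model already finds likely are penalized only when their in-context activations are diffuse.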