12 Mar 2024 | Shiqi Chen*, Miao Xiong*, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
This paper explores the mechanisms behind hallucinations in large language models (LLMs) from the perspective of *inner representations*. The authors discover that correct generations tend to have sharper context activations in the hidden states of in-context tokens compared to incorrect ones. Leveraging this insight, they propose an entropy-based metric to quantify the "sharpness" of in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach called *Activation Decoding*. Experiments on various benchmarks demonstrate the effectiveness of this approach, achieving up to an 8.6-point improvement on TruthfulQA. The study contributes to a better understanding of hallucinations and provides a practical solution for improving LLMs' factuality.
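The sketch below illustrates the general idea of such an entropy-based sharpness signal, not the paper's exact formulation: in-context hidden states are projected into vocabulary space (logit-lens style), each candidate token's activation across context positions is renormalized into a distribution, and its entropy (low entropy = sharp activation) is used to penalize candidates during decoding. All function names (`in_context_sharpness_entropy`, `activation_decoding_step`), the top-k candidate restriction, and the `logit - alpha * entropy` combination rule are illustrative assumptions.

```python
import torch


def in_context_sharpness_entropy(context_hidden, unembed, candidate_ids):
    """Entropy of a candidate token's activation over in-context hidden states (a sketch).

    context_hidden: (T, d) hidden states of the in-context tokens
    unembed:        (V, d) unembedding / output projection matrix
    candidate_ids:  (K,)   candidate next-token ids
    Returns:        (K,)   entropy per candidate; lower = sharper activation.
    """
    # Project every in-context hidden state into vocabulary space ("logit lens").
    probs = (context_hidden @ unembed.T).softmax(dim=-1)      # (T, V)

    # Take each candidate's activation probability at every context position
    # and renormalize into a distribution over the T context tokens.
    act = probs[:, candidate_ids]                              # (T, K)
    act = act / act.sum(dim=0, keepdim=True)

    # Sharp (peaky) activation over the context -> low entropy.
    return -(act * act.clamp_min(1e-12).log()).sum(dim=0)     # (K,)


def activation_decoding_step(lm_logits, context_hidden, unembed, alpha=1.0, top_k=10):
    """Adjust next-token selection with the sharpness signal (illustrative combination)."""
    top_vals, top_ids = lm_logits.topk(top_k)                  # restrict to plausible candidates
    ent = in_context_sharpness_entropy(context_hidden, unembed, top_ids)
    adjusted = top_vals - alpha * ent                          # penalize diffuse (high-entropy) candidates
    return top_ids[adjusted.argmax()]


# Toy usage with random tensors standing in for a real model's states.
T, d, V = 16, 64, 1000
next_id = activation_decoding_step(
    lm_logits=torch.randn(V),
    context_hidden=torch.randn(T, d),
    unembed=torch.randn(V, d),
)
print(int(next_id))
```

In practice the hidden states and unembedding matrix would come from the LLM being decoded; the toy tensors here only make the sketch self-contained.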