12 Mar 2024 | Shiqi Chen*, Miao Xiong*, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He
This paper explores the mechanisms behind hallucinations in large language models (LLMs) from the perspective of *inner representations*. The authors discover that correct generations tend to have sharper context activations in the hidden states of in-context tokens compared to incorrect ones. Leveraging this insight, they propose an entropy-based metric to quantify the "sharpness" of in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach called *Activation Decoding*. Experiments on various benchmarks demonstrate the effectiveness of this approach, achieving up to an 8.6-point improvement on TruthfulQA. The study contributes to a better understanding of hallucinations and provides a practical solution for improving LLMs' factuality.
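The sketch below illustrates the general idea of such an entropy-based sharpness signal, not the paper's exact formulation: in-context hidden states are projected into vocabulary space (logit-lens style), each candidate token's activation across context positions is renormalized into a distribution, and its entropy (low entropy = sharp activation) is used to penalize candidates during decoding. All function names (`in_context_sharpness_entropy`, `activation_decoding_step`), the top-k candidate restriction, and the `logit - alpha * entropy` combination rule are illustrative assumptions.

```python
import torch


def in_context_sharpness_entropy(context_hidden, unembed, candidate_ids):
    """Entropy of a candidate token's activation over in-context hidden states (a sketch).

    context_hidden: (T, d) hidden states of the in-context tokens
    unembed:        (V, d) unembedding / output projection matrix
    candidate_ids:  (K,)   candidate next-token ids
    Returns:        (K,)   entropy per candidate; lower = sharper activation.
    """
    # Project every in-context hidden state into vocabulary space ("logit lens").
    probs = (context_hidden @ unembed.T).softmax(dim=-1)      # (T, V)

    # Take each candidate's activation probability at every context position
    # and renormalize into a distribution over the T context tokens.
    act = probs[:, candidate_ids]                              # (T, K)
    act = act / act.sum(dim=0, keepdim=True)

    # Sharp (peaky) activation over the context -> low entropy.
    return -(act * act.clamp_min(1e-12).log()).sum(dim=0)     # (K,)


def activation_decoding_step(lm_logits, context_hidden, unembed, alpha=1.0, top_k=10):
    """Adjust next-token selection with the sharpness signal (illustrative combination)."""
    top_vals, top_ids = lm_logits.topk(top_k)                  # restrict to plausible candidates
    ent = in_context_sharpness_entropy(context_hidden, unembed, top_ids)
    adjusted = top_vals - alpha * ent                          # penalize diffuse (high-entropy) candidates
    return top_ids[adjusted.argmax()]


# Toy usage with random tensors standing in for a real model's states.
T, d, V = 16, 64, 1000
next_id = activation_decoding_step(
    lm_logits=torch.randn(V),
    context_hidden=torch.randn(T, d),
    unembed=torch.randn(V, d),
)
print(int(next_id))
```

In practice the hidden states and unembedding matrix would come from the LLM being decoded; the toy tensors here only make the sketch self-contained.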