Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension


28 Feb 2024 | Fan Yin, Jayanth Srinivasa, Kai-Wei Chang
This paper explores a novel approach to characterizing and predicting the truthfulness of texts generated by large language models (LLMs). Traditional methods based on entropy or verbalized uncertainty are often intractable, sensitive to hyperparameters, and less reliable in generative tasks. The authors propose using the local intrinsic dimension (LID) of model activations to quantify LLMs' truthfulness. Through experiments on four question-answering (QA) datasets, they demonstrate the effectiveness of their method, outperforming existing uncertainty-based methods by 8% in AUROC. They also study the intrinsic dimensions in LLMs, revealing interesting properties such as a "hunchback" shape in the intrinsic dimensions of language generations and the correlation between intrinsic dimensions and model performance during instruction tuning. The findings suggest that intrinsic dimensions can be a powerful tool for understanding LLMs and detecting hallucinations. The code for this research is available at <https://github.com/fanyin3639/LID-HallucinationDetection>.
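To make the idea concrete, below is a minimal sketch of how LID can be estimated from model activations with a standard nearest-neighbor maximum-likelihood estimator (Levina-Bickel style) and then used as a truthfulness score evaluated with AUROC. This is an illustrative assumption about the general recipe, not the paper's exact pipeline; the names `lid_mle`, `reference_acts`, and the synthetic data are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def lid_mle(query, reference, k=20):
    """MLE estimate of local intrinsic dimension for one activation vector.

    query:     (d,) activation of the generation being scored.
    reference: (n, d) activations from a held-out reference set.
    k:         number of nearest neighbours used by the estimator.
    """
    dists = np.linalg.norm(reference - query, axis=1)
    dists = np.sort(dists)[:k]
    dists = np.maximum(dists, 1e-12)  # guard against log(0)
    # Levina-Bickel MLE: negative inverse mean log-ratio to the k-th neighbour distance.
    return -1.0 / np.mean(np.log(dists[:-1] / dists[-1]))


# Hypothetical usage: score each generation's hidden state and check how well
# the (negated) LID separates truthful from hallucinated answers via AUROC.
rng = np.random.default_rng(0)
reference_acts = rng.normal(size=(500, 64))   # stand-in for stored reference activations
gen_acts = rng.normal(size=(100, 64))         # stand-in for generation activations
labels = rng.integers(0, 2, size=100)         # 1 = truthful, 0 = hallucinated (toy labels)

scores = np.array([-lid_mle(a, reference_acts) for a in gen_acts])
print("AUROC:", roc_auc_score(labels, scores))
```

In this sketch a lower LID is taken as evidence of a more truthful generation (hence the negation when scoring); whether that sign convention and this particular estimator match the paper's setup should be checked against the released code linked above.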