Fine-grained Hallucination Detection and Editing for Language Models


2024 | Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Yulia Tsvetkov, Graham Neubig, Hannaneh Hajishirzi
This paper introduces a comprehensive taxonomy of hallucinations in large language models (LLMs) and proposes a new task of automatic fine-grained hallucination detection. The authors argue that hallucinations manifest in diverse forms, each requiring a different degree of careful assessment to verify factuality. They construct a new evaluation benchmark, FAVABENCH, containing approximately 1,000 fine-grained human judgments on outputs from three LMs across various domains. Their analysis reveals that ChatGPT and Llama2-Chat (70B, 7B) exhibit diverse types of hallucinations in the majority of their outputs in information-seeking scenarios, highlighting the need for fine-grained detection systems. To address this, they train FAVA, a retrieval-augmented LM that identifies and marks hallucinations at the span level using a unified tagging syntax. Automatic and human evaluations show that FAVA significantly outperforms retrieval-augmented ChatGPT and GPT-4 on fine-grained hallucination detection. Furthermore, FAVA outperforms widely used hallucination detection systems on binary detection and is effective at editing outputs to improve factuality.
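As a rough illustration of what span-level marking with a unified syntax might look like, the sketch below pulls tagged error spans out of an edited passage. The tag names mirror the six error categories listed in the next paragraph, but the exact markup and the example passage are assumptions for illustration, not the paper's specification.

```python
import re

# Illustrative only: assume the editing LM wraps each hallucinated span in an
# XML-style tag named after its error type, e.g.
# "Paris is the capital of <entity>Germany</entity>."
ERROR_TYPES = ["entity", "relation", "sentence", "invented", "subjective", "unverifiable"]

def extract_error_spans(tagged_output: str):
    """Return (error_type, span_text) pairs found in a tagged passage."""
    spans = []
    for error_type in ERROR_TYPES:
        pattern = rf"<{error_type}>(.*?)</{error_type}>"
        for match in re.finditer(pattern, tagged_output, flags=re.DOTALL):
            spans.append((error_type, match.group(1)))
    return spans

if __name__ == "__main__":
    example = ("Paris is the capital of <entity>Germany</entity>, "
               "founded <invented>in 12 BC by Julius Caesar</invented>.")
    print(extract_error_spans(example))
    # [('entity', 'Germany'), ('invented', 'in 12 BC by Julius Caesar')]
```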
The authors detail the fine-grained hallucination taxonomy, categorizing hallucinations into six types: entity, relation, sentence, invented, subjective, and unverifiable, and formalize the task of identifying and editing fine-grained factual errors in LM outputs. FAVA is trained on high-quality synthetic data generated through a three-step process: seed passage generation, error insertion, and post-processing. The system consists of a retriever and an editing LM, which work together to detect and correct hallucinations. Evaluated on a benchmark of 902 annotated passages, FAVA significantly outperforms baselines on both detection and editing, and human evaluations confirm that it performs better than retrieval-augmented ChatGPT on both tasks. The results show that FAVA is strong at detecting factual errors in LM outputs, although there is still room for improvement in retrieving and incorporating larger numbers of references. The paper concludes that FAVA is a promising approach for fine-grained hallucination detection and editing in LMs.
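The retriever-plus-editing-LM design described above can be pictured with a short sketch. This is a minimal sketch under assumptions of my own: `retrieve` and `editing_lm` are hypothetical stand-ins for the paper's retriever and fine-tuned editing model, and the prompt wording is illustrative rather than the actual prompt used by FAVA.

```python
from typing import Callable, List

def detect_and_edit(
    passage: str,
    retrieve: Callable[[str, int], List[str]],   # hypothetical retriever: query, k -> evidence passages
    editing_lm: Callable[[str], str],            # hypothetical editing LM: prompt -> tagged/edited passage
    top_k: int = 5,
) -> str:
    """Retrieve supporting evidence, then ask the editing LM to mark and fix errors."""
    evidence = retrieve(passage, top_k)
    prompt = (
        "Evidence:\n" + "\n".join(f"- {doc}" for doc in evidence) +
        "\n\nPassage:\n" + passage +
        "\n\nMark each hallucinated span with its error-type tag and suggest a correction."
    )
    return editing_lm(prompt)
```

The design choice worth noting is that detection and editing are handled in a single generation pass: the editing LM both tags the erroneous spans and proposes replacements, conditioned on retrieved evidence.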