Factuality of Large Language Models in the Year 2024

01-Feb-2024 | Yuxia Wang, Minghan Wang, Muhammad Arslan Manzoor, Fei Liu, Georgi Georgiev, Rocktim Jyoti Das, Preslav Nakov
Large language models (LLMs) have become integral to daily life, offering quick answers to complex questions. However, their factual accuracy is often compromised, which can lead to misinformation. This survey critically analyzes existing research on LLM factuality, identifying major challenges and potential solutions. It draws a distinction between hallucination (fabricated content) and factuality (accuracy of factual knowledge) and emphasizes the need for reliable evaluation methods.

The survey categorizes evaluation datasets into four types based on answer format and difficulty: open-ended generations, yes/no answers, short-form responses, and multiple-choice questions. Evaluation metrics include FactScore, which measures the fraction of claims in a generation that can be verified against a knowledge source, alongside measures such as hallucination error rates and entailment ratios.

Improving LLM factuality involves pre-training on high-quality data, tuning with knowledge injection, and retrieval-augmented generation (RAG) to ground answers in external evidence. Techniques such as nucleus sampling, context-aware decoding, and multi-agent debate are explored to enhance factuality during inference. Multimodal LLMs, which also process visual and audio inputs, face their own hallucination challenges and require specialized fact-checking approaches. Automatic fact-checkers, such as FactScore and Factool, are evaluated for their effectiveness, though challenges remain in automated verification and quantification.
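Since FactScore is described above only as the share of verified claims, the sketch below illustrates the general idea of such a claim-level metric: decompose a generation into atomic claims and report the fraction that a verifier supports. The `factscore` helper and the `verify_claim` stub are hypothetical placeholders standing in for the retrieval and entailment steps, not the survey's code or the original FactScore implementation.

```python
from typing import Callable, List

def factscore(claims: List[str], verify_claim: Callable[[str], bool]) -> float:
    """Return the fraction of atomic claims that the verifier supports.

    `claims` are the atomic facts extracted from one model generation;
    `verify_claim` stands in for a retrieval + entailment check against
    a knowledge source such as Wikipedia.
    """
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if verify_claim(claim))
    return supported / len(claims)


# Hypothetical usage with a stub verifier that accepts every claim.
claims = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie won two Nobel Prizes.",
]
print(f"FactScore: {factscore(claims, verify_claim=lambda c: True):.2f}")
```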
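The decoding-time techniques are only named above; as one concrete illustration, nucleus (top-p) sampling restricts sampling to the smallest set of highest-probability tokens whose cumulative mass reaches p, which trims the unreliable tail of the distribution. The NumPy sketch below is a minimal illustration of that idea, not code from the survey.

```python
import numpy as np

def nucleus_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability mass reaches p (nucleus / top-p sampling)."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(probs)[::-1]              # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest prefix with mass >= p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy next-token distribution over a 5-token vocabulary.
probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(nucleus_sample(probs, p=0.9))
```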
The survey identifies three major challenges: the inherent limitations of LLMs in learning facts, the difficulty of evaluating open-ended generations, and the need for efficient, accurate fact-checking systems. Future research directions include developing more robust evaluation frameworks, improving retrieval and verification methods, and enhancing the efficiency of fact-checking systems. The survey concludes that addressing these challenges is crucial for advancing the reliability and utility of LLMs in real-world applications.
[slides and audio] Factuality of Large Language Models: A Survey