16 Apr 2024 | Liyan Tang, Philippe Laban, Greg Durrett
The paper "MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents" by Liyan Tang, Philippe Laban, and Greg Durrett addresses the challenge of verifying the factual accuracy of large language model (LLM) outputs, particularly in grounded generation settings where evidence is available. The authors propose a novel approach to build small models that achieve GPT-4-level performance at a significantly lower cost. They achieve this by constructing synthetic training data using GPT-4 to create realistic yet challenging instances of factual errors. This synthetic data is then used to train models that can check each fact in a claim and recognize synthesis of information across sentences.
The paper introduces a new benchmark called LLM-AGGREFACT, which unifies datasets from various sources to evaluate fact-checking performance across closed-book and grounded generation settings. The benchmark includes 10 datasets with human-annotated tuples of (document, claim, label). The authors evaluate their system, MiniCheck, on this benchmark and find that it outperforms existing specialized fact-checkers and LLM-based fact-checkers, achieving similar performance to GPT-4 but with a much smaller model size and lower inference cost.
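To make the benchmark format concrete, here is a minimal evaluation loop over (document, claim, label) tuples in the style of LLM-AGGREFACT. The checker and dataset below are toy stand-ins, not the authors' code; the metric shown is balanced accuracy (mean of per-class recall), a common choice for imbalanced binary fact-checking labels.

```python
from statistics import mean

def balanced_accuracy(gold, pred):
    """Mean of per-class recall over binary labels (1 = supported, 0 = unsupported)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, g in enumerate(gold) if g == cls]
        if idx:
            recalls.append(sum(pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)

def evaluate(checker, examples):
    """examples: iterable of (document, claim, label) tuples, as in LLM-AGGREFACT."""
    gold = [label for _, _, label in examples]
    pred = [checker(doc, claim) for doc, claim, _ in examples]
    return balanced_accuracy(gold, pred)

# Toy stand-in checker: claim is "supported" if all its words occur in the document.
def naive_checker(doc, claim):
    doc_words = set(doc.lower().split())
    return int(all(w in doc_words for w in claim.lower().split()))

examples = [
    ("the cat sat on the mat", "the cat sat", 1),
    ("the cat sat on the mat", "the dog ran", 0),
]
print(evaluate(naive_checker, examples))  # → 1.0
```

A real run would substitute a trained model (e.g. MiniCheck) for `naive_checker`; the evaluation loop itself is unchanged.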
Key contributions of the paper include:
1. Two synthetic data generation methods to address the challenges of fact-checking on grounding documents.
2. A new benchmark, LLM-AGGREFACT, that aggregates multiple datasets for factual evaluation.
3. An evaluation showing that MiniCheck outperforms previous specialized systems by 4% to 10% absolute, despite using less fine-tuning data.
The paper also discusses the importance of training data selection and the computational cost of LLM-based fact-checkers, highlighting that MiniCheck-FT5, the best-performing model, is over 400 times cheaper than GPT-4. Additionally, the authors revisit the need for claim decomposition and decontextualization in fact-checking, finding that these steps are not necessary for their approach.
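Checking a claim directly against a document, without first decomposing it into atomic facts, still requires handling documents longer than the model's input. One plausible sketch (illustrative only, not the authors' exact procedure) is to split the document into chunks, score the claim against each chunk, and take the maximum support score, so a claim counts as supported if any chunk entails it. The word-overlap scorer here is a toy stand-in for the trained model.

```python
def chunk_document(doc, chunk_size=3):
    """Split a document into chunks of up to chunk_size sentences (toy split on '.')."""
    sents = [s.strip() for s in doc.split(".") if s.strip()]
    return [". ".join(sents[i:i + chunk_size])
            for i in range(0, len(sents), chunk_size)] or [""]

def check_claim(score_fn, doc, claim, threshold=0.5):
    """A claim is supported if its best-supported chunk clears the threshold."""
    score = max(score_fn(chunk, claim) for chunk in chunk_document(doc))
    return score >= threshold

# Toy scorer standing in for the trained model: fraction of claim words in the chunk.
def overlap_score(chunk, claim):
    claim_words = {w.strip(".").lower() for w in claim.split()}
    chunk_words = {w.strip(".").lower() for w in chunk.split()}
    return len(claim_words & chunk_words) / len(claim_words) if claim_words else 0.0

doc = "Paris is in France. It is the capital. The Seine flows through it."
print(check_claim(overlap_score, doc, "Paris is the capital"))  # → True
```

The max-over-chunks aggregation is what lets a single claim be verified without decomposition: each chunk is scored against the full claim, rather than each atomic fact against the full document.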