This paper introduces the concept of molecular facts for fact verification in large language models (LLMs). The authors argue that atomic facts, which are typically used for fact checking, are not sufficient due to their lack of context. Instead, they propose molecular facts, which are more specific and can stand alone while retaining their original meaning. Molecular facts are defined by two criteria: decontextuality, which ensures that the fact can stand alone, and minimality, which ensures that the fact contains as little additional information as possible.
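The tension between the two criteria can be made concrete with a toy sketch. This is not the authors' method: the helper names (`added_words`, `is_disambiguated`, `pick_molecular`), the word-overlap proxy for minimality, and the "John Smith" example are all assumptions made up for illustration. Decontextuality is approximated by checking that a rewrite mentions at least one distinguishing term, and minimality by counting how many words the rewrite adds to the atomic claim.

```python
import re
from typing import Iterable, List, Optional, Set


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def added_words(atomic: str, rewrite: str) -> int:
    """Crude minimality proxy (an assumption): words in the rewrite
    that do not occur anywhere in the atomic claim."""
    base = set(tokens(atomic))
    return sum(1 for w in tokens(rewrite) if w not in base)


def is_disambiguated(rewrite: str, distinguishing_terms: Set[str]) -> bool:
    """Crude decontextuality proxy (an assumption): the rewrite mentions
    at least one term that separates the intended entity from namesakes."""
    return any(t in distinguishing_terms for t in tokens(rewrite))


def pick_molecular(
    atomic: str, rewrites: Iterable[str], distinguishing_terms: Set[str]
) -> Optional[str]:
    """Return the disambiguated rewrite that adds the fewest words,
    or None if no candidate is disambiguated."""
    viable = [r for r in rewrites if is_disambiguated(r, distinguishing_terms)]
    if not viable:
        return None
    return min(viable, key=lambda r: added_words(atomic, r))


# Invented example: "John Smith" is ambiguous between several people.
atomic = "John Smith was born in 1950."
candidates = [
    atomic,  # minimal, but does not stand alone
    "John Smith, the chemist, was born in 1950.",
    "John Smith, the chemist who won several awards and taught in Ohio, "
    "was born in 1950.",  # stands alone, but adds extra claims to verify
]
print(pick_molecular(atomic, candidates, {"chemist"}))
```

In this sketch the unmodified atomic claim is rejected because it does not identify which John Smith is meant, and the longest rewrite loses to the shorter one because each extra detail is another claim that could itself be wrong.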
The authors evaluate the impact of decontextualization on minimality and present a baseline methodology for generating molecular facts automatically. They compare their approach with various methods of decontextualization and find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.
The paper also discusses the problem of non-minimality in decontextualization, where adding too much information to a fact makes it harder to localize which part of a claim is in error. The authors conduct a controlled experiment to illustrate this problem and find that non-minimal cases arise in 1.7% to 9.6% of decontextualizations. They also evaluate different decontextualization methods on a dataset of ambiguous biographies and find that molecular facts strike a balance between maintaining minimality and accuracy of fact verification.
The authors conclude that molecular facts improve fact verification precision for claims from generation about ambiguous entities and that they strike a balance between maintaining minimality and accuracy of fact verification. They also note that their approach should be evaluated fully end-to-end in an LLM pipeline that generates responses and then verifies their factuality.