9 Aug 2024 | Alexander Wan, Eric Wallace, Dan Klein
This paper investigates how retrieval-augmented large language models (LLMs) handle subjective and conflicting queries, such as "Is aspartame linked to cancer?" The authors construct a dataset called CONFLICTINGQA, which pairs controversial queries with real-world evidence documents that differ in their facts, argument styles, and answers. They use this dataset to study which text features most affect LLM predictions. The results show that current models rely heavily on a website's relevance to the query while largely ignoring stylistic features that humans find important, such as scientific references or a neutral tone. The study highlights the importance of improving the quality of retrieved evidence and aligning LLM training with human preferences. The authors also perform counterfactual analyses to explore how changes in text style and relevance affect model predictions, finding that simple perturbations that increase a document's relevance can significantly improve its win rate in the model's judgment. Overall, the findings suggest that LLMs need to be better equipped to handle ambiguous and conflicting information, and that future work should focus on integrating additional forms of information to improve their judgments of text credibility.
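To make the evaluation setup concrete, here is a minimal sketch (not the authors' code) of how one might probe a model with a controversial yes/no query and two conflicting evidence passages, then record which side it takes. The function `call_llm` is a hypothetical placeholder for whatever chat-completion client is actually used; the prompt wording is illustrative only.

```python
# Minimal sketch of a pairwise "conflicting evidence" probe, in the spirit of
# the CONFLICTINGQA setup. Not the authors' implementation.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with a real client call."""
    raise NotImplementedError

def judge(query: str, yes_passage: str, no_passage: str) -> str:
    """Ask the model which side of a controversial query the evidence supports."""
    prompt = (
        f"Question: {query}\n\n"
        f"Passage A (argues 'yes'):\n{yes_passage}\n\n"
        f"Passage B (argues 'no'):\n{no_passage}\n\n"
        "Based only on these passages, answer 'yes' or 'no'."
    )
    answer = call_llm(prompt).strip().lower()
    return "yes" if answer.startswith("yes") else "no"

# Repeating this over many query/passage pairs, while perturbing one passage's
# style or relevance, gives a per-document win rate under the model's judgment.
```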