10 Jul 2024 | Ori Press*1,4, Andreas Hochlehnert*1,4, Ameya Prabhu1,4, Vishaal Udandarao1,3,4, Ofir Press1,2, and Matthias Bethge1,4
The paper "CiteME: Can Language Models Accurately Cite Scientific Claims?" by Ori Press, Andreas Hochlehnert, Ameya Prabhu, Vishaal Udandarao, Ofir Press, and Matthias Bethge explores the ability of language models (LMs) to accurately attribute scientific claims to their sources. The authors introduce CiteME, a benchmark that evaluates LMs' performance on citation attribution tasks. CiteME consists of text excerpts from recent machine learning papers, each referencing a single other paper. The benchmark reveals a significant gap between LMs and human performance: LMs achieve only 4.2-18.5% accuracy, compared to 69.7% for humans. To bridge this gap, the authors introduce CiteAgent, an autonomous system built on the GPT-4o LM that can search for and read papers. CiteAgent achieves an accuracy of 35.3% on CiteME, demonstrating the potential for LMs to act as reliable research assistants in attributing scientific claims. The paper also discusses the limitations of current LMs and suggests future directions for improving their accuracy in verifying claims.