Mapping the Increasing Use of LLMs in Scientific Papers

1 Apr 2024 | Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
This study investigates the increasing use of large language models (LLMs) in scientific papers. Analyzing 950,965 papers published between January 2020 and February 2024 across arXiv, bioRxiv, and Nature portfolio journals, the authors apply a population-level statistical framework to estimate the prevalence of LLM-modified content over time. The estimates show a steady increase in LLM usage, with the largest and fastest growth in Computer Science papers (up to 17.5% of abstracts and 15.3% of introductions by February 2024). In contrast, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3% of abstracts and 6.4% of introductions).
The study also relates LLM usage to three paper characteristics: preprint posting frequency, paper similarity, and paper length. Higher levels of LLM modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas (those most similar to other papers), and shorter papers. Taken together, the results suggest that LLMs are being widely used in scientific writing, and they point to the need for further research on the implications of this trend for scientific publishing.
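The summary does not spell out the "population-level statistical framework", so the following is only a minimal sketch of one common way to estimate such a fraction: treat the corpus as a two-component mixture of human-written and LLM-modified text and fit the mixing weight by maximum likelihood. Everything here is an assumption made for illustration, including the single "marker word" feature, the occurrence rates 0.02 and 0.30, and the function names. It is not presented as the authors' implementation.

```python
import math
import random

# Hypothetical occurrence rates of one "marker word" (assumed for illustration).
HUMAN_RATE = 0.02  # chance a human-written text contains the word
LLM_RATE = 0.30    # chance an LLM-modified text contains the word


def simulate_likelihoods(true_alpha, n_docs, seed=0):
    """Simulate a corpus and return, per document, its likelihood under the
    human model and under the LLM model (two parallel lists)."""
    rng = random.Random(seed)
    p_human, q_llm = [], []
    for _ in range(n_docs):
        is_llm = rng.random() < true_alpha
        has_word = rng.random() < (LLM_RATE if is_llm else HUMAN_RATE)
        p_human.append(HUMAN_RATE if has_word else 1.0 - HUMAN_RATE)
        q_llm.append(LLM_RATE if has_word else 1.0 - LLM_RATE)
    return p_human, q_llm


def mixture_log_likelihood(alpha, p_human, q_llm):
    """Log-likelihood of the corpus when a fraction alpha is LLM-modified."""
    return sum(math.log((1.0 - alpha) * p + alpha * q)
               for p, q in zip(p_human, q_llm))


def estimate_llm_fraction(p_human, q_llm, tol=1e-6):
    """Maximize the log-likelihood over alpha in [0, 1] by ternary search.
    The objective is concave in alpha, so the search finds its maximum."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (mixture_log_likelihood(m1, p_human, q_llm)
                < mixture_log_likelihood(m2, p_human, q_llm)):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    p, q = simulate_likelihoods(true_alpha=0.15, n_docs=20000, seed=0)
    print(f"estimated LLM-modified fraction: {estimate_llm_fraction(p, q):.3f}")
```

The estimate describes the corpus as a whole: no individual document is classified, which is what makes the approach population-level. On the simulated data the result should land close to the chosen value of 0.15, up to sampling noise.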