1 Apr 2024 | Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou
The paper "Mapping the Increasing Use of LLMs in Scientific Papers" by Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, and James Y. Zou from Stanford University and UC Santa Barbara, explores the growing use of large language models (LLMs) in academic writing. The study is the first systematic, large-scale analysis of LLM-modified content in scientific papers, covering 950,965 papers published between January 2020 and February 2024 on platforms like arXiv, bioRxiv, and Nature journals.
Key findings include:
- A steady increase in LLM usage over time, with the largest and fastest growth in Computer Science papers (up to 17.5%).
- The least LLM modification in Mathematics papers and in Nature portfolio journals (up to 6.3%).
- Higher LLM modification in papers whose first authors post preprints more frequently, and in shorter papers.
- Higher LLM usage in papers from more crowded research areas.
The study uses a population-level statistical framework to measure LLM-modified content, providing insights into the structural factors that drive LLM usage in academic writing. The findings highlight the need for further research to understand the implications of LLMs for scientific publishing, including concerns about accuracy, plagiarism, anonymity, and ownership.
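The population-level framework does not try to classify any individual paper as LLM-written; instead it estimates, by maximum likelihood, the fraction of a corpus that has been substantially LLM-modified, treating the corpus as a mixture of human-written and LLM-modified text. The sketch below illustrates only the mixture-fraction estimate at the heart of that idea; the grid search, the one-dimensional Gaussian "score" standing in for word-frequency statistics, and all data are synthetic assumptions, not the paper's implementation.

```python
import numpy as np

def estimate_alpha(log_p_human, log_p_llm, grid_size=999):
    """Grid-search MLE of the mixture fraction alpha, where each document's
    likelihood is (1 - alpha) * p_human(doc) + alpha * p_llm(doc)."""
    lp = np.stack([log_p_human, log_p_llm])             # shape (2, n_docs)
    best_alpha, best_ll = 0.0, -np.inf
    for alpha in np.linspace(0.001, 0.999, grid_size):
        m = lp + np.log([1.0 - alpha, alpha])[:, None]  # weighted components
        mx = m.max(axis=0)                              # stable log-sum-exp
        ll = np.sum(mx + np.log(np.exp(m - mx).sum(axis=0)))
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Synthetic sanity check: stand in for per-document word statistics with a
# 1-D score; 20% of documents are drawn from the hypothetical "LLM" model.
rng = np.random.default_rng(0)
n, true_alpha = 20_000, 0.20
is_llm = rng.random(n) < true_alpha
scores = np.where(is_llm, rng.normal(1.5, 1.0, n), rng.normal(0.0, 1.0, n))

log_p_human = log_normal_pdf(scores, 0.0, 1.0)  # reference "human" model
log_p_llm = log_normal_pdf(scores, 1.5, 1.0)    # reference "LLM" model

print(f"true alpha = {true_alpha}, "
      f"estimated alpha = {estimate_alpha(log_p_human, log_p_llm):.3f}")
```

In the paper itself, the component likelihoods come from occurrence probabilities of words estimated on known human-written and LLM-generated reference text; the Gaussian score here is only a placeholder to keep the sketch self-contained and runnable.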