[slides] Delving into ChatGPT usage in academic writing through excess vocabulary

This study investigates the widespread use of large language models (LLMs) such as ChatGPT in academic writing, focusing on the rise of "style words" that signal LLM-assisted text. Analyzing 14 million PubMed abstracts from 2010–2024, the researchers found that the appearance of LLMs led to a sharp rise in the frequency of certain style words. Their analysis of "excess word usage" suggests that at least 10% of 2024 abstracts were processed with LLMs, with some sub-corpora reaching as high as 30%. Compared with the vocabulary shifts that followed major events such as the Covid pandemic, the effect of LLMs on scientific writing is unprecedented.

The study takes a data-driven approach that tracks LLM usage without relying on assumptions about which models scientists use. For each word, the researchers calculated the difference between its observed frequency in 2024 and the frequency expected from pre-LLM data, and flagged the words that were used more often than expected. These "excess words" include style words such as "delves," "showcasing," and "underscores," which LLMs tend to favor.

LLM usage has increased across academic fields, countries, and journals, but not uniformly. Computational fields showed higher rates of LLM usage than other fields, and some countries, such as China, South Korea, and Taiwan, showed higher rates than English-speaking countries. LLM usage was also more common in journals with expedited review processes, suggesting that some authors may be using LLMs to write low-effort articles.

The study highlights both the potential benefits and the risks of LLM use in academic writing. While LLMs can improve grammar, readability, and translation, they can also introduce inaccuracies and biases and make fake publications easier to produce. The study calls for a reassessment of current policies and regulations around the use of LLMs in science, as the true extent of their adoption may be higher than what is currently measured. The method it provides for measuring LLM usage in academic writing can help inform those policies and regulations.
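
The excess-usage calculation described in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The summary only says that expected frequencies are based on pre-LLM data, so the linear extrapolation from two pre-LLM years used here is an assumption, as are the function names, the simple tokenizer, and the toy numbers.

```python
import re
from collections import Counter


def word_frequencies(abstracts):
    """Return, for each word, the fraction of abstracts that contain it.

    Each abstract counts at most once per word. The lowercase, letters-only
    tokenizer is a simplification chosen for this sketch.
    """
    counts = Counter()
    for text in abstracts:
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    total = len(abstracts)
    return {word: n / total for word, n in counts.items()}


def expected_frequency(p_prev, p_last, years_ahead):
    """Project a pre-LLM trend forward (assumed: linear extrapolation).

    p_prev and p_last are a word's frequencies in two consecutive pre-LLM
    years; the projection is floored at zero.
    """
    return max(p_last + years_ahead * (p_last - p_prev), 0.0)


def excess_usage(p_observed, p_prev, p_last, years_ahead=2):
    """Return (gap, ratio) between observed and expected frequency."""
    q = expected_frequency(p_prev, p_last, years_ahead)
    gap = p_observed - q
    ratio = p_observed / q if q > 0 else float("inf")
    return gap, ratio


if __name__ == "__main__":
    # Toy frequencies, invented purely for illustration.
    gap, ratio = excess_usage(p_observed=0.005, p_prev=0.0002, p_last=0.0002)
    print(f"gap={gap:.4f}, ratio={ratio:.1f}")
```

A word whose observed frequency far exceeds its projection shows up with a large gap or a large ratio. The gap is the more useful measure for common words and the ratio for rare ones, which is why the sketch returns both.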

Delving into ChatGPT usage in academic writing through excess vocabulary

July 4, 2024 | Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, and Jan Lause