LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?

2024 | Qihui Zhang, Chujie Gao, Dongping Chen, Yue Huang, Yixin Huang, Zhenyang Sun, Shilin Zhang, Weiyi Li, Zhengyan Fu, Yao Wan, Lichao Sun
With the rapid development and widespread application of Large Language Models (LLMs), the use of Machine-Generated Text (MGT) has become increasingly common, bringing potential risks, especially to quality and integrity in fields like news, education, and science. Current research mainly focuses on detecting pure MGT and does not adequately address mixed scenarios, including AI-revised Human-Written Text (HWT) and human-revised MGT. To tackle this challenge, we define mixtext, a form of mixed text involving both AI-generated and human-written content. We then introduce MIXSET, the first dataset dedicated to studying these mixtext scenarios. Leveraging MIXSET, we conducted comprehensive experiments to assess how well prevalent MGT detectors handle mixtext, evaluating their effectiveness, robustness, and generalization. Our findings reveal that existing detectors struggle to identify mixtext, particularly when dealing with subtle modifications and style adaptability. This research underscores the urgent need for more fine-grained detectors tailored to mixtext, offering valuable insights for future research. Code and models are available at https://github.com/Dongping-Chen/MixSet.

Our work provides three main contributions:
(1) We defined mixtext, a form of mixed text involving both AI-generated and human-written content, providing a new perspective for further exploration in related fields.
(2) We proposed a new dataset, MIXSET, which specifically addresses the mixture of MGT and HWT and encompasses a diverse range of operations from real-world scenarios, filling gaps in previous research.
(3) Based on MIXSET, we conducted extensive experiments involving mainstream detectors and obtained numerous insightful findings, which provide a strong impetus for future research.