22 Apr 2024 | Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
The paper presents the results and findings from SemEval-2024 Task 8, which focuses on detecting machine-generated text (MGT) across multiple generators, domains, and languages. The task consists of three subtasks: Subtask A is a binary classification task that determines whether a text was written by a human or a machine, with both monolingual and multilingual tracks; Subtask B aims to identify the exact source of a text, i.e., whether it was written by a human or by one of several specific LLMs; and Subtask C involves identifying the boundary at which authorship transitions from human to machine. The task attracted a large number of participants: 126 teams for the Subtask A monolingual track, 59 for the Subtask A multilingual track, 70 for Subtask B, and 30 for Subtask C. The best-performing systems across all subtasks relied on LLMs. The paper discusses the datasets, evaluation metrics, task organization, participating systems, and results, highlighting the importance of advanced LLMs, ensemble techniques, and comprehensive analysis for effective MGT detection. It also addresses limitations and ethical considerations, emphasizing the need for further research on detecting machine-generated content across various modalities.
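For illustration only (this is not the paper's or any participant's system), a minimal sketch of how a Subtask A-style binary detector could be framed as sequence classification with a fine-tunable transformer; the choice of roberta-base, the label order, and the classify helper are assumptions, and the classifier would still need fine-tuning on the task's labeled data before its predictions are meaningful.

```python
# Illustrative sketch of a binary human-vs-machine text classifier (Subtask A style).
# Assumptions: roberta-base as the backbone and the label mapping {0: human, 1: machine};
# neither is specified by the paper, and the model below is untrained for this task.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.eval()

def classify(text: str) -> str:
    """Return 'human' or 'machine' for a single input text."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return ["human", "machine"][logits.argmax(dim=-1).item()]

print(classify("Large language models can produce remarkably fluent text."))
```

Subtask B could reuse the same setup with num_labels equal to the number of candidate sources (human plus each LLM), while Subtask C is closer to token-level labeling of the human-to-machine boundary.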