SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

22 Apr 2024 | Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohammed Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Chenxi Whitehouse, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
SemEval-2024 Task 8 focused on detecting machine-generated text across multiple generators, domains, and languages. It comprised three subtasks: A (human vs. machine classification), B (multi-way generator detection), and C (change point detection). Subtasks A monolingual, A multilingual, B, and C drew 126, 59, 70, and 30 participating teams, respectively, and 54 teams submitted system descriptions. The best systems for all subtasks used large language models (LLMs).

Subtask A asked systems to classify a text as human-written or machine-generated, in two tracks: monolingual (English) and multilingual. Subtask B required identifying the exact source of a text: a human or a specific LLM. Subtask C required locating the point within a text where authorship shifts from human to machine. Large-scale datasets were provided for all subtasks.

Accuracy was the evaluation metric for subtasks A and B, and mean absolute error (MAE) for subtask C. The top team in subtask A reached 96.88% accuracy, and the top team in subtask B reached 90.85%. For subtask C, the top system achieved an MAE of 15.68.

Participants used a wide range of approaches, including LLMs, data augmentation, and ensemble methods. Challenges included detecting machine-generated text in complex scenarios, such as texts with mixed human and machine contributions. The results highlighted the importance of advanced LLMs, ensemble techniques, and comprehensive analysis for effective detection across multiple languages and domains. The task also emphasized the need for robust models that generalize well to unseen data and withstand language-style attacks.
The findings contribute to the ongoing research in natural language processing and the development of systems to detect and mitigate the misuse of machine-generated text.
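
To make the two evaluation metrics concrete, here is a minimal sketch of accuracy (subtasks A and B) and mean absolute error (subtask C). It is not the official scorer. The function names, the label strings, and the toy data are all hypothetical, and treating a change point as an integer position index is an assumption, since the summary does not say how positions are measured.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predicted labels that exactly match the gold labels."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def mean_absolute_error(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Average absolute distance between predicted and gold change points.

    Positions are assumed to be integer indices into the text.
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


# Toy data, invented for illustration only.
gold_labels = ["human", "machine", "machine", "human"]
pred_labels = ["human", "machine", "human", "human"]
print(accuracy(gold_labels, pred_labels))  # 3 of 4 correct

gold_boundaries = [12, 40, 7]
pred_boundaries = [10, 45, 7]
print(mean_absolute_error(gold_boundaries, pred_boundaries))  # errors 2, 5, 0
```

Higher is better for accuracy, while lower is better for MAE, so the reported 15.68 for subtask C measures the average distance between predicted and true transition points rather than a percentage.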