This paper compares the data-labeling accuracy of GPT-4 with that of an ethical, well-executed Amazon Mechanical Turk (MTurk) pipeline. In the study, 415 MTurk workers labeled 3,177 sentence segments from 200 scholarly articles using the CODA-19 scheme. Two worker interfaces were used, yielding 127,080 labels in total, which were then combined using eight label-aggregation algorithms. The best-performing MTurk configuration reached 81.5% accuracy, while GPT-4 alone reached 83.6%. Notably, when GPT-4's labels were aggregated with crowd labels collected via an advanced worker interface, two of the eight algorithms achieved even higher accuracy (87.5% and 87.0%). Further analysis suggests that when the crowd's and GPT-4's labeling strengths are complementary, aggregating them can increase labeling accuracy. The study highlights the importance of holistic crowdsourcing practices and the potential benefits of integrating GPT-4 into MTurk pipelines to enhance labeling accuracy.
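The abstract does not enumerate the eight aggregation algorithms, so the following is only a minimal sketch assuming simple majority voting, a standard baseline among such algorithms, with GPT-4's label folded in as one additional vote per segment. The function and variable names are hypothetical, not from the paper; the five category names reflect the published CODA-19 scheme.

```python
from collections import Counter

# Five CODA-19 categories (Background, Purpose, Method,
# Finding/Contribution, Other), per the published scheme.
CODA19_LABELS = ["background", "purpose", "method", "finding", "other"]

def aggregate_segment(crowd_labels, gpt4_label=None):
    """Majority-vote aggregation for one sentence segment (illustrative
    baseline only; not necessarily one of the paper's eight algorithms).

    crowd_labels: labels from individual MTurk workers for this segment.
    gpt4_label:   optional GPT-4 label, counted as one extra vote, which
                  approximates combining GPT-4 with the crowd under
                  majority voting.
    """
    votes = Counter(crowd_labels)
    if gpt4_label is not None:
        votes[gpt4_label] += 1
    # Counter.most_common breaks ties by insertion order; a real
    # pipeline would need an explicit tie-breaking rule.
    return votes.most_common(1)[0][0]

# Hypothetical usage: three worker labels plus GPT-4 on one segment.
print(aggregate_segment(["method", "finding", "method"], gpt4_label="method"))
# -> "method"
```

More sophisticated aggregators (e.g., Dawid-Skene-style worker-quality models) weight each annotator by estimated reliability rather than counting all votes equally, which is one plausible reason some algorithms benefit more than others from adding GPT-4's labels.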