A COMPARATIVE STUDY ON ANNOTATION QUALITY OF CROWDSOURCING AND LLM VIA LABEL AGGREGATION


18 Jan 2024 | Jiyi Li
This paper investigates the annotation quality of crowdsourcing and large language models (LLMs) through label aggregation. The study compares the quality of individual crowd labels and LLM labels, and evaluates the quality of labels aggregated from each source. The results show that adding labels from a good LLM to an existing crowdsourcing dataset can enhance the quality of the aggregated labels, and the resulting quality also exceeds that of the LLM labels themselves. The study further proposes a hybrid label aggregation method that combines crowd and LLM labels, and verifies its performance. The findings suggest that while LLMs can outperform crowd workers in some cases, crowd workers may still be better in others, especially when quality control methods are applied. The study also highlights the importance of label aggregation in improving the quality of data annotations. The paper concludes that, while the current study focuses on categorical labels, further research is needed on other types of labels.
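As a concrete illustration of the aggregation setting described above, the sketch below pools per-item labels from several crowd workers with a label from an LLM and aggregates them by majority vote, treating the LLM as one additional annotator. This is a minimal sketch under stated assumptions, not the paper's actual method: the majority-vote rule, the toy data, and the function name `aggregate_majority_vote` are illustrative, and the paper also evaluates more sophisticated aggregation models.

```python
from collections import Counter

def aggregate_majority_vote(labels_per_item):
    """Majority vote over the list of labels given to one item.

    Ties break arbitrarily here; stronger aggregation methods
    (e.g., Dawid-Skene) weight annotators by estimated reliability
    instead of counting every vote equally.
    """
    counts = Counter(labels_per_item)
    return counts.most_common(1)[0][0]

# Hypothetical data: three crowd labels per item.
crowd_labels = {
    "item1": ["pos", "neg", "pos"],
    "item2": ["neg", "neg", "pos"],
    "item3": ["pos", "neg", "neg"],
}

# Hypothetical LLM labels for the same items.
llm_labels = {"item1": "pos", "item2": "neg", "item3": "pos"}

# Crowd-only aggregation.
crowd_only = {i: aggregate_majority_vote(ls) for i, ls in crowd_labels.items()}

# Hybrid aggregation: append the LLM label to each item's crowd labels,
# treating the LLM as one extra annotator, then aggregate again.
hybrid = {
    i: aggregate_majority_vote(ls + [llm_labels[i]])
    for i, ls in crowd_labels.items()
}

print("crowd-only:", crowd_only)
print("hybrid    :", hybrid)
```

In this toy run the LLM's vote flips "item3", showing how adding LLM labels can change, and potentially improve, the aggregated result when the extra annotator is accurate.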