Can large language models help augment English psycholinguistic datasets?

Accepted: 5 January 2024 / Published online: 23 January 2024 | Sean Trott
This paper explores whether large language models (LLMs) can augment the creation of psycholinguistic datasets, particularly those involving contextualized judgments. The author, Sean Trott, uses GPT-4 to collect multiple types of semantic judgments for English words and compares them against human "gold standard" data. GPT-4's judgments are positively correlated with human judgments, in some cases exceeding the average inter-annotator agreement. The study identifies systematic differences between LLM-generated and human-generated norms and performs substitution analyses to assess the impact of using LLM-generated norms in statistical models. The findings suggest that LLMs can provide reliable and useful data, subject to limitations and considerations that include data contamination, choice of LLM, external validity, construct validity, and data quality. The author concludes by discussing the viability of LLM-generated norms and their potential applications in psycholinguistic research.
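To make the comparison between LLM judgments and inter-annotator agreement concrete, here is a minimal sketch. It is not the paper's code. It assumes Spearman rank correlation as the agreement measure and a leave-one-out baseline (each annotator against the mean of the others); the summary does not specify either choice. All ratings, variable names, and helper functions are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def leave_one_out_agreement(annotators):
    """Mean correlation of each annotator with the mean of all the others."""
    scores = []
    for i, held_out in enumerate(annotators):
        others = [a for j, a in enumerate(annotators) if j != i]
        others_mean = [mean(col) for col in zip(*others)]
        scores.append(spearman(held_out, others_mean))
    return mean(scores)

# Hypothetical toy data: three annotators rating five items on a 1-5 scale.
human_ratings = [
    [1, 2, 4, 5, 3],
    [2, 1, 5, 4, 3],
    [1, 3, 4, 5, 2],
]
# Hypothetical LLM ratings for the same five items.
llm_ratings = [1, 2, 5, 4, 3]

human_mean = [mean(col) for col in zip(*human_ratings)]
llm_vs_human = spearman(llm_ratings, human_mean)
human_baseline = leave_one_out_agreement(human_ratings)

print(f"LLM vs. human mean: {llm_vs_human:.2f}")
print(f"Leave-one-out human agreement: {human_baseline:.2f}")
```

If `llm_vs_human` exceeds `human_baseline`, the LLM agrees with the human consensus more closely than a typical individual annotator does, which is the kind of result the summary describes.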