Limited ability of LLMs to simulate human psychological behaviours: a psychometric analysis

May 12, 2024 | Nikolay B Petrov, Gregory Serapio-Garcia, Jason Rentfrow
This study investigates the ability of large language models (LLMs), specifically GPT-3.5 and GPT-4, to simulate human psychological behaviours, using psychometrics, the science of psychological measurement, to evaluate the properties of the models' responses. The motivating question is whether LLMs can stand in for human participants in experiments, opinion polls, and surveys when prompted to complete standardized questionnaires.

The researchers prompted GPT-3.5 and GPT-4 to assume different personas and respond to standardized measures of personality constructs. Two types of persona description were used: generic personas, randomly sampled from the PersonaChat dataset, and specific "silicon" personas, constructed from demographic data drawn from a large-scale survey.

With generic personas, GPT-4, but not GPT-3.5, produced responses whose psychometric properties were similar to human norms. With specific demographic profiles, however, both models showed poor psychometric properties: responses conditioned on silicon personas were not reliable indicators of the underlying latent traits they were meant to express. This suggests that LLMs are not yet capable of accurately simulating individual-level human behaviour in multiple-choice question answering tasks.

The findings highlight the limitations of current LLMs in simulating human psychological behaviours. Although GPT-4 outperforms GPT-3.5 in some respects, both models fall short of accurately representing human personality traits, and the authors conclude that further research is needed to improve the psychometric properties of LLM responses before such models can fully simulate individual human behaviour.
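To make the method concrete, the sketch below illustrates the two ingredients the summary describes: conditioning a model on a persona before it rates a questionnaire item, and checking internal consistency of the resulting scale scores with Cronbach's alpha, a standard psychometric reliability index. The persona text, item wording, and response matrix here are illustrative placeholders, not the paper's exact prompts or data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(total scores))
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)          # per-item sample variance
    total_var = scores.sum(axis=1).var(ddof=1)      # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical persona-conditioned prompt, in the spirit of the study's setup:
# a persona description followed by a Likert-scale personality item.
persona = "I am a 34-year-old teacher from Ohio. I enjoy hiking and reading."
item = "I see myself as someone who is talkative."
prompt = (
    f"Adopt the following persona: {persona}\n"
    f"Rate how well this statement describes you on a 1-5 scale "
    f"(1 = disagree strongly, 5 = agree strongly). "
    f"Answer with a single number.\nStatement: {item}"
)

# Toy response matrix: 6 simulated "participants" answering a 4-item scale.
# In the study, each row would come from one persona-conditioned LLM run.
responses = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

A scale with alpha well below conventional human benchmarks (roughly 0.7 and above) would count as the kind of poor psychometric property the study reports for silicon personas.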