[slides] People cannot distinguish GPT-4 from a human in a Turing test

The study evaluates the ability of human participants to distinguish between a human and an AI system (GPT-4, GPT-3.5, and ELIZA) in a Turing test. The experiment is designed as a randomized, controlled, and preregistered Turing test, where human participants engage in a 5-minute conversation with either a human or an AI system and must judge whether their interlocutor is human. The results show that GPT-4 was judged to be human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). This provides the first robust empirical demonstration that any artificial system can pass an interactive 2-player Turing test. The findings have implications for debates around machine intelligence and suggest that current AI systems may be capable of deceiving people into believing they are human, with potential social and economic consequences. Analysis of participants' strategies and reasoning indicates that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence.The study evaluates the ability of human participants to distinguish between a human and an AI system (GPT-4, GPT-3.5, and ELIZA) in a Turing test. The experiment is designed as a randomized, controlled, and preregistered Turing test, where human participants engage in a 5-minute conversation with either a human or an AI system and must judge whether their interlocutor is human. The results show that GPT-4 was judged to be human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). This provides the first robust empirical demonstration that any artificial system can pass an interactive 2-player Turing test. The findings have implications for debates around machine intelligence and suggest that current AI systems may be capable of deceiving people into believing they are human, with potential social and economic consequences. Analysis of participants' strategies and reasoning indicates that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence.

People cannot distinguish GPT-4 from a human in a Turing test

9 May 2024 | Cameron R. Jones, Benjamin K. Bergen