People cannot distinguish GPT-4 from a human in a Turing test

People cannot distinguish GPT-4 from a human in a Turing test

9 May 2024 | Cameron R. Jones, Benjamin K. Bergen
A randomized, controlled Turing test was conducted to evaluate whether GPT-4 could be distinguished from a human. Human participants engaged in 5-minute conversations with either a human or an AI and judged whether the interlocutor was human. GPT-4 was judged to be human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). This is the first robust empirical demonstration that any artificial system can pass an interactive two-player Turing test. The results suggest that current AI systems may be capable of deception that goes undetected. Analysis of participants' strategies and reasoning indicates that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence. The study evaluated three AI systems: GPT-4, GPT-3.5, and ELIZA. The AI systems were prompted to adopt a young, casual persona with slang and occasional spelling errors. The study found that interrogators were more accurate when they asked about human experiences, logic, and current events. Interrogators' most frequent reasons for their decisions were related to linguistic style and socio-emotional factors, such as tone, spelling, and personality. These findings suggest that social intelligence is a human characteristic that is more inimitable by machines. The results have implications for debates around machine intelligence and suggest that deception by current AI systems may go undetected. The study also found that participants' confidence in their judgments was not random, with a mean confidence of 73% for identifying GPT-4 as human. The study highlights the importance of understanding what the Turing test measures and the potential consequences of AI systems that can convincingly imitate humans. The results suggest that current AI systems may be capable of deceiving people into believing they are human, which could have widespread social and economic consequences. The study also found that age had a negative effect on accuracy, suggesting that younger people may be harder to fool. The study provides a useful starting point for tracking our changing relationship with AI technologies as they improve.A randomized, controlled Turing test was conducted to evaluate whether GPT-4 could be distinguished from a human. Human participants engaged in 5-minute conversations with either a human or an AI and judged whether the interlocutor was human. GPT-4 was judged to be human 54% of the time, outperforming ELIZA (22%) but lagging behind actual humans (67%). This is the first robust empirical demonstration that any artificial system can pass an interactive two-player Turing test. The results suggest that current AI systems may be capable of deception that goes undetected. Analysis of participants' strategies and reasoning indicates that stylistic and socio-emotional factors play a larger role in passing the Turing test than traditional notions of intelligence. The study evaluated three AI systems: GPT-4, GPT-3.5, and ELIZA. The AI systems were prompted to adopt a young, casual persona with slang and occasional spelling errors. The study found that interrogators were more accurate when they asked about human experiences, logic, and current events. Interrogators' most frequent reasons for their decisions were related to linguistic style and socio-emotional factors, such as tone, spelling, and personality. These findings suggest that social intelligence is a human characteristic that is more inimitable by machines. The results have implications for debates around machine intelligence and suggest that deception by current AI systems may go undetected. The study also found that participants' confidence in their judgments was not random, with a mean confidence of 73% for identifying GPT-4 as human. The study highlights the importance of understanding what the Turing test measures and the potential consequences of AI systems that can convincingly imitate humans. The results suggest that current AI systems may be capable of deceiving people into believing they are human, which could have widespread social and economic consequences. The study also found that age had a negative effect on accuracy, suggesting that younger people may be harder to fool. The study provides a useful starting point for tracking our changing relationship with AI technologies as they improve.
Reach us at info@study.space