Can Language Models Recognize Convincing Arguments?


3 Oct 2024 | Paula Dolores Rescala, Manoel Horta Ribeiro, Tiancheng Hu, Robert West
Abstract: The capabilities of large language models (LLMs) have raised concerns about their potential to create and propagate convincing narratives. This study examines LLMs' ability to detect convincing arguments, providing insight into their persuasive capabilities without direct human experimentation. The authors extend an existing dataset with debates, votes, and voter traits, and propose tasks that measure LLMs' ability to distinguish strong arguments from weak ones, to predict stances based on beliefs and demographics, and to determine an argument's appeal to an individual based on that individual's traits. The results show that LLMs perform on par with humans on these tasks, and that combining the predictions of different LLMs outperforms humans. The released data and code contribute to ongoing efforts to evaluate and monitor LLMs' capabilities and potential impact.

Introduction: As LLMs become more capable, concerns grow about their potential to create and spread tailored, convincing narratives. Although "tailor-made misinformation" predates LLMs, models such as GPT-4, Claude 3, and Gemini 1.5 could exacerbate the problem by enabling malicious actors to generate personalized content, or to detect and amplify content that is persuasive to specific demographics. Previous studies have shown that LLMs are persuasive in generative settings, but assessing their capacity to generate arguments requires continuous human experimentation, which is time-consuming. In contrast, measuring LLMs' ability to detect content that is persuasive to specific demographics is far more efficient, and that detection ability is what this study investigates.

The study focuses on three research questions: (1) Can LLMs judge the quality of arguments and identify convincing ones? (2) Can LLMs judge how demographics and beliefs influence people's stances on specific topics? (3) Can LLMs determine how arguments appeal to individuals based on their demographics?

To investigate these questions, the authors extended a dataset from a now-defunct debate platform, annotating 833 politics-related debates with clear propositions. Each debate contains arguments for and against the proposition, along with votes from participants, and the dataset includes voters' demographic information and their stances on 48 "big issues." For 121 debates with 751 votes on three prominent topics, the authors also obtained crowdsourced labels in order to compare the LLMs' capabilities with those of humans.

Using this enriched dataset, they evaluated four LLMs (GPT-3.5, GPT-4, Llama 2, and Mistral 7B) on three tasks: (1) identifying the side with the more convincing arguments (RQ1); (2) predicting individuals' stances on specific propositions before the debate (RQ2); and (3) predicting individuals' stances on specific propositions after the debate (RQ3). Illustrative sketches of how these tasks can be posed to an LLM appear below.

Key findings: LLMs exhibit human-like performance across the three tasks. In judging the better debater (RQ1), …
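To make the first task concrete, here is a minimal sketch of how the RQ1 judgment could be elicited from an LLM. The prompt wording, the parsing logic, and the use of the OpenAI Python client are illustrative assumptions; they do not reproduce the prompts or code released with the paper.

```python
# Hypothetical sketch of Task 1 (RQ1): asking an LLM which side of a
# debate argued more convincingly. Prompt and parsing are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def judge_debate(proposition: str, pro_argument: str, con_argument: str) -> str:
    """Return 'PRO' or 'CON' for whichever side the model judges more convincing."""
    prompt = (
        f"Proposition: {proposition}\n\n"
        f"Argument in favor (PRO):\n{pro_argument}\n\n"
        f"Argument against (CON):\n{con_argument}\n\n"
        "Which side argues more convincingly? Answer with exactly one word: PRO or CON."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # one of the four models evaluated in the study
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judgments for evaluation
    )
    answer = response.choices[0].message.content.strip().upper()
    return "PRO" if answer.startswith("PRO") else "CON"
```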
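The stance-prediction tasks (RQ2 and RQ3) can be sketched the same way: a voter's profile is serialized into the prompt, and for RQ3 the debate text is included as well. The profile field names and prompt wording below are hypothetical, not the dataset's actual trait names.

```python
# Hypothetical sketch of Tasks 2-3 (RQ2-RQ3): predicting a voter's stance
# on a proposition from their demographics and "big issue" positions,
# before (RQ2) or after (RQ3) reading the debate.
from typing import Optional


def build_stance_prompt(proposition: str, voter: dict, debate_text: Optional[str] = None) -> str:
    """Build a prompt asking an LLM to predict a voter's stance on a proposition."""
    profile = ", ".join(f"{trait}: {value}" for trait, value in voter.items())
    prompt = (
        f"A debate-platform user has the following profile: {profile}.\n"
        f"Proposition: {proposition}\n"
    )
    if debate_text is not None:  # RQ3: stance after reading the debate
        prompt += f"The user has just read this debate:\n{debate_text}\n"
    prompt += "Predict the user's stance on the proposition. Answer PRO or CON."
    return prompt


# Example with hypothetical profile fields:
print(build_stance_prompt(
    "The death penalty should be abolished",
    {"age": 34, "ideology": "conservative", "religion": "Christian",
     "stance on gun rights": "PRO"},
))
```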
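Finally, the abstract notes that combining the predictions of different LLMs outperforms humans. One plausible way to combine per-model verdicts is a simple majority vote, sketched below; the paper's actual aggregation scheme may differ.

```python
# Illustrative majority-vote ensemble over per-model verdicts ('PRO'/'CON').
# This is one plausible aggregation, not necessarily the paper's.
from collections import Counter


def ensemble_verdict(model_verdicts: dict) -> str:
    """Combine per-model verdicts ('PRO'/'CON') by simple majority vote."""
    counts = Counter(model_verdicts.values())
    # With an even number of models a tie is possible; Counter.most_common
    # then breaks it by insertion order, which is arbitrary here.
    return counts.most_common(1)[0][0]


print(ensemble_verdict({"gpt-4": "PRO", "gpt-3.5": "PRO",
                        "llama-2": "CON", "mistral-7b": "PRO"}))  # -> PRO
```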