Can Large Language Models Identify Authorship?

22 Oct 2024 | Baixiang Huang, Canyu Chen, Kai Shu
This paper investigates whether Large Language Models (LLMs) can perform authorship verification and attribution effectively without domain-specific fine-tuning. It addresses three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification? (2) Can LLMs accurately attribute authorship among multiple candidates? (3) Can LLMs provide explainability in authorship analysis through linguistic features? The authors also propose a prompting technique, Linguistically Informed Prompting (LIP), which guides LLMs to examine the linguistic features used in forensic linguistics.

Authorship analysis is crucial for verifying content authenticity and combating misinformation. Traditional methods rely on hand-crafted features, while state-of-the-art approaches use text embeddings from pre-trained models; both often require fine-tuning and degrade in cross-domain scenarios. This study demonstrates that LLMs can perform authorship verification and attribution without domain-specific fine-tuning while providing explanations grounded in linguistic feature analysis.

The evaluation uses two datasets: the Enron Email dataset and the Blog Authorship Attribution corpus. Results show that LLMs, particularly GPT-4 Turbo, outperform traditional models such as BERT on both verification and attribution tasks. The LIP technique further improves performance by directing the models' attention to linguistic features, and it yields clear, focused explanations for each decision. This linguistic guidance improves the explainability of authorship predictions and addresses key limitations of traditional methods, namely extensive feature engineering and limited generalization.

The findings demonstrate that LLMs hold significant potential for authorship analysis, offering robust solutions for digital forensics, cybersecurity, and combating misinformation, and they set a new benchmark for future research on LLM-based authorship analysis.
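To make the idea concrete, the zero-shot verification setup with linguistically informed prompting can be sketched as below. The paper's exact LIP prompt wording is not reproduced here; the feature list, prompt template, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Linguistically Informed Prompting (LIP) style
# prompt for zero-shot authorship verification. The feature list and
# template wording are assumptions for illustration only.

LINGUISTIC_FEATURES = [
    "phrasal verbs", "modal verbs", "punctuation habits",
    "rare or characteristic word choices", "sentence length and complexity",
    "capitalization style", "grammatical idiosyncrasies",
]

def build_lip_verification_prompt(text_a: str, text_b: str) -> str:
    """Assemble a zero-shot prompt that directs the model to ground its
    same-author decision in forensic-linguistic features."""
    feature_list = ", ".join(LINGUISTIC_FEATURES)
    return (
        "You are a forensic linguist. Compare the writing styles of the two "
        f"texts below, focusing on features such as: {feature_list}.\n"
        "Decide whether they were written by the same author.\n"
        "Answer 'True' or 'False' first, then explain your reasoning "
        "by citing the specific linguistic features you observed.\n\n"
        f"Text 1: {text_a}\n\nText 2: {text_b}\n"
    )

def parse_verdict(model_output: str) -> bool:
    """Extract the boolean same-author verdict from a model reply that
    leads with 'True' or 'False'."""
    return model_output.strip().lower().startswith("true")
```

The prompt string returned by `build_lip_verification_prompt` would then be sent to an LLM (e.g. via a chat-completion API), and `parse_verdict` reads the leading True/False token; the remainder of the reply serves as the linguistic explanation the paper highlights.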