Can Large Language Models Identify Authorship?

22 Oct 2024 | Baixiang Huang, Canyu Chen, Kai Shu
This paper investigates whether Large Language Models (LLMs) can perform authorship verification and attribution effectively without domain-specific fine-tuning. It addresses three research questions: (1) Can LLMs perform zero-shot, end-to-end authorship verification? (2) Can LLMs accurately attribute authorship among multiple candidates? (3) Can LLMs provide explainability in authorship analysis through linguistic features? The authors also propose a prompting technique, Linguistically Informed Prompting (LIP), which guides LLMs to examine the linguistic features used in forensic linguistics.

Authorship analysis is crucial for verifying content authenticity and combating misinformation. Traditional methods rely on hand-crafted features, while state-of-the-art approaches use text embeddings from pre-trained models; both often require fine-tuning and degrade in cross-domain scenarios. This study demonstrates that LLMs can perform authorship verification and attribution without domain-specific fine-tuning while providing explanations grounded in linguistic feature analysis.

The evaluation uses two datasets: the Enron Email dataset and the Blog Authorship Attribution corpus. Results show that LLMs, particularly GPT-4 Turbo, outperform traditional models such as BERT on both verification and attribution tasks. The LIP technique further improves performance by directing the models' attention to linguistic features, and it yields clear, focused explanations for each decision. This linguistic guidance improves the explainability of authorship predictions and addresses key limitations of traditional methods, namely extensive feature engineering and limited generalization.

The findings demonstrate that LLMs hold significant potential for authorship analysis, offering robust solutions for digital forensics, cybersecurity, and combating misinformation, and they set a new benchmark for future research on LLM-based authorship analysis.
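To make the idea concrete, the zero-shot verification setup with linguistically informed prompting can be sketched as below. The paper's exact LIP prompt wording is not reproduced here; the feature list, prompt template, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Linguistically Informed Prompting (LIP) style
# prompt for zero-shot authorship verification. The feature list and
# template wording are assumptions for illustration only.

LINGUISTIC_FEATURES = [
    "phrasal verbs", "modal verbs", "punctuation habits",
    "rare or characteristic word choices", "sentence length and complexity",
    "capitalization style", "grammatical idiosyncrasies",
]

def build_lip_verification_prompt(text_a: str, text_b: str) -> str:
    """Assemble a zero-shot prompt that directs the model to ground its
    same-author decision in forensic-linguistic features."""
    feature_list = ", ".join(LINGUISTIC_FEATURES)
    return (
        "You are a forensic linguist. Compare the writing styles of the two "
        f"texts below, focusing on features such as: {feature_list}.\n"
        "Decide whether they were written by the same author.\n"
        "Answer 'True' or 'False' first, then explain your reasoning "
        "by citing the specific linguistic features you observed.\n\n"
        f"Text 1: {text_a}\n\nText 2: {text_b}\n"
    )

def parse_verdict(model_output: str) -> bool:
    """Extract the boolean same-author verdict from a model reply that
    leads with 'True' or 'False'."""
    return model_output.strip().lower().startswith("true")
```

The prompt string returned by `build_lip_verification_prompt` would then be sent to an LLM (e.g. via a chat-completion API), and `parse_verdict` reads the leading True/False token; the remainder of the reply serves as the linguistic explanation the paper highlights.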