Your Large Language Models Are Leaving Fingerprints


22 May 2024 | Hope McGovern, Rickard Stureborg, Yoshi Suhara, Dimitris Alikaniotis
This paper investigates the distinctive "fingerprints" that large language models (LLMs) leave in their generated text and shows how they can be used to detect machine-generated content. Analyzing five publicly available datasets, the authors find that LLMs produce characteristic frequency patterns in lexical and morphosyntactic features that reliably distinguish machine-generated from human-written text. These fingerprints are consistent across domains, shared within model families, and fairly robust to adversarial attacks.

The study shows that simple classifiers built on n-gram and part-of-speech features achieve strong detection performance, even outperforming more complex neural methods. Models fine-tuned for chat are easier to detect than base language models, suggesting that LLM fingerprints may be induced directly by the training data. These fingerprints are generally unique to a model family, can be modified by further fine-tuning, transfer across domains, and can support authorship identification, i.e., attributing a text to the model family that produced it. While some adversarial attacks can alter these fingerprints, they are not easily removed.

The authors conclude that n-gram features are highly effective for detecting machine-generated text across the five popular datasets studied. They caution, however, that although LLM fingerprints enable quick, accurate, and explainable classification, this does not make machine-text detection a high-confidence task in general, and they call for further research into other straightforward and trustworthy detection methods. They also encourage those releasing datasets for machine-generated text detection to benchmark against simple, feature-based classifiers.
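To make the "simple feature-based classifier" concrete, here is a minimal sketch of a lexical n-gram detector in the spirit of the paper: TF-IDF over word unigrams and bigrams feeding a logistic-regression model. The toy texts and labels below are illustrative placeholders, not the paper's data or code; a real experiment would train on one of the five benchmark datasets.

```python
# Minimal sketch of a simple n-gram detector: TF-IDF word n-grams
# feeding a linear model. Texts and labels are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = machine-generated, 0 = human-written.
texts = [
    "It is important to note that the results may vary.",  # machine-like
    "honestly i just winged the whole thing lol",           # human-like
]
labels = [1, 0]

# Unigrams + bigrams mirror the lexical n-gram features the paper discusses.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Score an unseen text.
print(clf.predict(["Furthermore, it is worth noting that costs may rise."]))
```

The explainability the authors mention falls out of this design: a linear model's largest coefficients directly name the n-grams driving each prediction.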
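The paper's other feature family is morphosyntactic: n-grams over part-of-speech tags rather than words. The sketch below uses NLTK as one possible tagger (the paper does not prescribe a specific toolkit) to map a text to POS-bigram counts, which could feed the same linear classifier as above.

```python
# Sketch of morphosyntactic features: counts of POS-tag n-grams.
# NLTK is one reasonable tagging choice; the paper does not mandate it.
from collections import Counter

import nltk

# Resource names vary across NLTK versions, so request both spellings;
# unknown names are skipped quietly.
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)

def pos_ngram_counts(text: str, n: int = 2) -> Counter:
    """Map a text to counts of POS-tag n-grams, e.g. ('DT', 'NN')."""
    tokens = nltk.word_tokenize(text)
    tags = [tag for _, tag in nltk.pos_tag(tokens)]
    return Counter(zip(*(tags[i:] for i in range(n))))

print(pos_ngram_counts("The model generates fluent text quickly."))
```

Because these features abstract away vocabulary, they capture syntactic habits that persist even when a model writes about unfamiliar topics, which is one plausible reason the fingerprints transfer across domains.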