28 February 2024 | Xie He, Arash Habibi Lashkari, Nikhill Vombatkere, Dilli Prasad Sharma
This paper provides a comprehensive survey of authorship attribution methods, challenges, and future research directions. It classifies authorship attribution methods into five categories: stylistic, statistical, language, machine learning, and deep learning. It reviews the feature types used for attribution (lexical, syntactic, and semantic), together with the datasets and evaluation metrics common in the field. It then identifies limitations of existing methods, including the difficulty of capturing long-distance dependencies in text, the assumption that each document has a single author, and the limited interpretability of some models. The proposed research directions include more robust and interpretable models, larger and more balanced datasets, and the integration of multiple feature types. The survey also covers applications of authorship attribution in software forensics, plagiarism detection, and security attack detection. The paper concludes that deep learning models show promise, capturing more distinctive features and achieving better results on complex attribution tasks, and the authors stress that further research is needed to improve the accuracy and reliability of authorship attribution methods.
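To make the lexical-feature and statistical categories concrete, the following is a minimal sketch, not a method taken from the paper. It assumes character trigram frequencies as the lexical feature and uses cosine similarity between an unknown text and each candidate author's pooled profile as the decision rule. The function names (`char_ngram_profile`, `cosine_similarity`, `attribute`) and the toy texts are hypothetical and chosen only for illustration.

```python
# Minimal illustrative sketch (hypothetical helper names); not the surveyed
# paper's method. Lexical features = character n-gram counts; decision rule =
# highest cosine similarity to a candidate author's pooled profile.
from __future__ import annotations

import math
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors (0.0 if either is empty)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(unknown: str, candidates: dict[str, list[str]], n: int = 3) -> tuple[str, dict[str, float]]:
    """Return the most similar candidate author and all scores.

    Assumes `candidates` maps at least one author to a list of sample texts.
    """
    target = char_ngram_profile(unknown, n)
    scores: dict[str, float] = {}
    for author, samples in candidates.items():
        pooled = Counter()
        for sample in samples:
            pooled.update(char_ngram_profile(sample, n))
        scores[author] = cosine_similarity(target, pooled)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy candidate set (hypothetical data for illustration only).
    candidates = {
        "author_a": ["The cat sat quietly on the mat, watching the rain."],
        "author_b": ["Quarterly revenue grew 4% year over year, driven by services."],
    }
    best, scores = attribute("The dog sat quietly by the door, watching the snow.", candidates)
    print(best, scores)
```

Character n-grams are used here because they capture spelling, punctuation, and affix habits without requiring a tokenizer or parser. A practical system would need far more text per author, a held-out evaluation set, and likely the richer syntactic or semantic features the survey discusses.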