The paper "Authorship Attribution Methods, Challenges, and Future Research Directions: A Comprehensive Survey" by Xie He, Arash Habibi Lashkari, Nikhil Vombatkere, and Dilli Prasad Sharma surveys authorship attribution methods, their applications, and future research directions. The authors group the existing techniques into five categories: stylistic, statistical, language, machine learning, and deep learning models. For each category, the survey describes specific methods along with their performance, datasets, challenges, and limitations.
Key contributions of the paper include:
1. Extensive survey of state-of-the-art authorship attribution methods.
2. Classification of methods based on model characteristics.
3. Discussion of available datasets and evaluation criteria.
4. Analysis of challenges and limitations.
5. Identification of potential future research directions.
The paper also discusses various feature types used in authorship attribution, including text-based and code-based features, and their effectiveness in different contexts. Text-based features cover lexical, syntactic, semantic, n-gram-based, content-specific, and application-specific aspects, while code-based features focus on programming style and structure.
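To make the text-based feature types more concrete, the following is a minimal sketch of how two of them, lexical and character n-gram features, might be computed. It is an illustration under stated assumptions, not a method taken from the survey: the function names `lexical_features` and `char_ngrams`, the choice of average word length and type-token ratio as lexical markers, and the default n-gram size of 3 are all hypothetical choices made for this example.

```python
import re
from collections import Counter


def lexical_features(text: str) -> dict:
    """Compute two simple lexical style markers for one document.

    Hypothetical helper for illustration: real systems typically use
    many more markers (function-word rates, sentence length, etc.).
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0}
    return {
        # Mean number of characters per word.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
    }


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(lexical_features(sample))
    print(char_ngrams(sample).most_common(3))
```

Feature vectors of this kind could then be compared across candidate authors or passed to a classifier; which features work best depends on the context, as the survey notes.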
The authors highlight the strengths and weaknesses of each method, emphasizing the need for more interpretable and explainable models. They also suggest that deep learning models can capture more distinct features and provide better results for complex attribution problems. The paper concludes with a synthesis of the existing techniques and a taxonomy of feature types, providing a comprehensive resource for researchers and practitioners in the field of authorship attribution.