Text Classification Algorithms: A Survey

This paper surveys text classification algorithms, covering text feature extraction, dimensionality reduction, classification techniques, and evaluation methods. The authors discuss the importance of text classification in real-world applications and the difficulty of selecting suitable algorithms and techniques for a given task. They distinguish four levels of scope for text classification: document, paragraph, sentence, and sub-sentence. The feature extraction methods covered include term frequency-inverse document frequency (TF-IDF), word embeddings such as Word2Vec and GloVe, bag-of-words, and n-grams. The dimensionality reduction techniques discussed include principal component analysis (PCA), linear discriminant analysis (LDA), non-negative matrix factorization (NMF), random projection, and autoencoders. The classification algorithms covered include logistic regression, Naïve Bayes, support vector machines (SVM), decision trees, and deep learning approaches. The evaluation methods discussed include the Fβ score, the Matthews correlation coefficient (MCC), and the area under the ROC curve (AUC). The authors also describe the limitations of each technique and its applications to real-world problems, emphasizing that methods should be chosen according to the specific requirements of the task. The paper concludes with future directions for text classification research and the need for further advances in handling complex, high-dimensional text data.
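
To make two of the evaluation measures named in the summary concrete, the following is a minimal sketch of the Fβ score and the Matthews correlation coefficient computed from binary confusion-matrix counts. It is not code from the paper: the function names `f_beta` and `mcc`, the binary-classification setting, and the example counts are all assumptions chosen for illustration.

```python
import math


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts (hypothetical helper).

    beta > 1 weights recall more heavily; beta < 1 favors precision.
    Returns 0.0 when the score is undefined (no positives at all).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (hypothetical helper).

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only, not results from the paper.
    tp, tn, fp, fn = 40, 45, 5, 10
    print("F1 :", f_beta(tp, fp, fn))
    print("F2 :", f_beta(tp, fp, fn, beta=2.0))
    print("MCC:", mcc(tp, tn, fp, fn))
```

Unlike Fβ, MCC uses all four cells of the confusion matrix, including true negatives, which is why it is often preferred when the classes are imbalanced.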

Text Classification Algorithms: A Survey

23 April 2019 | Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and Donald Brown