A SURVEY OF TEXT CLASSIFICATION ALGORITHMS

2012 | Charu C. Aggarwal, ChengXiang Zhai
This chapter provides an overview of text classification algorithms, a problem that has been studied extensively in data mining, machine learning, databases, and information retrieval. The classification problem uses a set of labeled training records to construct a model that predicts the class label of new, unlabeled instances. The chapter discusses both the hard and the soft version of the problem: the hard version assigns a specific label to each instance, while the soft version assigns a probability value to each candidate label. Text is a distinctive setting for classification because its domain, the vocabulary of words, is very large, and because a document can be represented either by the presence or absence of words or by their frequencies. The chapter also highlights applications of text classification in several domains, such as news filtering and organization, document retrieval, and scientific literature management. Finally, it mentions several software toolkits that implement these techniques, including BOW, MALLET, WEKA, and LingPipe.
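The distinctions above (presence versus frequency features, hard versus soft prediction) can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with Laplace smoothing, chosen here only because it is compact; the summary itself does not prescribe a particular algorithm. All function names (`bag_of_words`, `train`, `predict_soft`, `predict_hard`) and the toy training documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(text, binary=False):
    """Represent a document by word frequencies, or by presence/absence if binary."""
    counts = Counter(tokenize(text))
    return {word: 1 for word in counts} if binary else dict(counts)


def train(labeled_docs, binary=False):
    """Build per-class word counts from (text, label) training records."""
    word_counts = {}        # label -> Counter of accumulated feature values
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        features = bag_of_words(text, binary=binary)
        word_counts.setdefault(label, Counter()).update(features)
        doc_counts[label] += 1
        vocab.update(features)
    return word_counts, doc_counts, vocab


def predict_soft(model, text):
    """Soft classification: return a probability for every class label."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, n_docs in doc_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for word, value in bag_of_words(text).items():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += value * math.log(likelihood)
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    max_score = max(log_scores.values())
    exp_scores = {label: math.exp(s - max_score) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: value / norm for label, value in exp_scores.items()}


def predict_hard(model, text):
    """Hard classification: return only the most probable label."""
    probabilities = predict_soft(model, text)
    return max(probabilities, key=probabilities.get)


# Toy labeled training records (invented for illustration only).
TRAINING_DOCS = [
    ("stocks fall as markets react to rate hike", "finance"),
    ("central bank raises interest rate", "finance"),
    ("team wins the final match", "sports"),
    ("coach praises team after match win", "sports"),
]

if __name__ == "__main__":
    model = train(TRAINING_DOCS)
    query = "bank rate decision moves markets"
    print(predict_soft(model, query))  # one probability per label
    print(predict_hard(model, query))  # a single label
```

Passing `binary=True` to `train` switches the training representation from word frequencies to presence or absence, which is one simple way to compare the two representations on the same data.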