Character-level Convolutional Networks for Text Classification


4 Apr 2016 | Xiang Zhang, Junbo Zhao, Yann LeCun
This paper presents an empirical study of character-level convolutional networks (ConvNets) for text classification. The authors constructed several large-scale datasets to demonstrate that character-level ConvNets can achieve state-of-the-art or competitive results, comparing their approach against traditional models such as bag-of-words, n-grams, and their TF-IDF variants, as well as deep learning models such as word-based ConvNets and recurrent neural networks (RNNs).

The core idea is to treat text as a raw signal at the character level and apply temporal (one-dimensional) ConvNets to it. To evaluate the approach, the authors built several large-scale datasets ranging from hundreds of thousands to several million samples. They also used data augmentation, replacing words or phrases with synonyms from a thesaurus, to improve generalization.

The proposed model has a modular design built from temporal convolutional and max-pooling modules, followed by fully-connected layers, with rectified linear units (ReLUs) providing the non-linearity. Each input text is quantized as a sequence of one-hot character vectors over a fixed alphabet. Training uses stochastic gradient descent (SGD) with momentum and a learning rate that decays over time.

In comparisons against bag-of-words, n-grams, word2vec-based models, and LSTMs, character-level ConvNets outperformed the traditional methods as dataset size grew, and captured semantic information without relying on word-level features. The study also examined the effect of the choice of alphabet: an alphabet that does not distinguish uppercase from lowercase letters generally performed better on the large-scale datasets.
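The character quantization and the temporal convolution/max-pooling modules described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the alphabet below is illustrative (the paper uses a fixed 70-character alphabet and an input length of 1014 characters), and the weights are random rather than trained.

```python
import numpy as np

# Illustrative alphabet (lowercase letters, digits, punctuation, newline);
# the paper fixes a 70-character alphabet. Out-of-alphabet characters
# (including space) become all-zero columns, as in the paper.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+=<>()[]{}\n"
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}
FRAME_LEN = 1014  # fixed input length used in the paper

def quantize(text: str) -> np.ndarray:
    """One-hot encode a string into a (len(ALPHABET), FRAME_LEN) frame,
    truncating or zero-padding the text to FRAME_LEN characters."""
    frame = np.zeros((len(ALPHABET), FRAME_LEN), dtype=np.float32)
    for t, ch in enumerate(text.lower()[:FRAME_LEN]):
        idx = CHAR_TO_IDX.get(ch)
        if idx is not None:
            frame[idx, t] = 1.0
    return frame

def temporal_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Temporal (1-D) convolution: x is (in_channels, length), w is
    (out_channels, in_channels, kernel); valid padding, stride 1."""
    c_out, c_in, k = w.shape
    length = x.shape[1] - k + 1
    out = np.zeros((c_out, length), dtype=np.float32)
    for t in range(length):
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1]))
    return out

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def max_pool(x: np.ndarray, size: int) -> np.ndarray:
    """Non-overlapping temporal max-pooling over the length dimension."""
    length = x.shape[1] // size
    return x[:, :length * size].reshape(x.shape[0], length, size).max(axis=2)

# One quantize -> conv -> ReLU -> pool pass with random (untrained) weights.
frame = quantize("character-level convnets treat text as a raw signal")
w = np.random.randn(8, len(ALPHABET), 7).astype(np.float32)
h = max_pool(relu(temporal_conv(frame, w)), 3)
```

A full model in the paper's style would stack several such conv/pool modules before the fully-connected classifier; this sketch shows only one stage to make the data flow concrete.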
The authors concluded that character-level ConvNets are a promising approach for text classification: they work without any notion of words and are effective at capturing semantic information. The study highlights the importance of dataset size and the potential of character-level ConvNets for real-world applications.
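The training recipe mentioned in the summary, SGD with momentum and a learning rate that decreases over time, can be sketched as follows. The specific momentum value and the periodic-halving schedule here are illustrative defaults rather than details quoted from this summary.

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum*v - lr*grad; w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v

def halving_schedule(base_lr, epoch, every=3, times=10):
    """Halve the learning rate every `every` epochs, at most `times` times
    (an illustrative decreasing schedule)."""
    halvings = min(epoch // every, times)
    return base_lr * (0.5 ** halvings)

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for epoch in range(30):
    lr = halving_schedule(0.1, epoch)
    w, v = sgd_momentum_step(w, 2 * w, v, lr)
```

In practice the gradient would come from backpropagation through the ConvNet on a minibatch; the quadratic toy objective just makes the update rule easy to verify.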