This paper presents two approaches for improving sequence learning with recurrent networks using unlabeled data: next-step prediction and sequence autoencoding. Both can serve as a pretraining step for supervised sequence learning, with the parameters obtained from the unsupervised step used as the starting point for subsequent supervised training. Experiments show that LSTMs pretrained in this way are more stable and generalize better. With pretraining, LSTMs can be trained on sequences of up to a few hundred timesteps, achieving strong performance on text classification benchmarks such as IMDB, DBpedia, and 20 Newsgroups.
The paper introduces sequence autoencoders and recurrent language models as pretraining methods. A sequence autoencoder uses an RNN to read the input sequence into a hidden state, which is then used to reconstruct the original sequence; a recurrent language model simply predicts the next word at each position. The weights learned by either method are then used to initialize a standard LSTM, improving training stability and generalization.
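To make the two objectives concrete, below is a minimal PyTorch sketch of a sequence-autoencoder pretraining step. The class and function names (SeqAutoencoder, pretrain_step) and the use of separate encoder and decoder LSTMs are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Sketch of a sequence autoencoder: an encoder LSTM reads the input into
    its final hidden state, and a decoder LSTM reconstructs the sequence from it."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        emb = self.embed(tokens)
        _, state = self.encoder(emb)                  # keep only the final (h, c)
        # Teacher-forced reconstruction: condition on the encoder state and
        # predict each token from the previous one.
        dec_out, _ = self.decoder(emb[:, :-1], state)
        return self.out(dec_out)                      # logits for tokens[:, 1:]

def pretrain_step(model, tokens, optimizer):
    """One unsupervised update on a batch of unlabeled token sequences."""
    logits = model(tokens)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The recurrent language-model variant is the same recipe without the encoder: a single LSTM predicts the next token at every position, and its weights are the ones carried over to the supervised model.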
Experiments on text classification and sentiment analysis tasks show that LSTMs pretrained with these methods outperform randomly initialized LSTMs. Additionally, using more unlabeled data from related tasks in pretraining improves the generalization of subsequent supervised models. For example, using unlabeled Amazon reviews to pretrain sequence autoencoders improved classification accuracy on Rotten Tomatoes from 79.7% to 83.3%.
The paper also discusses the advantages of this semi-supervised approach over other unsupervised sequence learning methods. It is particularly attractive because the pretrained weights can be fine-tuned directly on the supervised task, as sketched below. The approach is related to Skip-Thought vectors, but differs in that Skip-Thought optimizes a harder objective (predicting neighboring sentences) and is a purely unsupervised algorithm with no fine-tuning step.
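As a rough illustration of that fine-tuning step, reusing the hypothetical SeqAutoencoder from the earlier sketch: the supervised classifier copies the pretrained embedding and LSTM weights and is then trained end to end on labeled data. LSTMClassifier and init_from_pretrained are illustrative names, not the paper's.

```python
class LSTMClassifier(nn.Module):
    """Supervised LSTM classifier whose recurrent weights come from pretraining."""
    def __init__(self, vocab_size, num_classes, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.classify(h[-1])                   # predict from the last hidden state

def init_from_pretrained(classifier, autoencoder):
    # Copy the pretrained embedding and encoder weights; the classification head
    # stays randomly initialized, and the whole model is then fine-tuned on labels.
    classifier.embed.load_state_dict(autoencoder.embed.state_dict())
    classifier.lstm.load_state_dict(autoencoder.encoder.state_dict())
    return classifier
```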
The paper presents experiments on a range of tasks, including sentiment analysis with IMDB and Rotten Tomatoes, text classification with 20 Newsgroups and DBpedia, and object classification with CIFAR-10. Results show that SA-LSTMs and LM-LSTMs achieve strong performance, often surpassing previous baselines, and that incorporating unlabeled data yields significant gains, demonstrating the effectiveness of semi-supervised learning for supervised sequence tasks.