Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

July 27-31, 2011 | Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, Christopher D. Manning
This paper introduces a machine learning framework based on recursive autoencoders (RAEs) for predicting sentence-level sentiment label distributions. The method learns vector-space representations for multi-word phrases and outperforms other state-of-the-art approaches on commonly used datasets such as movie reviews, without relying on pre-defined sentiment lexica or polarity-shifting rules.

The model is also evaluated on a new dataset of confessions from the Experience Project: personal user stories annotated with multiple labels that together form a multinomial distribution of emotional reactions. On this dataset the model predicts label distributions more accurately than several competitive baselines, both when predicting the label with the most votes and when predicting the full distribution over sentiment categories.

The RAE operates on continuous word vectors and learns semantic vector representations of phrases and full sentences from unlabeled text; it is extended to learn a distribution over sentiment labels at each node of the resulting hierarchy. Because the approach is semi-supervised, it can be trained on both unlabeled domain data and supervised sentiment data, without requiring language-specific sentiment lexica or parsers. Its hierarchical structure and compositional semantics let it capture sentiment beyond a simple positive/negative scale, predicting a multidimensional distribution over several complex, interconnected sentiments.

The RAE is trained with a greedy algorithm that constructs trees over the input vectors by minimizing reconstruction error, while a softmax layer at each node predicts the class distribution. The training objective combines the cross-entropy error of these predictions with the reconstruction error of the autoencoders.
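The per-node computation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensionality, weight names (`W_enc`, `W_dec`, `W_label`), the weighting hyperparameter `alpha`, and the random initialisation are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4          # word-vector dimensionality (illustrative)
n_labels = 3   # number of sentiment classes (illustrative)

# Randomly initialised parameters; names are assumptions for this sketch.
W_enc = rng.normal(scale=0.1, size=(d, 2 * d))       # encoder weights
b_enc = np.zeros(d)
W_dec = rng.normal(scale=0.1, size=(2 * d, d))       # decoder weights
b_dec = np.zeros(2 * d)
W_label = rng.normal(scale=0.1, size=(n_labels, d))  # softmax weights

def encode(c1, c2):
    """Compose two child vectors into one parent vector."""
    return np.tanh(W_enc @ np.concatenate([c1, c2]) + b_enc)

def reconstruction_error(c1, c2, p):
    """Squared error between the children and their reconstruction."""
    recon = W_dec @ p + b_dec
    target = np.concatenate([c1, c2])
    return 0.5 * np.sum((recon - target) ** 2)

def label_distribution(p):
    """Softmax over sentiment labels at this node."""
    z = W_label @ p
    e = np.exp(z - z.max())
    return e / e.sum()

def node_loss(c1, c2, target_dist, alpha=0.2):
    """Weighted sum of reconstruction error and cross-entropy,
    mirroring the combined objective described above."""
    p = encode(c1, c2)
    rec = reconstruction_error(c1, c2, p)
    pred = label_distribution(p)
    ce = -np.sum(target_dist * np.log(pred + 1e-12))
    return alpha * rec + (1 - alpha) * ce

c1, c2 = rng.normal(size=d), rng.normal(size=d)
target = np.array([0.7, 0.2, 0.1])   # example label distribution
print(node_loss(c1, c2, target))
```

The hyperparameter `alpha` trades off the unsupervised reconstruction term against the supervised cross-entropy term; in training, both would be summed over all nodes of the tree and minimized with gradient descent.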
The model is shown to outperform other methods on standard datasets and the Experience Project dataset, demonstrating its effectiveness in capturing complex sentiment distributions. The model's ability to learn semantic vector representations and predict sentiment distributions makes it a promising approach for sentiment analysis tasks.
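The greedy tree-construction step mentioned above can be sketched roughly as follows: repeatedly merge the adjacent pair of vectors whose autoencoder reconstruction error is lowest, until a single root vector remains. This is an illustrative sketch with randomly initialised, untrained weights (`W_enc`, `W_dec` are assumed names), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # vector dimensionality (illustrative)
W_enc = rng.normal(scale=0.1, size=(d, 2 * d))  # assumed encoder weights
W_dec = rng.normal(scale=0.1, size=(2 * d, d))  # assumed decoder weights

def encode(c1, c2):
    """Compose two adjacent vectors into a parent vector."""
    return np.tanh(W_enc @ np.concatenate([c1, c2]))

def rec_error(c1, c2):
    """Reconstruction error of a candidate merge."""
    p = encode(c1, c2)
    recon = W_dec @ p
    return 0.5 * np.sum((recon - np.concatenate([c1, c2])) ** 2)

def greedy_tree(word_vectors):
    """Greedily merge the adjacent pair with the lowest
    reconstruction error until one root vector remains."""
    nodes = list(word_vectors)
    merges = []  # record which position was merged at each step
    while len(nodes) > 1:
        errs = [rec_error(nodes[i], nodes[i + 1])
                for i in range(len(nodes) - 1)]
        i = int(np.argmin(errs))
        parent = encode(nodes[i], nodes[i + 1])
        merges.append(i)
        nodes[i:i + 2] = [parent]  # replace the pair with its parent
    return nodes[0], merges

sentence = [rng.normal(size=d) for _ in range(5)]  # 5 "word" vectors
root, merge_order = greedy_tree(sentence)
print(len(merge_order))  # a 5-word sentence requires 4 merges
```

Each merge reduces the sequence length by one, so a sentence of n words yields n − 1 internal nodes; during training, the softmax and reconstruction losses would be accumulated over all of these nodes.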