This study aims to analyze Amazon product reviews using various natural language processing (NLP) techniques, including machine learning (ML), ensemble learning (EL), and deep learning (DL) methods. The primary objective is to accurately classify sentiments into positive, negative, or neutral categories. The research involves a comprehensive workflow, including data collection, preprocessing, feature extraction, and model training and evaluation.
The dataset used consists of 400,000 Amazon reviews across five product categories: mobile electronics, furniture, camera, grocery, and watches. The reviews are preprocessed to handle missing values, convert text to lowercase, remove stop words, and tokenize the text. Feature extraction techniques such as Bag-of-Words (BoW) and TF-IDF are applied to transform the text into numerical vectors.
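The preprocessing and feature-extraction steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's actual pipeline: the stop-word list, the regular-expression tokenizer, the function names, and the unsmoothed `ln(N / df)` form of the IDF term are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the study's list is not given).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "to", "of"}

def preprocess(text):
    """Handle missing text, lowercase, tokenize on letter runs, drop stop words."""
    if text is None:  # a missing review becomes an empty token list
        return []
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Return (sorted vocabulary, per-document term-count vectors)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for term, count in Counter(doc).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocab, vectors

def tf_idf(vectors):
    """Weight raw counts by idf = ln(N / df), one common unsmoothed variant."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Every vocabulary term occurs in at least one document, so df > 0.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    return [[v[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for v in vectors]

# Hypothetical reviews, including one missing value.
reviews = ["The battery is great", None, "Battery died, not great"]
docs = [preprocess(r) for r in reviews]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)

print(vocab)      # ['battery', 'died', 'great', 'not']
print(counts[0])  # [1, 0, 1, 0]
```

Terms that appear in fewer reviews ("died", "not") receive a larger TF-IDF weight than terms shared across reviews ("battery", "great"), which is the property that distinguishes TF-IDF from raw BoW counts.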
The study evaluates multiple ML algorithms, including Multinomial Naive Bayes (MNB), Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR), as well as ensemble learning techniques like bagging. Deep learning models such as Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory (Bi-LSTM), and transformer-based models (BERT and XLNet) are also explored.
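To make one of the listed classifiers concrete, the following is a small from-scratch sketch of Multinomial Naive Bayes with Laplace (add-one) smoothing over tokenized reviews. It is only an illustration under stated assumptions: the class name, the `alpha` parameter, the choice to ignore tokens unseen during training, and the toy training data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

class MultinomialNB:
    """Minimal Multinomial Naive Bayes over token lists (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength (assumed default)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        # Per-class term counts.
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        # Log prior from class frequencies in the training set.
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(docs))
                          for c in self.classes}
        # Total number of tokens observed in each class.
        self.total = {c: sum(self.term_counts[c].values())
                      for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            s = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # tokens unseen in training are skipped
                    s += math.log((self.term_counts[c][t] + self.alpha)
                                  / (self.total[c] + self.alpha * v))
            return s

        # Highest log-posterior wins.
        return max(self.classes, key=score)

# Hypothetical, already-preprocessed training reviews.
train_docs = [
    ["great", "battery"],
    ["love", "great", "screen"],
    ["terrible", "battery"],
    ["broke", "terrible"],
    ["okay", "average"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]

model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict(["great", "screen"]))    # positive
print(model.predict(["terrible", "broke"]))  # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from forcing a class probability to zero.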
The results show that the BERT model outperforms other models, achieving an accuracy rate of 89%. The study also discusses the limitations and future directions, suggesting the need for more advanced techniques to handle context and semantics, and expanding the analysis to different languages and cultural contexts. The research provides valuable insights for both consumers and businesses, enhancing product and service quality through informed decision-making.