Accepted: 30 January 2024 | Pooja Rani, Rohit Lamba, Ravi Kumar Sachdeva, Karan Kumar, Celestine Iwendi
This paper presents a machine learning model for predicting Alzheimer's disease (AD) using the SMOTE-RF methodology. AD is a neurodegenerative disorder that affects the elderly; its symptoms are mild at first and worsen over time. Although there is no cure, early diagnosis can help reduce its impact. The study evaluates three machine learning algorithms, decision tree (DT), extreme gradient boosting (XGB), and random forest (RF), on the Open Access Series of Imaging Studies (OASIS) dataset, available on Kaggle. Because class imbalance is a common issue in medical data, the dataset is balanced using the Synthetic Minority Over-sampling Technique (SMOTE), and experiments are conducted on both the imbalanced and the balanced datasets. DT achieved 73.38% accuracy on the imbalanced dataset and 83.15% on the balanced dataset; XGB achieved 83.88% and 91.05%, respectively; and RF achieved 87.84% and 95.03%, respectively. The highest accuracy, 95.03%, was achieved by the SMOTE-RF model. The study highlights the importance of early diagnosis and the potential of machine learning to improve patient outcomes and reduce the societal and financial burdens associated with AD.
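The abstract names SMOTE as the balancing step but gives no implementation details. As an illustration only, the core idea of SMOTE (creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function names `smote_oversample` and `balance`, the Euclidean distance, the default `k=5`, and the two-class setup are assumptions made for this sketch, not details taken from the paper; in practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would typically be used before fitting the classifier.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature lists (minority class only).
    Hypothetical helper for illustration; not the authors' code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a two-class problem (an assumption of this sketch).
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote_oversample(minority, n_new, k=k, seed=seed) if n_new else []
    # Synthetic samples are appended after the original data.
    return X + new_points, y + [minority_label] * len(new_points)
```

Calling `balance(X, y, minority_label=1)` on a feature list `X` and label list `y` returns the original data followed by the synthetic minority samples, which could then be passed to any classifier, such as a random forest.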