An Improved Ensemble-Based Cardiovascular Disease Detection System with Chi-Square Feature Selection

22 May 2024 | Ayad E. Korial, Ivan Isho Gorial, Amjad J. Humaidi
This paper presents an improved cardiovascular disease (CVD) detection system that combines a voting ensemble of machine learning (ML) classifiers with chi-square feature selection. The system aims to enhance early detection of CVD, a leading cause of global mortality. The authors applied multiple ML classifiers, namely naïve Bayes, random forest, logistic regression (LR), and k-nearest neighbor (KNN), and evaluated their performance using accuracy, specificity, sensitivity, F1-score, the confusion matrix, and AUC. The ensemble model combines the predictions of these classifiers through a voting mechanism, improving overall accuracy and reducing computational load by more than 50%. Chi-square feature selection was applied to the Cleveland cardiac disease dataset to identify the 5 most important features, further enhancing the model's performance. The voting ensemble model achieved an accuracy of 92.11%, a 2.95% improvement over the best single classifier (LR). The study demonstrates the effectiveness of the ensemble method in improving CVD prediction accuracy and computational efficiency.
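
The two techniques named in the abstract, chi-square feature selection and a voting ensemble, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes categorical features, a Pearson chi-square statistic computed from a contingency table, and hard (majority) voting, which the abstract does not specify. The function names, feature names, and toy data are all hypothetical, and the per-classifier predictions are hard-coded stand-ins rather than the output of trained models.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_val, f_count in feature_counts.items():
        for l_val, l_count in label_counts.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hard_vote(predictions):
    """Majority vote per sample; `predictions` holds one list of labels per classifier.

    On a tie, the label that appears first among the classifiers' votes wins.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical, not taken from the Cleveland dataset).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],     # perfectly associated with the label
    "sex": [1, 0, 1, 0, 1, 0, 1, 0],            # independent of the label
    "fasting_sugar": [1, 1, 1, 0, 0, 0, 0, 1],  # partially associated
}
selected = select_top_k(columns, labels, k=2)

# Stand-in predictions from four classifiers for three samples.
per_classifier = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
ensemble = hard_vote(per_classifier)
```

In practice, scikit-learn's SelectKBest with the chi2 score function and its VotingClassifier cover the same two steps. The paper reports keeping 5 features; the value k=2 here only fits the toy data.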