22 May 2024 | Ayad E. Korial, Ivan Isho Gorial, and Amjad J. Humaidi
This paper presents an improved ensemble-based cardiovascular disease (CVD) detection system that integrates chi-square feature selection to enhance prediction accuracy and reduce computational load. The system employs a voting ensemble that combines multiple machine learning (ML) classifiers, including logistic regression (LR), random forest (RF), naive Bayes (NB), and k-nearest neighbors (KNN). Chi-square feature selection is applied to the Cleveland heart disease dataset to identify the five most important features, reducing the feature count from 13 to 5 and significantly improving model performance. The voting ensemble achieved an accuracy of 92.11%, an average improvement of 2.95% over the best individual classifier (LR). The system outperforms existing methods in accuracy, specificity, sensitivity, and F1-score. Beyond improving accuracy, chi-square feature selection reduces computational complexity by more than 50%. The results show that the proposed system is effective for early CVD detection and can be applied in practical healthcare settings. By combining ensemble learning with feature selection, the study contributes to the development of more accurate and efficient CVD prediction systems. The proposed approach is scalable and efficient, making it suitable for real-world applications in medical diagnostics.
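The two mechanisms named in the abstract, chi-square feature ranking and a voting ensemble, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the feature names (`f_a`, `f_b`, `f_c`), the toy data, and the helper functions are hypothetical, the features are assumed to be categorical, and hard (majority) voting is assumed because the abstract does not state whether hard or soft voting was used.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            score += (observed[(f_val, l_val)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def hard_vote(predictions):
    """Majority vote over the class predicted by each base classifier.

    With an even number of voters a tie is possible; Counter breaks it in
    favour of the class that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: 8 samples, 3 binary features, binary label.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "f_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
    "f_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "f_c": [1, 1, 1, 0, 0, 0, 0, 1],  # weakly associated with the label
}

selected = select_top_k(columns, labels, k=2)
print(selected)                 # ['f_a', 'f_c']

# Votes from four hypothetical base classifiers (e.g. LR, RF, NB, KNN).
print(hard_vote([1, 0, 1, 1]))  # 1
```

In the setting the abstract describes, the same ranking step would keep 5 of the 13 Cleveland features before the base classifiers are trained, and the vote would be taken over the LR, RF, NB, and KNN predictions for each patient record.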