Breast Cancer Prediction Based on Multiple Machine Learning Algorithms

January 22, 2024 | Sheng Zhou, Chujiao Hu, Shanshan Wei, and Xiaofan Yan
This study aims to develop an automated diagnostic system for breast cancer using machine learning algorithms. The research focuses on the Wisconsin Breast Cancer Dataset, a widely used benchmark for breast cancer classification. The study uses statistical methods, including Spearman correlation analysis and the Wilcoxon rank-sum test, to preprocess and analyze the dataset. Seven machine learning algorithms are trained and evaluated on the dataset: decision tree, stochastic gradient descent (SGD), random forest, K-nearest neighbors (K-NN), support vector machine (SVM), logistic regression, and AdaBoost. The AdaBoost-Logistic algorithm proves the most effective, achieving an accuracy of 99.12% and outperforming the other algorithms.
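The abstract does not spell out how AdaBoost-Logistic is configured. As a minimal sketch, assuming it means AdaBoost with logistic regression as the weak learner, the pure-Python code below implements discrete two-class AdaBoost: each round fits a sample-weighted logistic regression by gradient descent, measures its weighted error, and increases the weight of the samples it misclassified. All function names and the settings `n_rounds`, `lr`, and `epochs` are hypothetical illustrations rather than the authors' values, and labels are assumed to be encoded as -1 (benign) and +1 (malignant).

```python
import math

# Hypothetical sketch: discrete two-class AdaBoost whose weak learner is a
# sample-weighted logistic regression. Labels must be -1 or +1.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(X, y, w, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent on the weighted log-loss.

    lr and epochs are illustrative values, not tuned settings.
    """
    n_features = len(X[0])
    coef, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_c, grad_b = [0.0] * n_features, 0.0
        for xi, yi, wi in zip(X, y, w):
            target = 1.0 if yi > 0 else 0.0
            p = sigmoid(sum(c * v for c, v in zip(coef, xi)) + bias)
            err = wi * (p - target)  # d(weighted log-loss)/d(logit)
            for j, v in enumerate(xi):
                grad_c[j] += err * v
            grad_b += err
        coef = [c - lr * g for c, g in zip(coef, grad_c)]
        bias -= lr * grad_b
    return coef, bias


def predict_logistic(model, x):
    coef, bias = model
    return 1 if sum(c * v for c, v in zip(coef, x)) + bias >= 0 else -1


def adaboost_logistic(X, y, n_rounds=10):
    """Return a list of (alpha, model) pairs; n_rounds is a hypothetical default."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        model = fit_weighted_logistic(X, y, w)
        preds = [predict_logistic(model, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        if err >= 0.5:
            break  # no better than chance: stop boosting
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, model))
        # Misclassified samples (yi * p = -1) gain weight; correct ones lose it
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_ensemble(ensemble, x):
    """Sign of the alpha-weighted vote (an empty ensemble predicts +1)."""
    score = sum(alpha * predict_logistic(m, x) for alpha, m in ensemble)
    return 1 if score >= 0 else -1
```

A full-batch logistic regression keeps the sketch dependency-free; on the real dataset, a library implementation such as scikit-learn's `AdaBoostClassifier` wrapping a `LogisticRegression` base estimator would be the more practical choice.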
The study highlights the practical value of the proposed model for reducing patient waiting times, improving diagnostic efficiency, and guiding treatment planning. It also underscores the importance of feature selection and of integrating machine learning techniques into medical diagnosis. Future work will focus on deploying the model on Baidu AI Studio and applying online learning techniques to improve its accuracy and generalization.
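The abstract names Spearman correlation and the Wilcoxon rank-sum test as preprocessing tools and stresses feature selection, but it does not give the selection rule. One plausible screening scheme, sketched below in pure Python under that assumption, keeps a feature only if its values differ significantly between benign and malignant cases (rank-sum p-value below `alpha`) and it is not strongly rank-correlated (|rho| above `max_rho`) with a feature already kept. The thresholds, the greedy column order, and all function names are hypothetical; the rank-sum p-value uses the normal approximation without a tie correction.

```python
import math

# Hypothetical feature-screening sketch; thresholds and names are illustrative.


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation on average ranks (undefined for a constant column)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])  # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def screen_features(columns, labels, alpha=0.05, max_rho=0.9):
    """Greedy screening; columns maps name -> values, labels are 0 (benign) / 1 (malignant)."""
    kept = []
    for name, values in columns.items():
        benign = [v for v, lab in zip(values, labels) if lab == 0]
        malignant = [v for v, lab in zip(values, labels) if lab == 1]
        if rank_sum_pvalue(benign, malignant) >= alpha:
            continue  # class distributions not significantly different
        if any(abs(spearman(values, columns[k])) > max_rho for k in kept):
            continue  # nearly redundant with a feature already kept
        kept.append(name)
    return kept
```

Because both statistics depend only on ranks, the rule is unaffected by strictly increasing transformations of a feature (for example, a log transform), which suits skewed measurements.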