Cost-sensitive learning for imbalanced medical data: a review

1 March 2024 | Imane Araf, Ali Idri, Ikram Chairi
This paper reviews Cost-Sensitive Learning (CSL) for imbalanced medical data, addressing the significant challenge of class imbalance in medical datasets. The review covers 173 papers published from January 2010 to December 2022, sourced from five major digital libraries. The analysis is structured around nine Research Questions (RQs) that explore publication trends, research types, empirical methods, medical disciplines, tasks, CSL approaches, strengths and weaknesses, datasets, data types, evaluation metrics, and development tools. Key findings include a notable rise in publications since 2020, a preference for direct CSL approaches, and the prevalence of medical images among the data types used. The study also highlights the underutilization of cost-related metrics and the dominance of Python as the primary programming tool. The strengths and weaknesses of CSL are discussed, with emphasis on its computational efficiency and its preservation of the original data distribution. The paper serves as a resource for researchers, providing insights into the current state of CSL for medical data and suggesting future research directions.
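To make the idea of a cost-related metric concrete, the following is a minimal sketch, not taken from the reviewed papers. It assumes hypothetical misclassification costs (a missed positive case costs nine times as much as a false alarm), made-up labels and predicted probabilities, and zero cost for correct decisions. Under those assumptions, the expected-cost-minimizing decision threshold is cost_fp / (cost_fp + cost_fn), and total misclassification cost can be compared against the default 0.5 threshold. The function names are illustrative only.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Threshold on P(positive) that minimizes expected cost,
    assuming correct decisions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)


def predict(probs, threshold):
    """Label a case positive (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Cost-related metric: sum of the costs of all misclassifications."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp * cost_fp + fn * cost_fn


# Hypothetical costs: a false negative is 9x as costly as a false positive.
COST_FP = 1.0
COST_FN = 9.0

# Made-up imbalanced data: 8 negative cases, 2 positive cases.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.11, 0.20, 0.08, 0.30, 0.15, 0.02, 0.12, 0.35, 0.60]

threshold = cost_sensitive_threshold(COST_FP, COST_FN)
default_pred = predict(probs, 0.5)
cost_sensitive_pred = predict(probs, threshold)

default_cost = total_cost(y_true, default_pred, COST_FP, COST_FN)
cost_sensitive_cost = total_cost(y_true, cost_sensitive_pred, COST_FP, COST_FN)

print("cost-sensitive threshold:", threshold)
print("total cost at 0.5 threshold:", default_cost)
print("total cost at cost-sensitive threshold:", cost_sensitive_cost)
```

On this toy data the lower threshold accepts more false alarms but no longer misses a positive case, so the total cost drops even though plain accuracy gets worse. That gap is what accuracy-style metrics alone do not show.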