Data reduction techniques for highly imbalanced medicare Big Data

(2024) 11:8 | John T. Hancock, Huanjing Wang, Taghi M. Khoshgoftaar, Qianxin Liang
This study evaluates how effectively the combination of Random Undersampling (RUS) and a novel ensemble supervised feature selection method optimizes Machine Learning models for Medicare insurance fraud detection. The research focuses on highly imbalanced Big Data, which poses significant challenges for fraud detection. Using datasets from the Centers for Medicare & Medicaid Services (CMS) labeled with the List of Excluded Individuals/Entities (LEIE), the study demonstrates that data reduction techniques significantly improve classification performance. The experimental design systematically investigates several scenarios, applying each technique in isolation and both in combination. Results show that applying the two techniques together outperforms models trained on all available features and data, and yields more explainable models. Given the substantial financial cost of Medicare fraud, these findings offer computational advantages and improve the effectiveness of fraud detection systems, potentially improving healthcare services. The study also highlights the importance of threshold-agnostic metrics such as the Area Under the Precision-Recall Curve (AUPRC) for evaluating classification performance on imbalanced datasets.
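As an illustration of the first data reduction step, the sketch below shows random undersampling in plain Python. It assumes fraud cases are labeled 1 (the minority class) and non-fraud cases 0, and it uses a hypothetical `ratio` parameter for the number of majority rows kept per minority row; the class ratios actually used in the study are not reproduced here. In practice RUS would normally be applied only to training data, never to the evaluation split.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    X: Sequence, y: Sequence[int], ratio: float = 1.0, seed: int = 0
) -> Tuple[List, List[int]]:
    """Keep every minority (label 1) row and a random subset of majority
    (label 0) rows, so that at most `ratio` majority rows remain per
    minority row. `ratio` is a hypothetical parameter name."""
    rng = random.Random(seed)  # seeded for reproducibility
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept = minority + rng.sample(majority, n_keep)  # sample without replacement
    kept.sort()  # restore the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 10 "fraud" rows (label 1) among 1,000 rows.
X = [[i] for i in range(1000)]
y = [1 if i % 100 == 0 else 0 for i in range(1000)]
X_rus, y_rus = random_undersample(X, y, ratio=3.0)
print(len(y_rus), sum(y_rus))  # 40 10
```

All 10 positives survive while the 990 negatives shrink to 30, which is the data reduction that makes training on very large, highly imbalanced datasets cheaper.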
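The abstract does not spell out how the ensemble feature selection method combines its members. The sketch below is therefore only one common way to build such an ensemble, mean-rank aggregation, and may differ from the authors' method. It assumes each supervised learner has already produced a feature-importance score for every feature; the feature names and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Convert importance scores to ranks (1 = most important).
    Ties are broken by feature name so the result is deterministic."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {f: r for r, f in enumerate(ordered, start=1)}


def ensemble_select(score_sets: List[Dict[str, float]], k: int) -> List[str]:
    """Aggregate several rankings by mean rank and return the k features
    with the lowest mean rank. Assumes every score set covers the same features."""
    rankings = [rank_features(s) for s in score_sets]
    features = list(rankings[0])
    mean_rank = {f: mean(r[f] for r in rankings) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical importance scores from three different supervised learners.
importances = [
    {"feature_a": 0.40, "feature_b": 0.30, "feature_c": 0.20, "feature_d": 0.10},
    {"feature_a": 0.30, "feature_b": 0.10, "feature_c": 0.40, "feature_d": 0.20},
    {"feature_a": 0.35, "feature_b": 0.25, "feature_c": 0.30, "feature_d": 0.10},
]
print(ensemble_select(importances, k=2))  # ['feature_a', 'feature_c']
```

Aggregating ranks instead of raw scores avoids having to calibrate importance values that different learners report on different scales. Keeping only the top k features is also what makes the resulting model easier to explain.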
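To show why AUPRC is called threshold-agnostic, the following sketch computes average precision. This is the step-wise AUPRC estimate that scikit-learn also reports through `average_precision_score`. The study's exact AUPRC computation may differ. The function sweeps over every distinct score as a cutoff and sums recall gains weighted by precision, so no single classification threshold has to be chosen.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: sum over thresholds of
    (recall gain) * (precision at that threshold).
    Rows with equal scores are treated as a single threshold."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without positive examples")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every row tied at this score before measuring precision/recall.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


y = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # 20% positives
ranked = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(average_precision(y, ranked))      # about 0.833 (positives at ranks 1 and 3)
print(average_precision(y, [0.5] * 10))  # 0.2 (uninformative scorer = class prevalence)
```

An uninformative scorer falls to the positive-class prevalence (0.2 here). By contrast, a model that always predicts the majority class would still reach 0.8 accuracy on the same data, which is why precision-recall based metrics are preferred for highly imbalanced fraud data.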