18 March 2024 | Sarder Abdulla Al Shiam, Md Mahdi Hasan, Md Jubair Pantho, Sarmin Akter Shochona, Md Boktiar Nayeem, M Tazwar Hossain Choudhury and Tuan Ngoc Nguyen
This research article explores the use of Explainable AI (XAI) in credit risk prediction, aiming to develop models that are both accurate and interpretable. The study employs tree-based ensemble methods, with XGBoost identified as the most effective model. To enhance explainability, SHapley Additive exPlanations (SHAP) values are used to interpret the model's predictions, providing insights into the contribution of each feature. The research utilizes data from the Lending Club, a US-based peer-to-peer lending platform, to train and evaluate the models. The dataset includes over 2.2 million loans with a 3-year term, and the primary goal is to distinguish between default and non-default loans.
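A default-prediction workflow of this kind can be sketched in a few lines. The snippet below is illustrative only: it uses scikit-learn's GradientBoostingClassifier as a stand-in for the paper's XGBoost/LGBM models, and a synthetic dataset with roughly 10% positives in place of the Lending Club data, to show how class imbalance shapes the metrics the study reports (AUC and recall versus precision).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score, precision_score

# Synthetic stand-in for the loan data: ~10% defaults, mimicking
# the class imbalance the paper highlights. All parameters here are
# illustrative, not taken from the study.
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Tree-based ensemble as a proxy for the paper's boosted models.
model = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
pred = model.predict(X_te)

auc = roc_auc_score(y_te, proba)         # threshold-free ranking quality
recall = recall_score(y_te, pred)        # share of true defaults caught
precision = precision_score(y_te, pred)  # imbalance tends to depress this
```

With few positives, even a well-ranked model (high AUC) can show modest precision at the default 0.5 threshold, which is the tension the study observes.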
The study compares the performance of four machine learning models: Decision Tree, Light Gradient Boosting Machine (LGBM), Random Forests, and XGBoost. Results show that XGBoost outperforms the other models in terms of accuracy, recall, and AUC. However, the models face challenges due to the imbalanced nature of the credit default data, which affects precision values. The study emphasizes the importance of model explainability, using SHAP values to provide insights into the model's decision-making process. This approach not only enhances model performance but also meets regulatory and ethical requirements by ensuring transparency and fairness in lending decisions. The research concludes that XGBoost is the most effective model for credit risk prediction, and the use of SHAP values provides a clear explanation of the model's predictions, making it suitable for practical industrial applications. The study highlights the need for further research to refine and validate these methods on diverse datasets.
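The SHAP attributions the study relies on are Shapley values from cooperative game theory: each feature's contribution is its marginal effect on the prediction, averaged over all coalitions of the other features. The paper uses the SHAP library's efficient tree algorithms; the sketch below instead computes exact Shapley values by brute force for a toy linear "credit score", with made-up feature names and weights, purely to make the definition concrete.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance: average marginal
    contribution of each feature over all coalitions, with absent
    features replaced by their baseline (dataset-average) values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley coalition weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += w * (value(set(s) | {i}) - value(set(s)))
        phis.append(phi)
    return phis

# Toy "credit model": linear score over hypothetical features
# [interest_rate, dti, fico]; weights are invented for illustration.
weights = [0.5, 0.3, -0.2]
predict = lambda z: sum(w * v for w, v in zip(weights, z))

baseline = [0.10, 20.0, 700.0]   # average borrower
x = [0.25, 35.0, 620.0]          # the loan being explained

phi = shapley_values(predict, x, baseline)
# For a linear model, phi_i reduces to w_i * (x_i - baseline_i),
# and the phis sum to predict(x) - predict(baseline).
```

The additivity property shown in the final comment is what makes SHAP attractive for regulated lending: the individual feature contributions reconstruct the full gap between a borrower's score and the average score.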