Accuracy Comparison between Five Machine Learning Algorithms for Financial Risk Evaluation

29 January 2024 | Haokun Dong, Rui Liu and Allan W. Tham *
This study evaluates the performance of five machine learning algorithms (k-nearest neighbors, naïve Bayes, decision tree, logistic regression, and random forest) in predicting loan default. The research focuses on large datasets and non-parametric approaches, emphasizing data preprocessing techniques such as normalization, standardization, imputation, and handling of imbalanced classes with SMOTE. The classifiers are compared on several performance metrics, including accuracy, precision, recall, F1 score, and ROC-AUC. The results show that random forest outperforms the other four classifiers in both the training and prediction phases. The study also highlights the importance of hyper-parameter tuning and the impact of data size, complexity, multicollinearity, feature relevance, and imbalanced classes on model performance. The findings suggest that domain knowledge of the data is crucial for selecting an appropriate classifier and that investing computing resources in finding the best hyper-parameters is essential for achieving optimal model performance.
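To make the described workflow concrete, the sketch below shows one way such a comparison can be set up with scikit-learn and imbalanced-learn. It is not the authors' code: the DataFrame `df`, its binary "default" target column, and the default hyper-parameters are illustrative assumptions, and in practice each model would be tuned as the study emphasizes.

```python
# Minimal sketch (assumed setup, not the study's original pipeline):
# impute and standardize features, rebalance the training set with SMOTE,
# then train and score the five classifiers on held-out data.
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from imblearn.over_sampling import SMOTE

# `df` is a hypothetical loan dataset with a binary "default" target.
X = df.drop(columns=["default"])
y = df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Preprocessing: median imputation followed by standardization.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
X_train_p = prep.fit_transform(X_train)
X_test_p = prep.transform(X_test)

# Oversample the minority (default) class on the training split only.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train_p, y_train)

models = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)
    preds = model.predict(X_test_p)
    proba = model.predict_proba(X_test_p)[:, 1]
    print(name)
    print(classification_report(y_test, preds, digits=3))  # precision, recall, F1, accuracy
    print("ROC-AUC:", round(roc_auc_score(y_test, proba), 3))
```

Note that SMOTE is applied only to the training split so that the test-set metrics reflect the original class imbalance, which is the usual precaution when oversampling.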