Accuracy Comparison between Five Machine Learning Algorithms for Financial Risk Evaluation

29 January 2024 | Haokun Dong, Rui Liu and Allan W. Tham
This study compares the performance of five machine learning algorithms (k-nearest neighbour, naive Bayes, decision tree, logistic regression, and random forest) in predicting loan defaults for financial risk evaluation. The research uses two datasets, one with 10,000 observations and another with 30,000 observations, each with an 80:20 training/testing split. Performance is evaluated with accuracy, precision, recall, F1 score, and ROC-AUC. The preprocessing pipeline includes normalization, standardization, missing-value imputation, and handling of imbalanced data with SMOTE (Synthetic Minority Over-sampling Technique). The study also examines the impact of hyperparameters on model performance. The results show that random forest outperforms the other four classifiers in both training and actual prediction. The study highlights the importance of data preprocessing and hyperparameter tuning for accurate predictions, and demonstrates that classifier performance can vary significantly with data size, complexity, and patterns: random forest performs well even with imbalanced data, whereas decision trees tend to overfit with larger datasets. It concludes that data preprocessing, including feature selection and the handling of imbalanced data, is crucial for accurate financial risk evaluation, and emphasizes the importance of selecting appropriate hyperparameters and using sufficient computational resources to optimize model performance. The findings suggest that random forest is the most effective of the five classifiers for financial risk evaluation, particularly in handling imbalanced data and large datasets.
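The summary does not detail how the split and preprocessing were implemented. As one possible reading, the stdlib Python sketch below performs an 80:20 shuffle split, mean imputation of missing values, and z-score standardization. It fits the imputation and scaling statistics on the training split only, so nothing leaks from the test set. The function names, the choice of mean imputation, the toy columns, and the fixed seed are all assumptions for illustration. Min-max normalization would follow the same fit-on-train, apply-to-both pattern.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing value


def train_test_split(
    rows: Sequence[Row], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical helper: shuffle a copy of the rows and split into (train, test), 80:20 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_preprocessor(train: Sequence[Row]) -> Tuple[List[float], List[float]]:
    """Learn per-column means (for imputation) and standard deviations from training rows only."""
    means, stds = [], []
    for c in range(len(train[0])):
        observed = [r[c] for r in train if r[c] is not None]
        mu = statistics.fmean(observed)
        sd = statistics.pstdev(observed, mu)
        means.append(mu)
        stds.append(sd if sd > 0 else 1.0)  # guard against constant columns
    return means, stds


def transform(rows: Sequence[Row], means: List[float], stds: List[float]) -> List[List[float]]:
    """Impute missing values with the training mean, then z-score standardize each column."""
    out = []
    for r in rows:
        filled = [means[c] if v is None else v for c, v in enumerate(r)]
        out.append([(v - means[c]) / stds[c] for c, v in enumerate(filled)])
    return out


if __name__ == "__main__":
    # Toy loan table with two hypothetical numeric columns; some values in column 2 are missing.
    rows = [[float(i), None if i % 5 == 0 else float(2 * i)] for i in range(10)]
    train, test = train_test_split(rows)
    means, stds = fit_preprocessor(train)  # statistics come from the training split only
    print(transform(test, means, stds))
```

With 10,000 or 30,000 rows, the default fraction yields test sets of 2,000 and 6,000 rows respectively.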
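SMOTE rebalances a training set by synthesizing new minority-class points (here, defaulted loans) on line segments between a minority sample and one of its k nearest minority-class neighbours. The stdlib sketch below illustrates that idea. It assumes numeric feature vectors, Euclidean distance, and k = 5, which is also the default `k_neighbors` of imbalanced-learn's `SMOTE` class. The function `smote_oversample` is hypothetical and is not the study's implementation. This variant picks base samples at random rather than cycling through every minority point. Oversampling is normally applied to the training split only, so synthetic points do not contaminate the test set.

```python
import math
import random
from typing import List, Sequence


def smote_oversample(
    minority: Sequence[Sequence[float]], n_new: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Hypothetical helper: create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the indices of each sample's k nearest minority neighbours.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # random base sample
        j = rng.choice(neighbours[i])     # random neighbour among its k nearest
        gap = rng.random()                # interpolation weight in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class with two numeric features.
    defaults = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_oversample(defaults, n_new=3, k=2, seed=42))
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region spanned by the minority class rather than duplicating existing rows.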
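The five evaluation metrics can be computed from true labels and predicted default scores. On imbalanced loan data, accuracy alone can look high even when most defaults are missed, which is why precision, recall, F1, and ROC-AUC are reported alongside it. The stdlib sketch below assumes label 1 means "default" and uses a 0.5 decision threshold; both are illustrative choices, not taken from the study. ROC-AUC is computed in its rank form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (default) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores into predictions and report the five metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "roc_auc": roc_auc(y_true, scores)}


if __name__ == "__main__":
    # Toy labels and scores for eight loans (1 = default).
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
    print(evaluate(y_true, scores))
    # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'roc_auc': 0.9375}
```

ROC-AUC uses the raw scores and so does not depend on the threshold, while the other four metrics change as the threshold moves.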