Detecting Anomalies in Blockchain Transactions using Machine Learning Classifiers and Explainability Analysis

7 Jan 2024 | Mohammad Hasan, Mohammad Shahriar Rahman, Helge Janicke, Iqbal H. Sarker
This study proposes an approach for detecting anomalous Bitcoin transactions using machine learning classifiers and explainability analysis. It addresses the class imbalance typical of blockchain data, where anomalous transactions are rare compared to normal ones. To handle this, the study introduces an under-sampling algorithm called XGBCLUS and compares it with other under-sampling and over-sampling techniques. It also evaluates tree-based ensemble classifiers, including stacking and voting methods, and uses SHAP (SHapley Additive exPlanations) to explain the model's predictions and identify the most influential features in classifying Bitcoin transactions.

The dataset consists of 30,248,134 Bitcoin transactions, of which 108 are labeled as anomalous and the rest as normal, or roughly one anomalous transaction per 280,000. After preprocessing and feature selection, the data remain highly imbalanced. To address this, the study applies several under-sampling and over-sampling techniques, including XGBCLUS, SMOTE, ADASYN, SMOTEENN, and SMOTETOMEK, and evaluates them using accuracy, TPR (True Positive Rate), FPR (False Positive Rate), and ROC-AUC.

The study compares single classifiers (e.g., Decision Tree, Gradient Boosting, Random Forest, Adaptive Boosting) with ensemble classifiers (stacking and voting). The results show that XGBCLUS outperforms the other under-sampling techniques in TPR and ROC-AUC, and that the stacking and voting ensembles outperform the individual classifiers in accuracy, TPR, and FPR. The study also derives a set of rules from a tree-based model; these rules help determine whether a Bitcoin transaction is anomalous or not.
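As a rough illustration of why accuracy alone says little at this level of imbalance, the sketch below computes accuracy, TPR, and FPR from confusion-matrix counts for a hypothetical classifier that labels every transaction as normal. The transaction counts are the ones quoted above; the `rates` helper and the all-normal classifier are assumptions introduced for illustration and are not part of the study.

```python
def rates(tp, fp, tn, fn):
    """Return (accuracy, TPR, FPR) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # TPR = TP / (TP + FN): share of anomalous transactions that are caught.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR = FP / (FP + TN): share of normal transactions wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, tpr, fpr


# Counts quoted in the summary (before any preprocessing or resampling).
TOTAL = 30_248_134
ANOMALOUS = 108
NORMAL = TOTAL - ANOMALOUS

# Hypothetical baseline: predict "normal" for every transaction, so there
# are no positive predictions at all (tp = fp = 0).
acc, tpr, fpr = rates(tp=0, fp=0, tn=NORMAL, fn=ANOMALOUS)

print(f"normal-to-anomalous ratio: about {NORMAL / ANOMALOUS:,.0f} to 1")
print(f"accuracy = {acc:.7f}, TPR = {tpr:.1f}, FPR = {fpr:.1f}")
```

Under these assumptions the baseline exceeds 99.999% accuracy while detecting no anomalies at all (TPR of 0), which suggests why TPR, FPR, and ROC-AUC are reported alongside accuracy.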
SHAP is used to explain the model's predictions and to identify the features that contribute most to classifying a transaction. Overall, the study reports that the proposed XGBCLUS under-sampling technique and the ensemble classifiers significantly improve the detection of anomalous Bitcoin transactions, and that SHAP-based explainability analysis makes the models more transparent and trustworthy for applications in blockchain systems.
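SHAP attributions are based on Shapley values from cooperative game theory. The following minimal sketch computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. The function `toy_model`, the feature names `a`, `b`, `c`, and the all-zero baseline are hypothetical and chosen only for illustration; they are not the study's model or features, and this brute-force enumeration is not how the SHAP library computes its values for tree ensembles.

```python
from itertools import permutations


def toy_model(x):
    # Hypothetical scoring function (not the study's model): two additive
    # terms plus an interaction between features "a" and "c".
    return 2.0 * x["a"] + x["b"] + x["a"] * x["c"]


def shapley_values(model, instance, baseline):
    """Exact Shapley values via enumeration of all feature orderings.

    Features not yet "revealed" in an ordering keep their baseline value.
    The cost grows factorially, so this is only feasible for a few features.
    """
    names = list(instance)
    totals = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        x = dict(baseline)          # start from the baseline input
        previous = model(x)
        for name in order:
            x[name] = instance[name]            # reveal one feature
            current = model(x)
            totals[name] += current - previous  # its marginal contribution
            previous = current
    return {name: totals[name] / len(orderings) for name in names}


instance = {"a": 1.0, "b": 1.0, "c": 1.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}
phi = shapley_values(toy_model, instance, baseline)
print(phi)  # {'a': 2.5, 'b': 1.0, 'c': 0.5}
```

The attributions sum to the difference between the model output for the instance and for the baseline (4.0 here), and the interaction term is split evenly between `a` and `c`. Ranking features by the magnitude of such values is the sense in which the most influential features can be identified.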