20 January 2024 | Mustafa Abdul Salam, Khaled M. Fouad, Doaa L. Elbably, Salah M. Elsayed
This paper addresses the challenge of credit card fraud detection (CCFD) by proposing a federated learning model that handles class imbalance while preserving data privacy. The authors highlight how data security and privacy concerns, which often prevent banks from sharing transaction datasets, make it difficult to develop effective fraud detection systems. To overcome this, the study implements federated learning in both the TensorFlow and PyTorch frameworks. The paper also discusses the severe class imbalance in credit card data, where fraudulent transactions make up only a small percentage and are vastly outnumbered by valid ones, and applies several resampling techniques to address it. These include oversampling methods (e.g., SMOTE, random oversampling (ROS)) and undersampling methods (e.g., random undersampling (RUS)), used both individually and in combination. Their effectiveness is evaluated with several classification algorithms: Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Gaussian Naive Bayes (NB). The results show that hybrid resampling methods perform well in combination with machine learning classifiers, particularly RF. Additionally, the federated learning model is tested on both platforms (TensorFlow and PyTorch) to determine which framework achieves higher accuracy while preserving data privacy. The experimental results demonstrate that the proposed model achieves higher accuracy and better performance than traditional centralized models, with the best accuracy, 99.99%, achieved by RF. The study also compares the CNN classifier with and without data balancing, finding that the SMOTE + CNN model outperforms existing models in distinguishing between fraudulent and normal transactions.
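To make the idea of combining oversampling and undersampling concrete, the following is a minimal sketch, not the authors' implementation. It pairs random undersampling of the majority class with random oversampling of the minority class (ROS + RUS); SMOTE, which synthesizes new minority points by interpolation, is deliberately left out to keep the example dependency-free. The function name `hybrid_resample`, the `target_per_class` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def hybrid_resample(rows, labels, target_per_class, seed=0):
    """Bring every class to target_per_class rows.

    Classes with more rows are randomly undersampled (RUS);
    classes with fewer rows are randomly oversampled with replacement (ROS).
    """
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # RUS: keep a random subset, drawn without replacement.
            chosen = rng.sample(members, target_per_class)
        else:
            # ROS: keep all originals, then add random duplicates.
            extra = rng.choices(members, k=target_per_class - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Toy data (hypothetical): 990 legitimate (0) and 10 fraudulent (1) transactions.
rows = [[float(i)] for i in range(1000)]
labels = [0] * 990 + [1] * 10

balanced_rows, balanced_labels = hybrid_resample(rows, labels, target_per_class=100)
print(Counter(balanced_labels))  # each class now has 100 rows
```

In practice, resampling of this kind is applied only to the training split, so that evaluation still reflects the original class distribution.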
Finally, the paper discusses the impact of different batch sizes and optimization techniques on the performance of the federated learning model, concluding that the MSGD optimizer and the PyTorch-PySyft platform provide the best results.
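The summary does not say how client models are combined in the federated setup. As an illustration only, the sketch below assumes federated averaging (FedAvg), a common aggregation rule in which each participant trains locally and a server averages the resulting parameters, weighted by local sample count, so raw transactions never leave the bank. The function name `federated_average` and the three-bank example values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")

    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's share of all training samples
        for i, w in enumerate(weights):
            averaged[i] += w * share
    return averaged


# Three hypothetical banks, each holding a locally trained 2-parameter model.
bank_weights = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
bank_sizes = [100, 100, 200]  # number of local training samples per bank

global_weights = federated_average(bank_weights, bank_sizes)
print(global_weights)  # [3.5, 2.5]
```

Only the parameter vectors and sample counts reach the server in this scheme, which is what allows the model to be trained without pooling the transaction datasets themselves.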