April 26, 2024 | Rahadian Kristiyanto Rachman, De Rosal Ignatius Moses Setiadi, Ajib Susanto, Kristiawan Nugroho, and Hussain Md Mehedul Islam
This research presents an enhanced Vision Transformer (ViT) and transfer learning approach for improving rice disease recognition. The study addresses the limitations of traditional Convolutional Neural Networks (CNNs) in capturing global contextual information for rice disease classification. The ViT model, which leverages self-attention mechanisms, is adapted and optimized for this task and evaluated on both balanced and imbalanced datasets, where it outperforms CNN models such as VGG, MobileNet, and EfficientNet. On challenging datasets, it achieves a recall of 0.9792, precision of 0.9815, specificity of 0.9938, F1-score of 0.9791, and accuracy of 0.9792, establishing a new benchmark in rice disease recognition. The study highlights the ViT model's ability to capture global patterns and its potential as a transformative tool in agricultural applications; in particular, it shows stable and superior results on imbalanced datasets compared to the other models. The proposed model is trained using a combination of data augmentation techniques and hyperparameter tuning, with the ViT-B16 variant selected for its efficiency and adaptability. By demonstrating the ViT model's superior performance and stability in rice disease recognition, the study paves the way for further research in plant disease detection through image processing technologies.
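As background on the ViT-B16 variant mentioned above, the sketch below shows how ViT-B/16 tokenizes an input image before self-attention is applied: the image is cut into 16×16 patches, and each flattened patch is linearly projected into the embedding space. The dimensions used (224×224 input, 768-dimensional embeddings) are the standard ViT-B/16 configuration, not values stated in this abstract.

```python
# Sketch of ViT-B/16 patch tokenization arithmetic (standard configuration,
# assumed here; the paper's exact preprocessing may differ).

def vit_b16_token_shape(image_size=224, patch_size=16, channels=3, embed_dim=768):
    """Return (num_patches, flattened_patch_len, embed_dim) for a ViT-B/16-style model."""
    assert image_size % patch_size == 0, "image must tile evenly into patches"
    patches_per_side = image_size // patch_size        # 224 // 16 = 14
    num_patches = patches_per_side ** 2                # 14 * 14 = 196
    flattened_patch_len = channels * patch_size ** 2   # 3 * 16 * 16 = 768
    return num_patches, flattened_patch_len, embed_dim

num_patches, flat_len, dim = vit_b16_token_shape()
# The Transformer encoder then sees 196 patch tokens plus one [CLS] token,
# whose final representation is used for disease classification.
print(num_patches + 1, flat_len, dim)  # 197 768 768
```

The self-attention layers operate over this full token sequence, which is why ViT can relate distant regions of a leaf image directly, rather than through the stacked local receptive fields of a CNN.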