Enhanced Vision Transformer and Transfer Learning Approach to Improve Rice Disease Recognition

April 26, 2024 | Rahadian Kristiyanto Rachman, De Rosal Ignatius Moses Setiadi*, Ajib Susanto, Kristiawan Nugroho, Hussain Md Mhedul Islam
This research explores the application of the Vision Transformer (ViT) model for rice disease recognition, aiming to improve on the performance of traditional Convolutional Neural Networks (CNNs). The study focuses on two datasets: a balanced dataset with 2628 images and an imbalanced dataset with 320 images. The ViT Base (B) model, specifically the ViT-B16 variant, is adapted and fine-tuned for the task. The model's performance is evaluated using metrics such as recall, precision, specificity, F1-score, and overall accuracy. The results show that the proposed ViT model outperforms established CNN models, including VGG, MobileNet, and EfficientNet, achieving higher accuracy, recall, and precision. The ViT model demonstrates superior performance on both balanced and imbalanced datasets, highlighting its potential as a transformative tool in agricultural AI applications. The study also discusses the advantages of ViT's global context capture and self-attention mechanisms, which enable it to handle complex visual tasks more effectively than CNNs. Future work could focus on optimizing the model's computational efficiency and adaptability for real-time applications in agriculture.
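
The abstract describes adapting an ImageNet-pretrained ViT-B16 and fine-tuning it for rice disease classification. As a rough illustration of that transfer-learning recipe (a sketch, not the authors' code), the snippet below loads torchvision's ViT-B/16, swaps the classification head, and sets up fine-tuning; the class count and learning rate are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # assumption: number of rice disease classes in the dataset

# Start from ImageNet-pretrained weights (the transfer-learning step).
weights = models.ViT_B_16_Weights.IMAGENET1K_V1
model = models.vit_b_16(weights=weights)

# Replace the 1000-way ImageNet head with a rice-disease head.
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

# Reuse the preprocessing the pretrained weights expect (224x224, ImageNet normalization).
preprocess = weights.transforms()

# Fine-tune all layers; freezing the backbone and training only the head
# is a common lower-cost alternative.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr is illustrative
criterion = nn.CrossEntropyLoss()
```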
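
Of the reported metrics, recall, precision, F1-score, and accuracy come directly from scikit-learn, but specificity has no built-in multi-class version there. A common one-vs-rest computation from the confusion matrix is sketched below; this is an assumption about how it could be computed, as the abstract does not give the formula.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

def per_class_specificity(y_true, y_pred, num_classes):
    """Specificity = TN / (TN + FP), computed one-vs-rest per class."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(num_classes)))
    specificities = []
    for c in range(num_classes):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp
        fn = cm[c, :].sum() - tp
        tn = cm.sum() - tp - fp - fn
        specificities.append(tn / (tn + fp))
    return np.array(specificities)

# The remaining metrics, macro-averaged:
# prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
# acc = accuracy_score(y_true, y_pred)
```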