Automatic Food Recognition Using Deep Convolutional Neural Networks with Self-attention Mechanism

9 January 2024 | Rahib Abiyev, Joseph Adepoju
This research presents FRCNNSAM, a food recognition model that combines deep convolutional neural networks (CNNs) with a self-attention mechanism for food image classification. Trained and evaluated on two datasets, Food-101 and MA Food-121, the model achieves accuracies of 96.40% and 95.11%, respectively, outperforming baseline transfer-learning models by 8.12%. It also generalizes well to random internet images, making it suitable for practical food recognition and classification tasks.

FRCNNSAM integrates scaled dot-product attention to capture intricate patterns and relationships in food images, and an ensemble approach that averages the prediction probabilities of multiple trained models to improve robustness. To keep computational and memory costs manageable, the model incorporates techniques such as weight sharing and data compression.

The study also investigates whether CNN models developed without transfer learning can reach performance levels comparable to those that use it. The results show that FRCNNSAM, trained without pre-trained weights, achieves high accuracy, demonstrating the potential of CNNs in food image recognition without relying on transfer learning. The architecture comprises convolutional, pooling, and dense layers; training used data augmentation and regularization, and performance was evaluated with accuracy, precision, recall, and F1-score.
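The scaled dot-product attention the abstract refers to can be sketched in a few lines of NumPy. This is a generic illustration of the mechanism, not the paper's exact implementation; the shapes and example inputs are assumptions for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # scaled pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                             # weighted sum of values

# Toy self-attention: 4 feature vectors of dimension 8 attend to each other
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

Scaling by the square root of the key dimension keeps the dot products from growing with dimensionality, which would otherwise push the softmax into a near-one-hot regime and shrink its gradients.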
The study highlights the effectiveness of ensemble learning and self-attention mechanisms in improving the accuracy and robustness of food image recognition systems, showcasing the potential of advanced CNN architectures for this task. The findings support the notion that CNN models trained without transfer learning can match those that use it, provided fine-tuning, parameter adjustment, and advanced techniques such as attention and ensembling are applied carefully.
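The ensemble step described above, averaging the prediction probabilities of several models, can be sketched as follows. The three probability vectors below are made-up inputs standing in for the outputs of separately trained models; the function name is illustrative, not from the paper.

```python
import numpy as np

def ensemble_average(prob_list):
    """Average per-model class-probability vectors, then predict the argmax class."""
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)  # element-wise mean over models
    return avg, int(np.argmax(avg))                     # averaged probs + predicted label

# Toy example: three models scoring one image over 3 food classes
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.3, 0.2])
p3 = np.array([0.6, 0.1, 0.3])
avg, label = ensemble_average([p1, p2, p3])
print(avg, label)  # [0.6 0.2 0.2] 0
```

Averaging probabilities (rather than hard votes) lets a confident model outweigh uncertain ones, which is one reason probability-level ensembling tends to be more robust than majority voting.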