The paper presents FRCNNSAM, a food recognition model that combines deep convolutional neural networks (CNNs) with a self-attention mechanism. The model targets the main difficulties of food image recognition: diverse food types, variations in presentation, and high-level semantics. Several FRCNNSAM structures are trained with varying parameters and their predictions are averaged, which makes the final output more robust. Regularization techniques are used to prevent overfitting, and data augmentation generates additional training data. Tested on two datasets, Food-101 and MA Food-121, FRCNNSAM achieves accuracies of 96.40% and 95.11%, respectively, and outperforms baseline transfer learning models by 8.12%. Its performance on random internet images demonstrates strong generalization, making it suitable for food image recognition and classification tasks. The study also explores how effectively self-attention mechanisms enhance CNN models, contributing to the field of food image recognition.
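
The prediction-averaging step can be illustrated with a minimal sketch. It assumes each trained network emits one raw score (logit) per food class and that averaging is done over softmax probabilities; the summary does not say whether the authors average probabilities or logits, so that choice, along with the function names `softmax`, `average_predictions`, and `predict_class`, is hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_predictions(per_model_logits):
    """Average class probabilities across models (assumed ensembling rule).

    per_model_logits: one list of class scores per model, all the same length.
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def predict_class(per_model_logits):
    """Return the index of the class with the highest averaged probability."""
    avg = average_predictions(per_model_logits)
    return max(range(len(avg)), key=avg.__getitem__)


if __name__ == "__main__":
    # Illustrative scores from three hypothetical models over three classes.
    scores = [
        [2.0, 1.0, 0.1],
        [0.5, 2.5, 0.3],
        [1.8, 1.9, 0.2],
    ]
    print(average_predictions(scores))
    print(predict_class(scores))
```

In this toy input the first model prefers class 0 while the other two prefer class 1, so the averaged distribution settles on class 1. A single model's error is outvoted, which is the intuition behind the robustness attributed to averaging.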