InsectMamba is a novel model for insect pest classification that integrates State Space Models (SSMs), Convolutional Neural Networks (CNNs), Multi-Head Self-Attention (MSA), and Multilayer Perceptrons (MLPs) within Mix-SSM blocks. Combining these encoding strategies lets the model exploit the strengths of each and extract more comprehensive visual features. A selective module is also proposed to adaptively aggregate these features, improving the model's ability to discern pest characteristics. InsectMamba was compared against strong competitors on five insect pest classification datasets, where it achieved superior performance, and ablation studies verified the contribution of each model component.
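The text does not specify how the selective module computes its weights. As a rough illustration only, the sketch below assumes a Selective-Kernel-style design: the branch features are summed into a shared descriptor, each branch gets per-channel logits from that descriptor, and a softmax across branches (taken separately for each channel) decides how much each encoding strategy contributes. The function names and the single `proj` matrix per branch are hypothetical simplifications; InsectMamba's actual module may differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    z = sum(exps)
    return [e / z for e in exps]


def selective_aggregate(branches, proj):
    """
    Hypothetical selective aggregation (SK-style assumption, not the paper's code).

    branches: K feature vectors of length C, one per encoding strategy
              (e.g. pooled outputs of the SSM, CNN, MSA and MLP branches).
    proj:     K matrices of shape C x C (assumed learned weights) that map the
              fused descriptor to per-channel logits for each branch.
    Returns the aggregated C-vector and the K x C selection weights.
    """
    K, C = len(branches), len(branches[0])
    # 1. Fuse: element-wise sum of all branch features gives a shared descriptor.
    fused = [sum(b[c] for b in branches) for c in range(C)]
    # 2. Per-branch, per-channel logits computed from the shared descriptor.
    logits = [
        [sum(proj[k][i][j] * fused[j] for j in range(C)) for i in range(C)]
        for k in range(K)
    ]
    # 3. Softmax across the K branches, independently for every channel.
    per_channel = [softmax([logits[k][c] for k in range(K)]) for c in range(C)]
    weights = [[per_channel[c][k] for c in range(C)] for k in range(K)]
    # 4. Weighted sum of the branch features using the selection weights.
    out = [sum(weights[k][c] * branches[k][c] for k in range(K)) for c in range(C)]
    return out, weights
```

With all-zero projections every branch receives weight 1/K, so the module reduces to plain averaging; larger logits for one branch shift the output toward that branch's features, which is the "adaptive weighting" behavior the text describes.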
The model addresses the challenge of accurately identifying and classifying pests under varied conditions by combining complementary visual encoding strategies. CNNs excel at local feature extraction, while MSA captures global features. SSMs are effective at modeling long-range dependencies, and MLPs specialize in inferring channel-wise information. Each Mix-SSM block combines SSM, CNN, MSA, and MLP encoders to extract more comprehensive visual features for insect pest classification, and the selective module adaptively aggregates the features from these encoders so that the model can emphasize those most relevant for classification.
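To make the contrast between the four encoding strategies concrete, here is a toy sketch on a 1D sequence of token vectors. Each branch is a hypothetical pure-Python stand-in, not InsectMamba's implementation: a depthwise convolution with kernel size 3, single-head self-attention with identity projections, a time-invariant per-channel linear recurrence in place of a selective (Mamba-style) SSM, and a per-token MLP. The helper `affected_positions` perturbs one input token and reports which output positions change, exposing each branch's receptive field.

```python
import math

Seq = list[list[float]]  # T tokens, each a C-dimensional feature vector


def conv_branch(x: Seq, kernel=(0.25, 0.5, 0.25)) -> Seq:
    """Depthwise 1D convolution (kernel size 3, zero padding): local context only."""
    T, C = len(x), len(x[0])
    out = []
    for t in range(T):
        row = []
        for c in range(C):
            acc = 0.0
            for k, w in enumerate(kernel):
                s = t + k - 1  # neighbours t-1, t, t+1
                if 0 <= s < T:
                    acc += w * x[s][c]
            row.append(acc)
        out.append(row)
    return out


def attention_branch(x: Seq) -> Seq:
    """Single-head self-attention with identity Q/K/V projections: global context."""
    C = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(C) for k in x]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * v[c] for e, v in zip(exps, x)) for c in range(C)])
    return out


def ssm_branch(x: Seq, a=0.9, b=1.0, c_out=1.0) -> Seq:
    """Toy linear SSM per channel: h_t = a*h_{t-1} + b*x_t, y_t = c_out*h_t."""
    C = len(x[0])
    h = [0.0] * C
    out = []
    for token in x:
        h = [a * hi + b * xi for hi, xi in zip(h, token)]
        out.append([c_out * hi for hi in h])
    return out


def mlp_branch(x: Seq, W=None) -> Seq:
    """Per-token channel mixing y_t = ReLU(W x_t); tokens never interact."""
    C = len(x[0])
    if W is None:  # hypothetical fixed weights for illustration
        W = [[1.0 if i == j else 0.5 for j in range(C)] for i in range(C)]
    return [[max(0.0, sum(W[i][j] * tok[j] for j in range(C))) for i in range(C)]
            for tok in x]


def affected_positions(branch, x: Seq, pos: int, eps=1e-12) -> list[int]:
    """Perturb token `pos` and list the output positions whose features change."""
    base = branch(x)
    x2 = [row[:] for row in x]
    x2[pos] = [v + 1.0 for v in x2[pos]]
    new = branch(x2)
    return [t for t in range(len(x))
            if any(abs(p - q) > eps for p, q in zip(base[t], new[t]))]
```

On the six-token input `x = [[0.1 * t, 0.2, -0.1 * t] for t in range(6)]`, perturbing token 2 changes positions 1 to 3 for the convolution, positions 2 to 5 for the causal SSM scan, all six positions for attention, and only position 2 for the MLP, which matches the local, long-range, global, and channel-only roles described above.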
Performance was evaluated using accuracy (ACC), precision (Prec), recall (Rec), and F1 score. InsectMamba outperformed the competing methods on all five insect pest classification datasets, consistently achieving higher scores on these metrics, and an ablation study verified the contribution of each module. The results show that the Mix-SSM block integrates multiple visual encoding strategies to capture comprehensive features from input images, and that the selective module further improves performance by adaptively weighting the contribution of each encoding strategy. Together, these results indicate that integrating multiple visual encoding strategies is crucial for capturing the full range of visual characteristics of insects.
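The text does not state how precision, recall, and F1 are averaged over pest classes. The sketch below assumes macro averaging (an unweighted mean of per-class scores, with classes taken from the union of true and predicted labels and undefined ratios counted as 0), which is a common convention for multi-class classification but is only an assumption here. Note that macro F1 is the mean of per-class F1 values, not the harmonic mean of macro precision and macro recall.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """ACC plus macro-averaged Prec, Rec and F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    true_count = Counter(y_true)  # how often each class actually occurs
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "ACC": sum(tp.values()) / len(y_true),
        "Prec": sum(precisions) / n,
        "Rec": sum(recalls) / n,
        "F1": sum(f1s) / n,
    }
```

For example, with `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, this gives ACC = 4/6, Prec = 13/18, Rec = 2/3, and F1 = 59/90.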