Understanding Robustness of Visual State Space Models for Image Classification

16 Mar 2024 | Chengbin Du, Yanxi Li, Chang Xu
The paper "Understanding Robustness of Visual State Space Models for Image Classification" by Chengbin Du, Yanxi Li, and Chang Xu from the University of Sydney investigates the robustness of the Visual State Space Model (VMamba) across computer vision tasks. VMamba is a promising architecture that has shown remarkable performance but has not yet been subjected to thorough robustness studies. The authors conduct a comprehensive investigation from multiple perspectives:

1. **Adversarial Robustness**: VMamba is evaluated against adversarial attacks, including whole-image and patch-specific attacks. Results show that VMamba has superior adversarial robustness compared to Transformer architectures but exhibits scalability weaknesses.
2. **General Robustness**: VMamba's performance is assessed in diverse scenarios, such as natural adversarial examples, out-of-distribution data, and common corruptions. It demonstrates exceptional generalizability on out-of-distribution data but shows scalability issues against natural adversarial examples and common corruptions.
3. **Gradients and Back-Propagation**: The study examines VMamba's gradients and back-propagation during white-box attacks, revealing unique vulnerabilities and defensive capabilities of its novel components. Parameters $A$, $B$, and $C$ play significant roles: $A$ is hard for attack algorithms to estimate, $B$ and $C$ contribute to vulnerability, and $\Delta$ demonstrates defensive capabilities.
4. **Sensitivity to Image Structure**: VMamba's sensitivity to variations in image structure is explored, highlighting vulnerabilities associated with the distribution of disturbance areas and spatial information. VMamba is highly sensitive to continuity in the scanning trajectory and is more susceptible to perturbations near the image center.
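To make the roles of $A$, $B$, $C$, and $\Delta$ concrete, the following is a minimal, illustrative sketch of the discretized selective-scan recurrence that underlies Mamba-style models. It is not the paper's implementation; all shapes, values, and the simple Euler-style discretization of $B$ are assumptions for illustration.

```python
import numpy as np

def ssm_scan(x, A, B, C, delta):
    """Run a 1-D selective scan over a token sequence.

    x:     (L,) input sequence (flattened image patches in VMamba)
    A:     (N,) diagonal state matrix (the parameter attacks find hard to estimate)
    B, C:  (N,) input/output projections (linked to vulnerability in the paper)
    delta: (L,) per-token step sizes (linked to defensive behavior)
    """
    N = A.shape[0]
    h = np.zeros(N)
    ys = []
    for t in range(x.shape[0]):
        A_bar = np.exp(delta[t] * A)      # zero-order-hold discretization of A
        B_bar = delta[t] * B              # simple first-order approximation
        h = A_bar * h + B_bar * x[t]      # state update
        ys.append(float(C @ h))           # readout
    return np.array(ys)

rng = np.random.default_rng(0)
L, N = 8, 4
y = ssm_scan(rng.normal(size=L),
             A=-np.abs(rng.normal(size=N)),  # negative entries keep the scan stable
             B=rng.normal(size=N),
             C=rng.normal(size=N),
             delta=np.full(L, 0.1))
print(y.shape)  # (8,)
```

Because the gradient of the output with respect to $A$ passes through the exponential in `A_bar`, small estimation errors compound across the scan, which is one intuition for why $A$ is difficult for gradient-based attacks to exploit.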
The research contributes to a deeper understanding of VMamba's robustness, providing valuable insights for refining and advancing deep neural networks in computer vision applications. The findings offer a roadmap for researchers to iteratively refine and optimize VMamba, ultimately enhancing its robustness and performance.
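As an illustration of the scanning-trajectory sensitivity discussed above, the following hypothetical sketch (not from the paper) shows why a perturbation's effect depends on scan order: in a row-major scan of a square patch grid, horizontally adjacent patches are one step apart in the sequence, while vertically adjacent patches are a full row apart, so a spatially compact disturbance can become widely scattered in the 1-D sequence the model actually processes.

```python
# Assumed setup: a 14x14 patch grid, e.g. a 224x224 image with 16x16 patches.
W = 14

def scan_index(row, col, width=W):
    """Position of patch (row, col) in a row-major scanning trajectory."""
    return row * width + col

# Distance in scan order between spatially adjacent patches:
horizontal_gap = scan_index(5, 7) - scan_index(5, 6)  # neighbors within a row
vertical_gap = scan_index(6, 6) - scan_index(5, 6)    # neighbors within a column
print(horizontal_gap, vertical_gap)  # 1 14
```

VMamba's cross-scan uses four such trajectories rather than one, but the same asymmetry applies along each: disrupting the continuity of any single trajectory changes how disturbance areas are distributed in the scanned sequence.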