2024 | Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song
The paper "Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping" addresses the limited transferability of adversarial examples between different deep neural network architectures, particularly between Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). To improve this transferability, the authors propose the Deformation-Constrained Warping Attack (DeCoWA). DeCoWA augments input examples with elastic deformation, preserving global semantics while increasing the diversity of local details. The approach targets a weakness of existing methods, which often fail to transfer well across different model genera.
Extensive experiments demonstrate that DeCoWA significantly improves adversarial attack performance on image classification, video action recognition, and audio recognition, when applied to both CNN surrogates and ViT models. The method is also shown to be effective in cross-genus attacks, where the surrogate model is a CNN and the target system is a Transformer. The paper provides a comprehensive evaluation of DeCoWA's effectiveness and discusses its potential applications in future research.
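To make the idea of elastic deformation concrete, the following is a minimal sketch of a bounded, smooth random warp applied to a small grayscale image. It is not the authors' DeCoWA implementation: the function names `elastic_warp` and `bilinear`, the parameters `control` and `max_shift`, and the choice of a bilinearly upsampled coarse displacement grid are all assumptions made for illustration. The sketch only shows how capping the displacement magnitude keeps the overall layout intact while local details shift.

```python
import random


def bilinear(grid, y, x):
    """Sample a 2D list at fractional (y, x), clamping to the border."""
    h, w = len(grid), len(grid[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def elastic_warp(image, control=4, max_shift=1.5, seed=None):
    """Warp a grayscale image (list of rows) with a smooth random field.

    Hypothetical parameters: `control` is the side length of the coarse
    displacement grid, `max_shift` bounds every offset in pixels.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    # Coarse random offsets; each one is bounded by max_shift pixels.
    dy = [[rng.uniform(-max_shift, max_shift) for _ in range(control)]
          for _ in range(control)]
    dx = [[rng.uniform(-max_shift, max_shift) for _ in range(control)]
          for _ in range(control)]
    warped = []
    for i in range(h):
        row = []
        for j in range(w):
            # Map the pixel position into coarse-grid coordinates.
            gy = i * (control - 1) / max(h - 1, 1)
            gx = j * (control - 1) / max(w - 1, 1)
            # Interpolating the coarse grid yields a smooth field whose
            # offsets never exceed max_shift.
            src_y = i + bilinear(dy, gy, gx)
            src_x = j + bilinear(dx, gy, gx)
            row.append(bilinear(image, src_y, src_x))
        warped.append(row)
    return warped


if __name__ == "__main__":
    # A toy 8x8 gradient image stands in for a real input example.
    toy = [[r * 8 + c for c in range(8)] for r in range(8)]
    for row in elastic_warp(toy, seed=0):
        print(" ".join(f"{v:5.1f}" for v in row))
```

In a transfer attack, an augmentation of this kind would typically be applied to the input before each gradient computation on the surrogate model, with a fresh random field per step. How DeCoWA actually constrains the strength and direction of its warping is described in the paper and is not reproduced here.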