Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, and Dacheng Tao, Fellow, IEEE
This paper provides a comprehensive review of vision transformer models, categorizing them by task and analyzing their advantages and disadvantages. The main categories explored include backbone networks, high/mid-level vision, low-level vision, and video processing. The paper also discusses efficient transformer methods for deploying these models on real devices and reviews the self-attention mechanism in computer vision. The authors highlight open challenges and outline several directions for future research on vision transformers.
Index Terms: Transformer, Self-attention, Computer Vision, High-level vision, Low-level vision, Video.
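As a brief reminder rather than a contribution of this survey, the self-attention operation at the core of the transformer models reviewed here is the standard scaled dot-product attention of Vaswani et al., where $Q$, $K$, and $V$ denote the query, key, and value matrices and $d_k$ is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

In vision transformers, $Q$, $K$, and $V$ are typically linear projections of the same sequence of patch embeddings, so the attention weights model pairwise interactions between image patches.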