Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, and Dacheng Tao, Fellow, IEEE
This paper provides a comprehensive review of vision transformer models, categorizing them by task and analyzing their advantages and disadvantages. The main categories explored include backbone networks, high/mid-level vision, low-level vision, and video processing. The paper also discusses efficient transformer methods for deploying these models on real devices and reviews the self-attention mechanism in computer vision. The authors highlight open challenges and outline several directions for future research on vision transformers.
Index Terms: Transformer, Self-attention, Computer Vision, High-level vision, Low-level vision, Video.
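As a brief reminder rather than a contribution of this survey, the self-attention operation at the core of the transformer models reviewed here is the standard scaled dot-product attention of Vaswani et al., where $Q$, $K$, and $V$ denote the query, key, and value matrices and $d_k$ is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

In vision transformers, $Q$, $K$, and $V$ are typically linear projections of the same sequence of patch embeddings, so the attention weights model pairwise interactions between image patches.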