Lightweight transformer image feature extraction network

31 January 2024 | Wenfeng Zheng, Siyu Lu, Youshuai Yang, Zhengtong Yin and Lirong Yin
This paper proposes a lightweight Transformer image feature extraction network that reduces computational complexity and improves efficiency. The method combines two techniques: a linear attention mechanism and token pruning. The linear attention mechanism reduces the quadratic complexity of self-attention to linear, increasing processing speed; token pruning adaptively filters out unimportant tokens, removing irrelevant input. Together they form an efficient attention (e-attention) mechanism, yielding a 30%–50% reduction in computation for the original Transformer model and a 60%–70% reduction with the e-attention mechanism.

The study addresses the high computational cost of vision Transformers, which are limited by the quadratic complexity of self-attention when processing high-resolution images. The proposed method reduces this burden by modifying the self-attention mechanism and applying token pruning. The linear attention mechanism replaces the Softmax operator with a combination function that preserves two key properties of Softmax: non-negativity and non-linear reweighting of the attention matrix elements. Token pruning reduces the number of tokens by sampling them according to their importance scores.

The method was evaluated on the ImageNet1k and COCO datasets. Results show that it significantly reduces computational complexity while maintaining high accuracy. For image classification, the linear attention mechanism reduced FLOPs by up to 60%, and the e-attention mechanism achieved a 60%–70% reduction in computation. For object detection, the method increased FPS by 50%–60% while lowering computational cost.

The study concludes that the proposed method effectively reduces computational complexity while preserving model performance. The combination of linear attention and token pruning provides a lightweight solution for Transformer-based image feature extraction, making it suitable for deployment on edge devices.
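The key to linear attention is algebraic: once Softmax is replaced by a non-negative feature map applied separately to queries and keys, the product can be re-associated so cost grows linearly in the number of tokens. The sketch below illustrates that idea with NumPy; the ReLU feature map is an assumption standing in for the paper's unspecified combination function (it satisfies the non-negativity property the paper requires, but is not the authors' exact formulation).

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention sketch: Softmax is replaced by a non-negative
    feature map phi applied to Q and K separately.

    Assumption: phi = ReLU (a common choice in linear-attention work);
    the paper's exact combination function is not reproduced here.
    Q, K: (N, d) query/key matrices; V: (N, d_v) value matrix.
    """
    phi_Q = np.maximum(Q, 0.0)   # (N, d), non-negative like Softmax weights
    phi_K = np.maximum(K, 0.0)   # (N, d)
    # Re-associate: phi_Q @ (phi_K^T V) costs O(N * d * d_v),
    # instead of forming the (N, N) attention matrix explicitly.
    KV = phi_K.T @ V                                       # (d, d_v)
    Z = phi_Q @ phi_K.sum(axis=0, keepdims=True).T + eps   # (N, 1) normalizer
    return (phi_Q @ KV) / Z
```

Because matrix multiplication is associative, this produces exactly the same output as the quadratic formulation `normalize(phi(Q) @ phi(K).T) @ V`, only cheaper when N is large relative to d.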
Future work may explore pruning from other dimensions, such as attention heads and neurons, and further improve the efficiency of the model.
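Token pruning, the second half of the method, can be sketched as scoring each patch token by how much attention it receives and keeping only the top fraction. The scoring rule below (attention paid by the [CLS] token, row 0 of the attention matrix) is an assumption for illustration; the paper describes sampling tokens by importance scores without this summary committing to its exact scoring function.

```python
import numpy as np

def prune_tokens(tokens, attn, keep_ratio=0.5):
    """Token-pruning sketch: keep the highest-scoring patch tokens.

    Assumption: importance = attention the [CLS] token (index 0) pays
    to each patch token; the paper's exact scoring rule may differ.
    tokens: (N, d) with tokens[0] = [CLS]; attn: (N, N) attention matrix.
    """
    scores = attn[0, 1:]                         # CLS -> patch attention
    k = max(1, int(keep_ratio * (tokens.shape[0] - 1)))
    keep = np.argsort(scores)[::-1][:k] + 1      # top-k patch indices (+1 skips CLS)
    keep = np.sort(keep)                         # restore spatial order
    return np.vstack([tokens[:1], tokens[keep]]) # CLS always survives
```

Applied between Transformer blocks, this shrinks the sequence length that every later layer must process, which is where the additional computation savings on top of linear attention come from.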