Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

16 Oct 2024 | Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
This paper addresses the limitations of existing parameter-efficient fine-tuning (PEFT) methods for Vision Transformers (ViTs) by proposing a novel approach called Dynamic Tuning (DyT). DyT aims to improve both parameter and inference efficiency during the adaptation of ViTs to downstream tasks. The key contribution of DyT is the introduction of a token dispatcher that dynamically determines which tokens should be activated or deactivated, allowing less informative tokens to skip certain blocks during inference, thereby reducing redundant computation. The authors explore multiple design variants of DyT and introduce an enhanced adapter inspired by the mixture-of-experts (MoE) mechanism to further boost performance. The effectiveness of DyT is validated across various visual tasks, including image/video recognition and semantic segmentation. Experimental results show that DyT achieves superior performance compared to existing PEFT methods while consuming significantly fewer FLOPs, demonstrating its efficiency and adaptability in different scenarios.
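
To make the idea concrete, below is a minimal, hypothetical sketch of how a token dispatcher and a lightweight adapter could wrap a frozen ViT sub-block. This is not the authors' implementation: the class names (`TokenDispatcher`, `Adapter`, `DynamicBlock`), the sigmoid gating with a straight-through estimator, and the bottleneck size are assumptions made for illustration. In a real implementation, inference would gather only the activated tokens before running the frozen block; here the mask is simply multiplied in to keep the example short.

```python
# Hypothetical sketch of dynamic token-level skipping for ViT adaptation.
# Not the paper's code: names, gating scheme, and adapter design are assumptions.
import torch
import torch.nn as nn


class TokenDispatcher(nn.Module):
    """Scores each token and decides whether it passes through the frozen block."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> binary mask: (batch, tokens, 1)
        logits = self.gate(x)
        probs = torch.sigmoid(logits)
        hard = (probs > 0.5).float()
        if self.training:
            # Straight-through estimator keeps the hard decision differentiable.
            return hard + probs - probs.detach()
        return hard


class Adapter(nn.Module):
    """Bottleneck adapter: the only trainable path besides the dispatcher."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))


class DynamicBlock(nn.Module):
    """Frozen sub-block whose computation is skipped for deactivated tokens."""

    def __init__(self, mlp: nn.Module, dim: int):
        super().__init__()
        self.mlp = mlp                       # frozen pre-trained sub-block
        self.dispatcher = TokenDispatcher(dim)
        self.adapter = Adapter(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = self.dispatcher(x)            # 1 = activated, 0 = deactivated
        # Activated tokens go through the frozen block; deactivated tokens skip it,
        # which is where the inference FLOPs would be saved in practice.
        return x + mask * self.mlp(x) + self.adapter(x)


if __name__ == "__main__":
    dim = 768
    mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
    for p in mlp.parameters():
        p.requires_grad = False              # backbone stays frozen (PEFT setting)
    block = DynamicBlock(mlp, dim)
    tokens = torch.randn(2, 197, dim)        # a batch of ViT token sequences
    print(block(tokens).shape)               # torch.Size([2, 197, 768])
```

Only the dispatcher and adapter parameters are trained, which keeps the tunable parameter count small, while the learned mask controls how many tokens reach the expensive frozen block at inference time.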