24 Jul 2024 | Wenyu Lv, Yian Zhao, Qinyao Chang, Kui Huang, Guanzhong Wang, Yi Liu
This technical report introduces RT-DETRv2, an improved version of the Real-Time DETection Transformer (RT-DETR). RT-DETRv2 enhances the flexibility and practicality of RT-DETR by introducing a set of bag-of-freebies and optimizing the training strategy, achieving better performance without sacrificing speed. Specifically, RT-DETRv2 sets distinct numbers of sampling points for features at different scales in the deformable attention module, which enables selective multi-scale feature extraction. It also introduces an optional discrete sampling operator to replace the grid_sample operator, removing the deployment constraints typically associated with DETRs. The training strategy adds dynamic data augmentation and scale-adaptive hyperparameter customization to improve performance. The results show that RT-DETRv2 outperforms RT-DETR across detector scales while maintaining speed. The report also includes ablation studies on the number of sampling points and on discrete sampling, showing that these modifications do not significantly degrade performance. Overall, RT-DETRv2 broadens the scope of RT-DETR applications and provides valuable insights for the DETR family.
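To make the two sampling changes concrete, the following is a minimal sketch in plain Python and not the report's implementation. It assumes that "discrete sampling" means rounding a predicted sampling location to the nearest feature-map cell instead of bilinearly interpolating as grid_sample does. The function names, the per-level point counts in `POINTS_PER_LEVEL`, and the head count are hypothetical, chosen only for illustration.

```python
import math


def bilinear_sample(feat, x, y):
    """Interpolate a 2D feature map (list of rows) at continuous pixel coords.

    Stands in for what a grid_sample-style operator computes at one location.
    """
    h, w = len(feat), len(feat[0])
    # Clamp so the four neighbours stay inside the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def discrete_sample(feat, x, y):
    """Assumed discrete variant: round to the nearest cell and read it.

    No interpolation is needed, only rounding, clamping and indexing.
    """
    h, w = len(feat), len(feat[0])
    xi = min(max(int(math.floor(x + 0.5)), 0), w - 1)
    yi = min(max(int(math.floor(y + 0.5)), 0), h - 1)
    return feat[yi][xi]


# Hypothetical per-scale sampling point counts, one entry per feature level.
# The values are illustrative and are not taken from the report.
POINTS_PER_LEVEL = (4, 2, 1)


def total_sampling_points(points_per_level, num_heads):
    """Sampling locations predicted per query across all heads and levels."""
    return num_heads * sum(points_per_level)


feat = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(feat, 0.5, 0.5))   # interpolated value between cells
print(discrete_sample(feat, 0.6, 0.2))   # value of the nearest cell
print(total_sampling_points(POINTS_PER_LEVEL, num_heads=8))
```

Under these assumptions, the discrete variant needs only rounding and indexing, which is the kind of operation that deployment backends without a grid_sample equivalent can usually express. Giving each feature level its own point count lets the sampling budget differ by scale rather than being uniform.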