[slides and audio] Deformable DETR: Deformable Transformers for End-to-End Object Detection

DETR has been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution due to the limitations of Transformer attention modules in processing image feature maps. To address these issues, the authors propose Deformable DETR, which uses attention modules that only attend to a small set of key sampling points around a reference. Deformable DETR achieves better performance than DETR (especially on small objects) with 10× fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of the proposed approach. The code is released at <https://github.com/fundamentalvision/Deformable-DETR>.
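To make the idea of attending only to a small set of sampling points around a reference more concrete, here is a minimal sketch in plain Python. It is a simplified single-head, single-scale illustration and not the released implementation. The function names, the projection matrices `w_offset`, `w_attn`, and `w_value`, and the clamping of sampling locations to the feature map are all assumptions made for this example.

```python
import math

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat[row][col] (a list of C floats) at pixel (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the map borders (a simplifying assumption of this sketch).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    return [
        (1 - wy) * ((1 - wx) * a + wx * b) + wy * ((1 - wx) * c + wx * d)
        for a, b, c, d in zip(feat[y0][x0], feat[y0][x1], feat[y1][x0], feat[y1][x1])
    ]

def matvec(vec, mat):
    """Row vector (length n) times a matrix given as n rows of m columns."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attention(query, ref_point, feat, w_offset, w_attn, w_value):
    """Attend to K sampled points around ref_point instead of every pixel.

    query:     list of C floats (the query feature)
    ref_point: (x, y) reference location in pixel coordinates
    feat:      H x W x C nested lists (the feature map)
    w_offset:  C x 2K matrix predicting K (dx, dy) offsets from the query
    w_attn:    C x K matrix predicting K attention logits from the query
    w_value:   C x C value projection
    """
    num_points = len(w_attn[0])
    offsets = matvec(query, w_offset)          # flat list: dx0, dy0, dx1, dy1, ...
    weights = softmax(matvec(query, w_attn))   # one normalized weight per point
    out = [0.0] * len(w_value[0])
    for k in range(num_points):
        dx, dy = offsets[2 * k], offsets[2 * k + 1]
        sample = bilinear_sample(feat, ref_point[0] + dx, ref_point[1] + dy)
        value = matvec(sample, w_value)
        out = [o + weights[k] * v for o, v in zip(out, value)]
    return out
```

In this sketch the cost per query depends on the number of sampling points K rather than on the size of the feature map, which is the property the abstract credits for making higher-resolution features practical.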

DEFORMABLE DETR: DEFORMABLE TRANSFORMERS FOR END-TO-END OBJECT DETECTION

18 Mar 2021 | Xizhou Zhu1*, Weijie Su2+†, Lewei Lu1, Bin Li2, Xiaogang Wang1,3, Jifeng Dai1†