End-to-End Object Detection with Transformers


28 May 2020 | Nicolas Carion*, Francisco Massa*, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko
The paper introduces DETR (Detection Transformer), a novel approach to object detection that views the task as a direct set prediction problem. DETR streamlines the detection pipeline by eliminating the need for hand-designed components like non-maximum suppression or anchor generation, which encode prior knowledge about the task. The core components of DETR are a set-based global loss that ensures unique predictions through bipartite matching and a transformer encoder-decoder architecture. Given a small set of learned object queries, DETR reasons about the relationships between objects and the global image context to output the final set of predictions in parallel.

This approach is conceptually simple and does not require specialized libraries, making it easy to implement in any deep learning framework. On the challenging COCO dataset, DETR demonstrates comparable accuracy and run-time performance to the well-established Faster R-CNN baseline, with significant improvements on large objects. Additionally, DETR can be easily extended to produce panoptic segmentation in a unified manner, outperforming competitive baselines. The training code and pre-trained models are available on GitHub.
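The bipartite matching at the heart of DETR's set-based loss can be sketched in a few lines: each of the model's fixed set of predictions is assigned to at most one ground-truth object by minimizing a matching cost with the Hungarian algorithm. The sketch below is illustrative only; the cost here combines class probability and an L1 box distance, whereas the paper's full cost also includes a generalized IoU term and weighting coefficients.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """One-to-one matching of predictions to ground-truth objects.

    pred_probs: (num_queries, num_classes) class probabilities
    pred_boxes: (num_queries, 4) predicted boxes
    gt_labels:  (num_gt,) ground-truth class indices
    gt_boxes:   (num_gt, 4) ground-truth boxes
    Returns (pred_indices, gt_indices) of the optimal assignment.
    """
    # Classification cost: negative probability of the target class.
    cost_class = -pred_probs[:, gt_labels]                      # (num_queries, num_gt)
    # Box cost: L1 distance between predicted and ground-truth boxes.
    cost_bbox = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = cost_class + cost_bbox
    # Hungarian algorithm finds the minimal-cost one-to-one assignment;
    # unmatched queries are later supervised as "no object".
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx


# Toy example: 3 queries, 2 ground-truth objects, 2 classes.
pred_probs = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]])
pred_boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                       [0.9, 0.9, 0.1, 0.1],
                       [0.1, 0.1, 0.2, 0.2]])
gt_labels = np.array([0, 1])
gt_boxes = np.array([[0.1, 0.1, 0.2, 0.2],   # class 0
                     [0.5, 0.5, 0.2, 0.2]])  # class 1
pred_idx, gt_idx = match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes)
```

Because the assignment is one-to-one, each ground-truth object receives exactly one prediction, which is what lets DETR drop non-maximum suppression: duplicate predictions are penalized by the loss rather than pruned afterward.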