11 Jul 2022 | Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum
DINO is an end-to-end object detector that improves on previous DETR-like models with three techniques: contrastive denoising training, mixed query selection, and look forward twice. It builds on DAB-DETR and DN-DETR and uses deformable attention for computational efficiency. With a ResNet-50 backbone and multi-scale features, DINO reaches 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO, gains of +6.0 AP and +2.7 AP, respectively, over DN-DETR. The three techniques improve detection performance, especially for small objects.
DINO also scales well in both model size and data size: after pre-training on Objects365 with a SwinL backbone, it achieves 63.2 AP on COCO val2017 and 63.3 AP on test-dev. It does so with a smaller model and less pre-training data than other models on the COCO leaderboard while achieving better results. Extensive experiments on COCO benchmarks show that DINO outperforms both traditional detectors and other DETR-like models, making it a state-of-the-art end-to-end detector with no hand-crafted components.
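
As a quick consistency check on the ResNet-50 figures in this summary, the reported gains can be used to back out the DN-DETR scores they imply. The sketch below is illustrative only: the variable and function names are hypothetical, and the baseline values are derived from the numbers quoted here rather than stated in the summary itself.

```python
# Figures quoted in the summary (COCO, ResNet-50 backbone, multi-scale features).
dino_ap = {12: 49.4, 24: 51.3}           # training epochs -> DINO AP
gain_over_dn_detr = {12: 6.0, 24: 2.7}   # training epochs -> reported AP gain


def implied_baseline_ap(epochs: int) -> float:
    """Back out the DN-DETR AP implied by DINO's AP and its reported gain.

    Hypothetical helper for illustration; rounds to one decimal place to
    match the precision of the quoted figures.
    """
    return round(dino_ap[epochs] - gain_over_dn_detr[epochs], 1)


for epochs in sorted(dino_ap):
    print(
        f"{epochs} epochs: DINO {dino_ap[epochs]} AP, "
        f"implied DN-DETR {implied_baseline_ap(epochs)} AP"
    )
```

Under these numbers the implied DN-DETR baselines are 43.4 AP at 12 epochs and 48.6 AP at 24 epochs, so the gap narrows as training lengthens, which suggests that much of DINO's advantage comes from faster convergence.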