Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

16 Jul 2024 | Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan
This paper introduces Relation-DETR, a novel approach to enhance the performance and convergence of DETR (DEtection TRansformer) detectors. The authors identify the slow convergence issue in DETR as arising from the lack of structural bias in self-attention, which does not incorporate position information. To address this, they propose incorporating explicit position relation priors into the attention mechanism. This involves constructing position relation embeddings using normalized relative geometry features, which are then integrated into the attention process through progressive attention refinement. The method extends the traditional streaming pipeline of DETR into a contrastive relation pipeline, addressing conflicts between non-duplicate predictions and positive supervision. Extensive experiments on various datasets, including COCO and task-specific datasets, demonstrate the effectiveness of Relation-DETR, showing significant improvements in accuracy, faster convergence, and better generalizability. The proposed position relation encoder is also shown to be transferable to existing DETR detectors, enhancing their performance with minimal modifications. Additionally, the authors introduce a class-agnostic detection dataset, SA-Det-100k, to validate the effectiveness of explicit position relations in universal object detection tasks.
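To make the core idea concrete, the sketch below shows one common way to build pairwise position relation features from bounding boxes (log-scaled relative center offsets and size ratios) and inject them as a bias on self-attention logits. This is a minimal illustration of the general "position relation prior" concept, not the paper's exact formulation: the feature definitions, the scalar projection `proj`, and the function names are all assumptions standing in for the paper's learned relation embedding and progressive attention refinement.

```python
import numpy as np

def relative_geometry_features(boxes):
    """Pairwise normalized relative geometry features between boxes.

    boxes: (N, 4) array of (cx, cy, w, h) in normalized coordinates.
    Returns an (N, N, 4) tensor of log-scaled relative center offsets
    and size ratios -- a hypothetical stand-in for the paper's
    normalized relative geometry features.
    """
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    dx = np.log(np.abs(cx[:, None] - cx[None, :]) / w[:, None] + 1.0)
    dy = np.log(np.abs(cy[:, None] - cy[None, :]) / h[:, None] + 1.0)
    dw = np.log(w[None, :] / w[:, None])   # antisymmetric size ratio
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)

def attention_with_position_bias(logits, rel_feats, proj):
    """Add a position relation bias to self-attention logits.

    logits: (N, N) raw attention scores between queries.
    proj: (4,) weights mapping geometry features to a scalar bias
    (a stand-in for the paper's learned embedding + refinement).
    Returns softmax attention over the biased logits.
    """
    bias = rel_feats @ proj                      # (N, N) position bias
    z = logits + bias
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Usage: two query boxes, uniform raw logits, small relation weights.
boxes = np.array([[0.5, 0.5, 0.2, 0.2],
                  [0.7, 0.6, 0.4, 0.3]])
feats = relative_geometry_features(boxes)        # (2, 2, 4)
attn = attention_with_position_bias(np.zeros((2, 2)), feats,
                                    0.1 * np.ones(4))
```

In the actual detector the bias would be produced per attention head by a learned embedding of these features and refined across decoder layers, but the structural effect is the same: attention weights become sensitive to the relative geometry of predicted boxes rather than content alone.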