26 Aug 2024 | Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai, and Wen-Huang Cheng
DQ-DETR is a DETR-like model designed for tiny object detection, addressing the limitations of previous methods on imbalanced datasets where the number of tiny objects per image varies widely. The model introduces three key components: a categorical counting module that estimates the number of objects, a counting-guided feature enhancement module that improves the encoder's visual features, and dynamic query selection that adaptively adjusts the number and position of object queries. The categorical counting module classifies the number of instances in an image into four levels, which in turn determines how many queries the transformer decoder uses. The counting-guided feature enhancement module enriches the encoder's visual features with spatial information from density maps, improving the positional information carried by the object queries. Dynamic query selection then adjusts the number of queries according to the predicted counting level, improving detection in both sparse and dense images.

DQ-DETR achieves state-of-the-art results on the AI-TOD-V2 dataset with an mAP of 30.2%, outperforming previous CNN-based and DETR-like methods. By dynamically adjusting the number and position of queries, the model handles imbalanced datasets effectively and improves the detection of tiny objects. The main contributions are identifying the limitations of previous DETR-like methods on tiny objects, designing a simple yet accurate categorical counting module, and enhancing the transformer's visual features with density maps.

Experimental results show that DQ-DETR significantly surpasses existing methods in AP and other metrics on AI-TOD-V2. The model is also evaluated on the VisDrone and COCO datasets, demonstrating its effectiveness in various scenarios. Ablation studies confirm the effectiveness of each component: the categorical counting module, the counting-guided feature enhancement module, and dynamic query selection all contribute to the improved performance. DQ-DETR is the first DETR-like model designed specifically for tiny object detection.
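As a rough illustration of the counting-to-query mechanism described above, the sketch below pairs a small categorical counting head with a query-selection step. All names here (CategoricalCountingHead, select_dynamic_queries, LEVEL_TO_NUM_QUERIES, and the specific query counts per level) are hypothetical placeholders, not the paper's actual implementation; the summary only states that the count is classified into four levels and that the number of decoder queries is chosen accordingly.

```python
import torch
import torch.nn as nn

# Hypothetical mapping from predicted counting level to number of decoder
# queries; the values actually used by DQ-DETR may differ.
LEVEL_TO_NUM_QUERIES = {0: 300, 1: 500, 2: 900, 3: 1500}


class CategoricalCountingHead(nn.Module):
    """Classifies an image into one of four object-count levels from pooled encoder features."""

    def __init__(self, d_model: int = 256, num_levels: int = 4):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(inplace=True),
            nn.Linear(d_model, num_levels),
        )

    def forward(self, encoder_tokens: torch.Tensor) -> torch.Tensor:
        # encoder_tokens: (batch, num_tokens, d_model) flattened encoder features.
        pooled = encoder_tokens.mean(dim=1)   # global average pooling over tokens
        return self.classifier(pooled)        # (batch, num_levels) counting logits


def select_dynamic_queries(encoder_tokens: torch.Tensor,
                           objectness_logits: torch.Tensor,
                           count_level: int) -> torch.Tensor:
    """Pick the top-k encoder tokens as decoder query proposals, where k depends
    on the predicted counting level (sparse images get fewer queries, dense
    images get more)."""
    k = LEVEL_TO_NUM_QUERIES[count_level]
    # objectness_logits: (batch, num_tokens) foreground scores from the encoder.
    topk_idx = objectness_logits.topk(k, dim=1).indices           # (batch, k)
    batch_idx = torch.arange(encoder_tokens.size(0)).unsqueeze(-1)
    return encoder_tokens[batch_idx, topk_idx]                    # (batch, k, d_model)


if __name__ == "__main__":
    tokens = torch.randn(2, 4000, 256)        # fake flattened encoder output
    scores = torch.randn(2, 4000)             # fake objectness scores
    level = CategoricalCountingHead()(tokens).argmax(dim=-1)[0].item()
    # For simplicity this toy example applies one level to the whole batch.
    queries = select_dynamic_queries(tokens, scores, level)
    print(queries.shape)                      # e.g. torch.Size([2, 300, 256])
```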
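The counting-guided feature enhancement module is described only at a high level here (spatial information from density maps is injected into the encoder's features). The sketch below is one plausible realization using cross-attention between visual tokens and density-derived features; the class name CountingGuidedEnhancement and the cross-attention wiring are assumptions for illustration, not the paper's verified design.

```python
import torch
import torch.nn as nn


class CountingGuidedEnhancement(nn.Module):
    """Hypothetical sketch: enrich encoder tokens with spatial cues from a
    predicted density map via multi-head cross-attention."""

    def __init__(self, d_model: int = 256, num_heads: int = 8):
        super().__init__()
        # Project a single-channel density map into token embeddings.
        self.density_proj = nn.Conv2d(1, d_model, kernel_size=1)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, encoder_tokens: torch.Tensor, density_map: torch.Tensor) -> torch.Tensor:
        # encoder_tokens: (batch, num_tokens, d_model)
        # density_map:    (batch, 1, H, W) estimated object density
        density_tokens = self.density_proj(density_map)             # (batch, d_model, H, W)
        density_tokens = density_tokens.flatten(2).transpose(1, 2)  # (batch, H*W, d_model)
        attended, _ = self.cross_attn(query=encoder_tokens,
                                      key=density_tokens,
                                      value=density_tokens)
        # Residual connection keeps the original visual content intact.
        return self.norm(encoder_tokens + attended)


if __name__ == "__main__":
    tokens = torch.randn(2, 4000, 256)
    density = torch.rand(2, 1, 64, 64)
    enhanced = CountingGuidedEnhancement()(tokens, density)
    print(enhanced.shape)   # torch.Size([2, 4000, 256])
```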