An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

5 Jan 2024 | Xiangyu Zhao, Yicheng Chen, Shilin Xu, Xiangtai Li, Xinjiang Wang, Yining Li, Haian Huang
The paper introduces MM-Grounding-DINO, an open-source and comprehensive pipeline for unified object grounding and detection, built on the MMDetection toolbox. The pipeline addresses three tasks: Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Because the original Grounding-DINO model lacks detailed technical information, the authors provide an analysis of each reported result along with the settings needed for reproduction. MM-Grounding-DINO is trained on a variety of vision datasets and fine-tuned on multiple detection and grounding datasets. Extensive experiments on various benchmarks show that it outperforms the Grounding-DINO baseline, particularly in zero-shot settings. The paper also evaluates the model on Object Detection in the Wild (ODinW) and on downstream tasks such as hazy and underwater object detection. The authors conclude that MM-Grounding-DINO is a valuable resource for further research in object grounding and detection.
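
To make the open-vocabulary detection setting more concrete, here is a minimal sketch of how a list of category names could be turned into a single text prompt for a grounding-style detector. It assumes the " . " separator convention seen in Grounding-DINO demo prompts. The helper name `build_ovd_prompt` and the character-span bookkeeping are hypothetical illustrations, not part of the MMDetection API or the authors' code.

```python
from typing import Dict, List, Tuple

SEPARATOR = " . "  # assumed convention: category names separated by " . "


def build_ovd_prompt(categories: List[str]) -> Tuple[str, Dict[str, Tuple[int, int]]]:
    """Join category names into one prompt and record each name's character span.

    Hypothetical helper: the spans show how a region matched to part of the
    text could be mapped back to a category name.
    """
    if not categories:
        raise ValueError("at least one category name is required")

    parts: List[str] = []
    spans: Dict[str, Tuple[int, int]] = {}
    cursor = 0
    for raw_name in categories:
        name = raw_name.strip().lower()
        spans[name] = (cursor, cursor + len(name))  # [start, end) in the prompt
        parts.append(name)
        cursor += len(name) + len(SEPARATOR)

    # A trailing " ." closes the last category, as in the demo-style prompts.
    prompt = SEPARATOR.join(parts) + " ."
    return prompt, spans


if __name__ == "__main__":
    prompt, spans = build_ovd_prompt(["Cat", "traffic light"])
    print(prompt)
    print(spans)
```

In this framing, phrase grounding and referring expression comprehension would pass a free-form caption or a single referring expression as the text instead of a joined category list.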