26 Mar 2024 | Mingfu Liang, Jong-Chyi Su, Samuel Schulner, Sparsh Garg, Shiyu Zhao, Ying Wu, Manmohan Chandraker
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Autonomous vehicles (AVs) rely on robust perception models for safety. However, road objects follow a long-tailed distribution, making rare or unseen categories challenging for deployed models. This necessitates costly data curation and annotation. The authors propose AIDE, an Automatic Data Engine that leverages vision-language models (VLMs) and large language models (LLMs) to automatically identify issues, curate data, auto-label, and verify models. The process is iterative, enabling continuous model improvement. A benchmark for open-world detection on AV datasets is established to evaluate various learning paradigms, demonstrating AIDE's superior performance at reduced cost.
AIDE consists of four components: Issue Finder, Data Feeder, Model Updater, and Verification. The Issue Finder identifies missing categories in the label space by comparing detection results and dense captions. The Data Feeder uses VLMs to efficiently search for relevant images, reducing inference time and filtering out irrelevant images. The Model Updater automatically labels queried images and continuously trains the model with pseudo-labels. The Verification module uses LLMs to generate diverse scene descriptions and evaluates the updated model.
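The Issue Finder's comparison step can be pictured as a set difference between the detector's label space and the objects named in dense captions. The following is a minimal sketch, not the authors' implementation: it assumes the object names have already been extracted from the captions, and the function name `find_missing_categories` and the exact-string matching are illustrative assumptions.

```python
def find_missing_categories(label_space, caption_objects):
    """Return caption objects that are absent from the detector's label space.

    Hypothetical helper: matching is by normalized exact string, which is a
    simplification of whatever comparison the real system performs.
    """
    known = {name.strip().lower() for name in label_space}
    missing = []
    for obj in caption_objects:
        key = obj.strip().lower()
        # Keep first-seen order and skip repeats.
        if key not in known and key not in missing:
            missing.append(key)
    return missing


# Toy inputs (assumed, not taken from the paper).
label_space = ["car", "pedestrian", "truck"]
caption_objects = ["Car", "stroller", "traffic cone", "stroller"]
novel = find_missing_categories(label_space, caption_objects)
print(novel)
```

In this toy run, "stroller" and "traffic cone" are reported as candidate novel categories, which would then be passed to the Data Feeder as text queries.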
The Issue Finder uses dense captioning models to identify novel categories. The Data Feeder employs VLMs for text-guided image retrieval, outperforming image similarity methods. The Model Updater uses two-stage pseudo-labeling: first, zero-shot detection with OWL-v2 to generate box proposals, then CLIP filtering to generate pseudo-labels. The Verification step uses LLMs to generate diverse scenarios and human review to ensure correctness.
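Text-guided retrieval and the second pseudo-labeling stage both reduce to comparing a text embedding with image or crop embeddings. The sketch below illustrates that idea under stated assumptions: the embeddings are toy 2-D vectors standing in for VLM/CLIP features, the box proposals are assumed to come from a zero-shot detector such as OWL-v2, and the names `retrieve_top_k`, `filter_proposals`, and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(text_emb, image_embs, k):
    """Rank image ids by similarity to the query text embedding; keep the top k."""
    ranked = sorted(image_embs, key=lambda i: cosine(text_emb, image_embs[i]),
                    reverse=True)
    return ranked[:k]


def filter_proposals(proposals, text_emb, threshold):
    """Keep boxes whose crop embedding is close enough to the category text."""
    return [p["box"] for p in proposals
            if cosine(text_emb, p["crop_emb"]) >= threshold]


# Toy data (assumed): one text query, three images, two box proposals.
text_emb = [1.0, 0.0]
image_embs = {"img_a": [0.9, 0.1], "img_b": [0.0, 1.0], "img_c": [0.7, 0.7]}
top = retrieve_top_k(text_emb, image_embs, k=2)

proposals = [
    {"box": (10, 20, 50, 80), "crop_emb": [0.8, 0.2]},
    {"box": (5, 5, 30, 30), "crop_emb": [0.1, 0.9]},
]
pseudo_labels = filter_proposals(proposals, text_emb, threshold=0.5)
print(top, pseudo_labels)
```

The threshold trades pseudo-label precision against recall; the value used here is arbitrary and would need tuning on held-out data.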
AIDE outperforms existing methods on both novel and known categories, achieving a 2.3% AP improvement on novel categories and an 8.9% AP improvement on known categories without human annotations. It also reduces inference time and training costs. Because the process is iterative, the model can be improved continuously, which suits autonomous driving systems. The method is evaluated on AV datasets, where it handles novel categories effectively and reduces labeling costs. Its main limitation is that the Issue Finder and Verification steps may hallucinate. Despite this, AIDE is effective; for safety-critical systems it is recommended with some human oversight.