1 Jun 2024 | Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li
The paper introduces LBP (Learning Background Prompts), a framework for open-vocabulary object detection (OVD) that aims to improve the detector's ability to recognize both base and novel categories. LBP tackles two challenges, background interpretation and model overfitting, through three key modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification. Together, these modules discover, represent, and leverage the implicit object knowledge contained in background proposals, which improves the detector's performance on OVD tasks.
1. **Background Category-specific Prompt**: This module discovers the categories underlying background proposals (the estimated "background underlying categories") and represents them with learnable context-specific prompts. This improves how the detector interprets background regions and reduces its bias toward base classes.
2. **Background Object Discovery**: This module further explores and exploits implicit object knowledge related to the estimated background underlying categories. It applies $k$-means clustering to background proposals to extract meaningful categories, and it generates pseudo labels that help keep the model from becoming biased toward base classes.
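3. **Inference Probability Rectification**: This module handles conceptual overlap between the estimated background categories and novel categories at inference time. Because an estimated background category can overlap with a novel concept, it may absorb probability that belongs to that novel class; the module therefore rectifies the probability scores of novel categories so that they are computed accurately.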
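The first module's idea, scoring background proposals against several learned background categories instead of a single generic background class, can be sketched in NumPy as below. This is a hypothetical simplification, not the paper's implementation: it skips the text encoder and treats each background category's encoded prompt as a free embedding row, and the names `BackgroundPromptHead`, `temperature`, and `sgd_step` are illustrative. Base-class text embeddings stay frozen; only the background rows are updated by a plain cross-entropy gradient step.

```python
import numpy as np

def l2n(x):
    # Normalise each row to unit length.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

class BackgroundPromptHead:
    """Hypothetical region classifier: frozen base-class text embeddings plus
    num_bg learnable background-category embeddings. Each learnable row stands
    in for the text-encoder output of one background category's prompt."""

    def __init__(self, base_text_emb, num_bg, temperature=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.base = l2n(base_text_emb)                    # (C, D), frozen
        d = base_text_emb.shape[1]
        self.bg = 0.01 * rng.normal(size=(num_bg, d))     # (K, D), trained
        self.t = temperature

    def logits(self, feats):
        # Scores over C base classes followed by K background categories.
        w = np.concatenate([self.base, self.bg], axis=0)  # (C + K, D)
        return l2n(feats) @ w.T / self.t

    def loss(self, feats, labels):
        # Mean cross-entropy over regions.
        lp = log_softmax(self.logits(feats))
        return -lp[np.arange(len(labels)), labels].mean()

    def sgd_step(self, feats, labels, lr=1e-3):
        # d(mean CE)/d(logits) = (softmax - one_hot) / R; only bg rows move.
        r = l2n(feats)
        g = np.exp(log_softmax(self.logits(feats)))
        g[np.arange(len(labels)), labels] -= 1.0
        g /= len(labels)
        c = self.base.shape[0]
        self.bg -= lr * (g[:, c:].T @ r) / self.t
```

A background proposal assigned to background category `j` would use label `C + j`, where `C` is the number of base classes; how those labels are obtained, and the exact loss LBP uses, are left open in this sketch.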
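The clustering step of Background Object Discovery can be illustrated with plain Lloyd's $k$-means over background-proposal features. This sketch assumes L2-normalised region features and uses farthest-point initialisation (chosen here for determinism, not taken from the paper); each proposal's cluster index becomes a pseudo label placed after the base classes. `kmeans` and `background_pseudo_labels` are hypothetical helpers.

```python
import numpy as np

def _assign(x, centroids):
    # Squared Euclidean distance from every point to every centroid -> nearest index.
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def kmeans(x, k, iters=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        seeds = np.array(centroids)
        d = ((x[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        assign = _assign(x, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([x[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, _assign(x, centroids)

def background_pseudo_labels(bg_feats, num_base, k, seed=0):
    """Hypothetical helper: cluster normalised background-proposal features and
    label each proposal num_base + cluster_id, i.e. one of k extra
    'background category' slots after the base classes."""
    x = bg_feats / np.linalg.norm(bg_feats, axis=1, keepdims=True)
    centroids, assign = kmeans(x, k, seed=seed)
    return centroids, num_base + assign
```

For instance, with 48 base classes (the usual OV-COCO base split) and `k=2`, two well-separated groups of background features receive the pseudo labels 48 and 49, one per group. The choice of $k$ and of the feature space are design decisions the paper itself would specify; the values here are placeholders.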
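One simple way to picture Inference Probability Rectification: if a learned background category is conceptually close to a novel class, the probability it collects at inference is handed back to that novel class. The rule below (a cosine-similarity threshold `tau`, with mass moved to the single most similar novel class) is an assumption for illustration, not the paper's formula, and `rectify_novel_probs` is a hypothetical name.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rectify_novel_probs(logits, num_base, num_novel, novel_emb, bg_emb, tau=0.5):
    """Hypothetical rectification: each background category whose embedding is
    close (cosine > tau) to a novel class gives its probability mass to the
    most similar novel class instead of keeping it as background.

    logits: (R, num_base + num_novel + K) scores for R regions.
    Returns (R, num_base + num_novel) rectified foreground probabilities."""
    p = softmax(logits)
    fg = p[:, : num_base + num_novel].copy()
    bg = p[:, num_base + num_novel :]                      # (R, K)
    n = novel_emb / np.linalg.norm(novel_emb, axis=1, keepdims=True)
    b = bg_emb / np.linalg.norm(bg_emb, axis=1, keepdims=True)
    sim = b @ n.T                                          # (K, N) cosine similarity
    for k in range(bg.shape[1]):
        j = sim[k].argmax()
        if sim[k, j] > tau:                                # treated as conceptual overlap
            fg[:, num_base + j] += bg[:, k]
    return fg
```

With all-zero logits over 2 base, 1 novel, and 2 background categories, each slot starts at 0.2. If the first background embedding is `[1.0, 0.1]`, the second is `[0.0, 1.0]`, and the novel embedding is `[1.0, 0.0]`, the first background category passes its 0.2 to the novel class, giving `[[0.2, 0.2, 0.4]]`, while the unrelated second background category keeps its mass as background.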
LBP is evaluated on two benchmarks, OV-COCO and OV-LVIS, where it outperforms existing state-of-the-art methods. The paper also includes ablation studies that validate the individual contribution of each module, and visualizations of the feature distributions and of the conceptual overlaps between background and novel categories.