This paper addresses cross-domain few-shot object detection (CD-FSOD): building accurate object detectors for novel domains from only a handful of labeled examples. While transformer-based open-set detectors such as DE-ViT show promise in conventional few-shot object detection, whether they generalize to CD-FSOD has been unclear. The authors establish a new CD-FSOD benchmark for evaluating object detection methods and find that most current approaches fail to generalize across domains. They identify three key sources of difficulty: style shift, inter-class variance (ICV), and indefinable boundaries (IB).

To address these challenges, they propose three novel modules: learnable instance features, an instance reweighting module, and a domain prompter. These modules respectively enhance feature distinctiveness, prioritize high-quality instances with slight IB, and make features resilient to different styles by synthesizing imaginary domains. Together, these techniques yield the Cross-Domain Vision Transformer for CD-FSOD (CD-ViT), which significantly improves upon the base DE-ViT. Experimental results validate the proposed model, demonstrating superior performance across a range of target datasets. The paper also provides a detailed analysis of how domain gaps affect detection performance and highlights the importance of finetuning when adapting open-set models to CD-FSOD.
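The summary does not spell out how the domain prompter synthesizes "imaginary domains." One plausible reading, common in style-robustness work, is to jitter per-channel feature statistics so the detector sees the same content under perturbed styles. The sketch below illustrates that idea only; the function name, the Gaussian jitter, and the `sigma` parameter are assumptions, not the paper's actual implementation.

```python
import numpy as np

def synthesize_imaginary_style(feat, sigma=0.1, seed=None):
    """Illustrative style perturbation (NOT the paper's exact method):
    re-normalize a feature map, then re-style it with jittered
    per-channel mean/std to mimic an unseen 'imaginary' domain.

    feat: array of shape (C, H, W).
    """
    rng = np.random.default_rng(seed)
    c = feat.shape[0]
    mu = feat.mean(axis=(1, 2), keepdims=True)           # per-channel mean
    std = feat.std(axis=(1, 2), keepdims=True) + 1e-6    # per-channel std
    normalized = (feat - mu) / std                        # strip original style
    # Sample new style statistics in a neighborhood of the originals.
    new_mu = mu * (1.0 + sigma * rng.standard_normal((c, 1, 1)))
    new_std = np.abs(std * (1.0 + sigma * rng.standard_normal((c, 1, 1))))
    return normalized * new_std + new_mu

# Content is preserved (same shape, same spatial structure) while the
# channel statistics shift, simulating a style/domain change.
feat = np.random.default_rng(0).standard_normal((8, 4, 4))
aug = synthesize_imaginary_style(feat, sigma=0.2, seed=1)
print(aug.shape)
```

Training features to stay consistent across such perturbed copies is one standard way to encourage the style invariance the domain prompter is described as providing.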