DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception


20 Mar 2024 | Yibo Wang, Ruiyuan Gao, Kai Chen, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang
**Institutions:** Tsinghua University, CUHK, HKUST, Huawei Noah's Ark Lab, Research Institute of Tsinghua, Pearl River Delta

**Abstract:** Current perceptive models rely heavily on resource-intensive datasets, prompting the need for innovative solutions. DetDiffusion is the first framework to harmonize generative and perceptive models, enhancing both data generation and perception. By introducing a perception-aware loss (P.A. loss) and perception-aware attributes (P.A. Attr), DetDiffusion improves image generation quality and controllability. The method customizes data augmentation by extracting P.A. Attr and using them during generation, boosting the performance of specific perceptive models. Experiments on the object detection task show DetDiffusion's superior performance, achieving a new state of the art in layout-guided generation and significantly improving downstream detection.

**Contributions:**
1. Proposes DetDiffusion, the first framework to explore the synergy between generative and perceptive models.
2. Introduces a perception-aware loss (P.A. loss) based on segmentation and object masks to boost generation quality.
3. Introduces perception-aware object attributes (P.A. Attr) to enhance the efficacy of synthetic data for perceptive models.
4. Demonstrates DetDiffusion's superior performance in generating high-quality images and improving detection accuracy.

**Methods:**
- **P.A. loss:** Uses segmentation features and object masks to improve image generation quality and controllability.
- **P.A. Attr:** Extracts object attributes with a pre-trained detector and integrates them into the generative model's conditioning, improving realism and alignment with perceptive criteria.
- **Objective function:** Combines the perception-aware loss with the foundational loss of the Latent Diffusion Model (LDM) to balance generative and perceptive aspects.

**Experiments:**
- **Fidelity:** Evaluates image quality with FID and YOLO score, showing that DetDiffusion outperforms competing methods.
- **Trainability:** Shows that the generated images are effective for training object detectors, yielding significant gains in detection accuracy.

**Conclusion:** DetDiffusion leverages the synergy between generative and perceptive models to enhance data generation and perception, achieving state-of-the-art results in layout-guided generation and improving downstream detection performance.
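The combined objective can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes the perception-aware term simply re-weights the denoising error inside object-mask regions, with a hypothetical balancing weight `lam`.

```python
import numpy as np

def pa_weighted_loss(eps_pred, eps_true, obj_mask, lam=0.1):
    """Illustrative combined objective (hypothetical form):
    the standard LDM denoising MSE plus a perception-aware term
    that up-weights error inside object-mask regions."""
    err = (eps_pred - eps_true) ** 2
    l_ldm = err.mean()               # foundational LDM loss
    l_pa = (obj_mask * err).mean()   # mask-focused perception-aware term
    return l_ldm + lam * l_pa
```

With an all-zero mask this reduces to the plain LDM loss; a non-zero mask adds extra penalty on the object regions a detector would care about.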
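A hedged sketch of how P.A. Attr might enter the text condition: the summary says attributes are extracted from a pre-trained detector; here a hypothetical rule tags each ground-truth object "easy" if the detector recovered it and "hard" otherwise, then folds the tags into a caption that conditions generation.

```python
def pa_attributes(objects, detected):
    """Tag each ground-truth object name with a detection-difficulty
    attribute (hypothetical rule: 'easy' if the pre-trained detector
    found it, 'hard' otherwise)."""
    return [f"{'easy' if name in detected else 'hard'} {name}"
            for name in objects]

def build_prompt(objects, detected):
    """Fold the tagged objects into a caption for the generator."""
    return "an image with " + ", ".join(pa_attributes(objects, detected))
```

For example, if the detector found the car but missed the pedestrian, `build_prompt(["car", "pedestrian"], {"car"})` yields `"an image with easy car, hard pedestrian"`, steering generation toward samples that are informative for the detector.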