26 May 2024 | Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu
The paper introduces LTGC (Long-tail Recognition via Leveraging LLMs-driven Generated Content), a framework that addresses the central challenges of long-tail recognition: the imbalance and scarcity of tail-class data. LTGC taps the implicit knowledge of large language models (LLMs) and large multimodal models (LMMs) to generate diverse, accurate tail-class content, and it adds several designs to ensure the quality and diversity of the generated data and to fine-tune the model efficiently on both generated and original data. Key components of LTGC include:
1. **Diverse Tail Images Generation**: LTGC uses LMMs to analyze the existing tail-class images and to generate new, diverse descriptions for tail classes whose images are scarce or missing. A text-to-image (T2I) model then turns these descriptions into images, and a self-reflection and iterative-evaluation module checks that the generated images are accurate and diverse (see the first sketch after this list).
2. **BalanceMix Module**: This module narrows the domain gap between generated and original data by mixing the two with a balanced sampling strategy, which improves the model's performance on long-tail recognition (see the second sketch after this list).
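The summary ships no reference code, so the following is a minimal Python sketch of the generate-then-evaluate loop from item 1, assuming OpenAI-style LMM and T2I endpoints. The model names (`gpt-4o`, `dall-e-3`), the prompts, and the yes/no reflection check are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of an LTGC-style generation loop: an LMM proposes diverse
# descriptions for a tail class, a T2I model renders them, and a
# self-reflection pass filters images that don't match the class.
# Model names, prompts, and the retry policy are illustrative only.
from openai import OpenAI

client = OpenAI()

def propose_descriptions(class_name: str, n: int = 5) -> list[str]:
    """Ask the LMM for n visually diverse descriptions of a tail class."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (f"Write {n} short, visually diverse image "
                        f"descriptions of a '{class_name}', one per line."),
        }],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

def render(description: str) -> str:
    """Render one description with a T2I model; returns an image URL."""
    img = client.images.generate(model="dall-e-3", prompt=description, n=1)
    return img.data[0].url

def passes_reflection(image_url: str, class_name: str) -> bool:
    """Self-reflection: ask the LMM whether the image depicts the class."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Does this image clearly show a {class_name}? "
                         f"Answer yes or no."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def expand_tail_class(class_name: str, n: int = 5) -> list[str]:
    """Generate candidate images and keep only those that pass reflection."""
    kept = []
    for desc in propose_descriptions(class_name, n):
        url = render(desc)
        if passes_reflection(url, class_name):  # the paper iterates on failures
            kept.append(url)
    return kept
```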
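Likewise, a hedged sketch of the BalanceMix idea from item 2: sample the original data in a class-balanced way and blend it with generated images using a mixup-style coefficient, so the model never trains on raw generated pixels alone. The `Beta(alpha, alpha)` draw, the batch pairing, and the loss weighting are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch of a BalanceMix-style step: draw a class-balanced batch
# of original images, pair it with a same-shaped batch of generated tail
# images, and blend the two mixup-style to soften the domain gap.
import random
import torch
import torch.nn.functional as F

def balanced_indices(labels: list[int], batch_size: int) -> list[int]:
    """Class-balanced sampling: pick a class uniformly, then an instance."""
    by_class: dict[int, list[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    classes = list(by_class)
    return [random.choice(by_class[random.choice(classes)])
            for _ in range(batch_size)]

def balance_mix(gen_x: torch.Tensor, orig_x: torch.Tensor,
                alpha: float = 0.8) -> tuple[torch.Tensor, float]:
    """Blend a generated batch with an original batch of the same shape."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return lam * gen_x + (1.0 - lam) * orig_x, lam

def mixed_loss(logits: torch.Tensor, gen_y: torch.Tensor,
               orig_y: torch.Tensor, lam: float) -> torch.Tensor:
    """Standard mixup bookkeeping: weight the two label sets by lam."""
    return lam * F.cross_entropy(logits, gen_y) + \
           (1.0 - lam) * F.cross_entropy(logits, orig_y)
```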
Experimental results on popular long-tail benchmarks (ImageNet-LT, Places-LT, and iNaturalist 2018) demonstrate that LTGC outperforms existing state-of-the-art methods, achieving higher overall and few-shot accuracies. Visualizations show the effectiveness of LTGC in generating accurate and diverse tail-class images. The paper also includes ablation studies to validate the contributions of each module in LTGC.