LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

26 May 2024 | Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu
This paper proposes LTGC, a generative and fine-tuning framework for long-tail recognition. Long-tail recognition is difficult because of class imbalance and the scarcity of tail-class data. LTGC leverages the implicit knowledge of large language models (LLMs) and large multimodal models (LMMs) to generate diverse tail-class data. The framework first analyzes existing tail-class images to obtain descriptions, then extends these descriptions with an LLM. The extended descriptions are converted into images by text-to-image (T2I) models, and an iterative evaluation module keeps the generated images high-quality and diverse by refining descriptions based on feedback from CLIP. A BalanceMix module then fine-tunes the model efficiently on both generated and original data, handling the domain shift between the two sources. Experiments show that LTGC outperforms existing state-of-the-art methods on popular long-tailed benchmarks, demonstrating that the rich implicit knowledge of large models can be harnessed to generate diverse, high-quality tail-class data and improve long-tail recognition performance.
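To make the generate-and-filter loop concrete, here is a minimal sketch of the description-to-image stage, assuming Hugging Face `transformers` for CLIP and `diffusers` with Stable Diffusion as the T2I model. The LLM extension step is stubbed out as a hypothetical `extend_descriptions` function, and the similarity threshold is an illustrative placeholder; the paper's actual prompts, models, and refinement criteria are not specified here, so this is a sketch of the idea rather than the authors' implementation.

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP scorer and T2I model (stand-ins for whatever LTGC uses).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")


def extend_descriptions(class_name: str, seeds: list[str]) -> list[str]:
    """Hypothetical stand-in for the LLM step: prompt an LLM to expand a
    few seed captions into diverse, detailed tail-class descriptions.
    Stubbed here with a trivial template."""
    return [f"a photo of a {class_name}, {s}" for s in seeds]


@torch.no_grad()
def clip_score(image, text: str) -> float:
    """Image-text similarity used as the quality check (our simplified
    version of the paper's CLIP-based iterative evaluation)."""
    inputs = clip_proc(text=[text], images=image, return_tensors="pt", padding=True)
    return clip(**inputs).logits_per_image.item()


def generate_tail_images(class_name, seeds, threshold=20.0, max_rounds=3):
    """Generate images for a tail class, keeping only those that CLIP
    judges faithful to their description; low-scoring descriptions are
    retried (the paper refines them via LLM feedback instead)."""
    accepted = []
    descriptions = extend_descriptions(class_name, seeds)
    for _ in range(max_rounds):
        rejected = []
        for desc in descriptions:
            image = t2i(desc).images[0]
            if clip_score(image, desc) >= threshold:  # assumed cutoff
                accepted.append((image, class_name))
            else:
                rejected.append(desc)
        if not rejected:
            break
        descriptions = rejected  # resubmit; LTGC would refine these first
    return accepted
```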
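The summary does not spell out how BalanceMix combines the two data sources, so the following is only a generic mixup-style blend of an original and a generated sample, shown to illustrate the kind of mixing such a module could perform; the paper's actual BalanceMix formulation may differ.

```python
import torch


def mix_samples(x_orig, y_orig, x_gen, y_gen, alpha: float = 0.2):
    """Convex-combine an original and a generated image (and their one-hot
    labels) with a Beta-sampled coefficient, as in standard mixup. This is
    an illustration, not the paper's BalanceMix."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x = lam * x_orig + (1.0 - lam) * x_gen
    y = lam * y_orig + (1.0 - lam) * y_gen
    return x, y
```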