May 13–17, 2024 | Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao
This paper introduces PMG, a novel method for personalized multimodal generation using large language models (LLMs). PMG addresses the challenge of generating personalized content by converting user behaviors into natural language, from which an LLM extracts user preferences that then condition the generation process. The method represents these preferences as a combination of explicit keywords and implicit embeddings, enabling the generator to produce content that reflects the user's preferences while remaining faithful to the target item. To balance personalization against fidelity, the approach optimizes a weighted sum of accuracy and preference scores. Experimental results show that PMG significantly improves personalization, achieving up to an 8% improvement in LPIPS while retaining generation accuracy. The method is validated on two datasets, demonstrating its effectiveness in generating personalized images, posters, and emoticons. PMG also shows promise for downstream recommendation tasks, where the generated images serve as additional visual features. The work contributes a comprehensive solution to personalized generation that integrates user behavior analysis with multimodal generation. Future work aims to enhance the realism of generated images through retrieval-based augmentation.
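The weighted-sum balancing described above can be illustrated with a small sketch. Everything below is hypothetical and is not the paper's implementation: `embed` and `score` are placeholder stand-ins for PMG's LLM-derived condition embeddings and its learned preference/accuracy scorers, the convex blend of a preference condition and an accuracy condition is an assumed simplification, and `lam` is an assumed trade-off hyperparameter.

```python
import hashlib
import numpy as np

# Hypothetical stand-ins for illustration only: in PMG these roles are
# played by LLM-derived keyword/implicit embeddings and learned scorers.
def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic placeholder embedding so the sketch runs as-is."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of unit vectors, a proxy for a preference/accuracy score."""
    return float(a @ b)

# A preference condition (from user-behavior keywords) and an accuracy
# condition (from the target item to be generated).
pref_cond = embed("dark palette, neon lighting, sci-fi motifs")
acc_cond = embed("poster for the target movie")

# Sweep a blend weight w and keep the one minimizing the weighted sum
# of (1 - accuracy score) and (1 - preference score).
lam = 0.5  # assumed trade-off hyperparameter between the two objectives
best_w, best_loss = 0.0, float("inf")
for w in np.linspace(0.0, 1.0, 11):
    cond = w * pref_cond + (1.0 - w) * acc_cond  # condition fed to the generator
    cond = cond / np.linalg.norm(cond)
    loss = lam * (1.0 - score(cond, acc_cond)) + (1.0 - lam) * (1.0 - score(cond, pref_cond))
    if loss < best_loss:
        best_w, best_loss = w, loss

print(f"chosen preference weight: {best_w:.1f} (loss {best_loss:.3f})")
```

In this toy form, raising `lam` pushes the chosen condition toward the target item (accuracy), while lowering it pushes the condition toward the user's extracted preferences.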