May 13–17, 2024 | Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao
This paper introduces PMG, a novel method for personalized multimodal generation using large language models (LLMs). PMG addresses the challenge of generating personalized content by converting user behaviors into natural language, from which an LLM extracts user preferences that then condition the generation process. The method represents these preferences as a combination of explicit keywords and implicit embeddings, enabling the generator to produce content that reflects the user's preferences while remaining faithful to the target item. To balance personalization against fidelity, the approach optimizes a weighted sum of accuracy and preference scores. Experimental results show that PMG significantly improves personalization, achieving up to an 8% improvement in LPIPS while retaining generation accuracy. The method is validated on two datasets, demonstrating its effectiveness in generating personalized images, posters, and emoticons. PMG also shows promise for downstream recommendation tasks, where the generated images serve as additional visual features. The work contributes a comprehensive solution to personalized generation that integrates user behavior analysis with multimodal generation. Future work aims to enhance the realism of generated images through retrieval-based augmentation.
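The weighted-sum balancing described above can be illustrated with a small sketch. Everything below is hypothetical and is not the paper's implementation: `embed` and `score` are placeholder stand-ins for PMG's LLM-derived condition embeddings and its learned preference/accuracy scorers, the convex blend of a preference condition and an accuracy condition is an assumed simplification, and `lam` is an assumed trade-off hyperparameter.

```python
import hashlib
import numpy as np

# Hypothetical stand-ins for illustration only: in PMG these roles are
# played by LLM-derived keyword/implicit embeddings and learned scorers.
def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic placeholder embedding so the sketch runs as-is."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of unit vectors, a proxy for a preference/accuracy score."""
    return float(a @ b)

# A preference condition (from user-behavior keywords) and an accuracy
# condition (from the target item to be generated).
pref_cond = embed("dark palette, neon lighting, sci-fi motifs")
acc_cond = embed("poster for the target movie")

# Sweep a blend weight w and keep the one minimizing the weighted sum
# of (1 - accuracy score) and (1 - preference score).
lam = 0.5  # assumed trade-off hyperparameter between the two objectives
best_w, best_loss = 0.0, float("inf")
for w in np.linspace(0.0, 1.0, 11):
    cond = w * pref_cond + (1.0 - w) * acc_cond  # condition fed to the generator
    cond = cond / np.linalg.norm(cond)
    loss = lam * (1.0 - score(cond, acc_cond)) + (1.0 - lam) * (1.0 - score(cond, pref_cond))
    if loss < best_loss:
        best_w, best_loss = w, loss

print(f"chosen preference weight: {best_w:.1f} (loss {best_loss:.3f})")
```

In this toy form, raising `lam` pushes the chosen condition toward the target item (accuracy), while lowering it pushes the condition toward the user's extracted preferences.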