This paper introduces UniMP, a unified multi-modal personalization system that leverages large vision-language models to address personalized recommendation and related tasks. The system integrates heterogeneous user-history data, including images, text, and item IDs, to deliver a more accurate and customized experience. UniMP handles a wide range of personalized tasks, such as item recommendation, product search, preference prediction, explanation generation, and user-guided image generation, within a single flexible framework that ingests multi-modal user histories and produces multi-modal outputs tailored to individual needs. The framework incorporates a novel user modeling architecture that enables fine-grained multi-modal information extraction and alignment, supporting precise user preference prediction.
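To make the ingestion of multi-modal histories concrete, the sketch below shows one plausible way to interleave image embeddings, text tokens, and item-ID tokens into a single input sequence for a vision-language model. The data layout, the helper names (`encode_image`, `tokenize`, `embed_tokens`), and the `[ITEM]` marker are illustrative assumptions, not the paper's actual input format.

```python
import torch

def build_history_sequence(history, encode_image, tokenize, embed_tokens):
    """Interleave a user's multi-modal history into one embedding sequence.

    history: list of interactions, each a dict with optional keys
             'image' (an image object), 'text' (str), and 'item_id' (str).
    encode_image: maps an image to (n_img_tokens, d) visual embeddings
                  already projected into the language-model space.
    tokenize / embed_tokens: text tokenizer and token-embedding lookup.
    """
    parts = []
    for interaction in history:
        if "image" in interaction:
            # Visual tokens from the vision encoder.
            parts.append(encode_image(interaction["image"]))
        if "text" in interaction:
            ids = tokenize(interaction["text"])   # (n_txt,)
            parts.append(embed_tokens(ids))       # (n_txt, d)
        if "item_id" in interaction:
            # One simple option: render item IDs as text behind a marker token.
            ids = tokenize(f"[ITEM] {interaction['item_id']}")
            parts.append(embed_tokens(ids))
    # Single interleaved sequence of shape (total_tokens, d).
    return torch.cat(parts, dim=0)
```

Keeping every modality in one token stream is what lets a single decoder attend jointly across images, text, and IDs, rather than fusing per-modality summaries late in the pipeline.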
To enhance generalization, UniMP additionally employs multi-task optimization strategies, including token-level re-weighting and context reconstruction (see the sketch below). Extensive experiments on a real-world benchmark demonstrate that UniMP outperforms competitive methods specialized for each task, handles diverse personalization tasks effectively, and transfers well across them; the evaluation covers personalized image generation, multi-modal explanation, and multi-modal search. The paper also discusses the limitations of traditional personalization methods and the advantages of large vision-language models for multi-modal personalization, arguing that the framework's flexibility and extensibility make it suitable for e-commerce and other domains.
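As an illustration of the token-level re-weighting mentioned above, the following minimal sketch scales per-token cross-entropy by a weight mask, for example up-weighting tokens that belong to the target item and zeroing padding. The PyTorch setup and the specific weighting scheme are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def reweighted_lm_loss(logits, targets, token_weights):
    """Token-level re-weighted cross-entropy for next-token prediction.

    logits:        (batch, seq_len, vocab) model outputs.
    targets:       (batch, seq_len) gold token ids, assumed already
                   shifted for next-token prediction.
    token_weights: (batch, seq_len) per-token weights, e.g. >1.0 for
                   target-item tokens and 0.0 for padding.
    """
    # Per-token loss, kept unreduced so the weights can be applied.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    weighted = per_token * token_weights
    # Normalize by total weight so the scale is comparable across batches.
    return weighted.sum() / token_weights.sum().clamp(min=1.0)
```

Compared with a uniform loss, such a mask lets multi-task training emphasize the tokens that actually carry the personalization signal instead of boilerplate context.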