This survey provides a comprehensive overview of multimodal pretraining, adaptation, and generation techniques in the context of recommendation systems. It highlights the limitations of traditional recommendation models, which rely primarily on unique IDs and categorical features, and discusses how multimodal data can enhance recommendation accuracy. The survey covers recent advancements in large multimodal models spanning text, audio, and video, and explores their applications in multimedia services such as news, music, and short-video platforms. Key topics include:
1. **Multimodal Pretraining**: Techniques for enhancing in-domain multimodal pretraining with domain-specific data, covering reconstructive, contrastive, and autoregressive paradigms (a minimal contrastive sketch follows this list).
2. **Multimodal Adaptation**: Methods for adapting pretrained models to downstream recommendation tasks, including representation transfer, model finetuning, adapter tuning, and prompt tuning (an adapter-tuning sketch also follows this list).
3. **Multimodal Generation**: Applications of AI-generated content (AIGC) in recommendation systems, focusing on text, image, and video generation.
4. **Applications**: Common domains requiring multimodal recommendation techniques, such as e-commerce, advertising, and social media.
5. **Challenges and Opportunities**: Open challenges and future research directions, including multimodal information fusion, multi-domain recommendation, and efficient training and inference.
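To make the contrastive paradigm in item 1 concrete, here is a minimal, self-contained sketch of CLIP-style symmetric InfoNCE pretraining over paired item images and item texts. The encoder modules, feature dimensions, and projection heads are illustrative assumptions rather than the survey's own implementation.

```python
# Minimal sketch of contrastive multimodal pretraining (symmetric InfoNCE).
# Encoders, feature sizes, and projection heads are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveItemPretrainer(nn.Module):
    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 image_dim: int = 768, text_dim: int = 768,
                 embed_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        self.image_encoder = image_encoder        # any vision backbone (assumed)
        self.text_encoder = text_encoder          # any text backbone (assumed)
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.temperature = temperature

    def forward(self, images: torch.Tensor, texts: torch.Tensor) -> torch.Tensor:
        # Project each modality into a shared space and L2-normalize,
        # so dot products are cosine similarities.
        img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt = F.normalize(self.text_proj(self.text_encoder(texts)), dim=-1)
        logits = img @ txt.t() / self.temperature     # (batch, batch) similarity matrix
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric InfoNCE: matched image-text pairs (the diagonal) are positives,
        # all other in-batch pairs serve as negatives.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))
```

In practice the temperature and the batch size strongly influence the quality of the learned joint space, since larger batches supply more in-batch negatives.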
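For item 2, the following sketch shows one common form of adapter tuning: the pretrained multimodal item encoder stays frozen while a small residual bottleneck adapter and a dot-product scoring head are trained on the downstream recommendation task. All class names and dimensions here are hypothetical.

```python
# Minimal sketch of adapter tuning on top of a frozen pretrained encoder.
# Class names and dimensions are hypothetical, not from the survey.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck: a cheap task-specific transformation
        # applied on top of frozen pretrained features.
        return x + self.up(self.act(self.down(x)))

class AdapterTunedRecommender(nn.Module):
    def __init__(self, pretrained_item_encoder: nn.Module, dim: int = 768):
        super().__init__()
        self.encoder = pretrained_item_encoder
        for p in self.encoder.parameters():       # freeze the pretrained backbone
            p.requires_grad = False
        self.adapter = BottleneckAdapter(dim)

    def forward(self, user_emb: torch.Tensor, item_features: torch.Tensor) -> torch.Tensor:
        item_emb = self.adapter(self.encoder(item_features))
        # Dot-product relevance score between user and adapted item representations.
        return (user_emb * item_emb).sum(dim=-1)
```

Because only the adapter's two small linear layers receive gradients, per-task training cost and storage stay low compared with full finetuning of the multimodal backbone.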
The survey aims to provide valuable insights and inspire further research in this evolving field, emphasizing the potential of multimodal models to enhance personalized recommendations.