This survey provides a comprehensive overview of recent advances and future directions in multimodal pretraining, adaptation, and generation techniques for recommendation systems. It discusses the challenges and opportunities in leveraging multimodal data to improve recommendation accuracy and user experience. Traditional recommendation models rely on user-item IDs and categorical features, overlooking the rich content available across modalities such as text, image, audio, and video. Recent developments in large multimodal models open new opportunities for content-aware recommendation systems. The survey covers techniques for improving recommendation performance, including self-supervised pretraining, contrastive learning, autoregressive generation, and prompt tuning, and discusses the application of AI-generated content (AIGC), such as text, image, and video generation, in recommendation scenarios. It highlights the importance of multimodal information fusion, cross-domain recommendation, and efficient training methods for practical deployment, and addresses challenges such as domain generalization, catastrophic forgetting, and ethical concerns, along with future research directions. The survey emphasizes the potential of multimodal recommendation systems in applications including e-commerce, advertising, news, video, music, and fashion, and concludes with a call for further research to advance this evolving field.