This survey provides a comprehensive overview of the integration of large language models (LLMs) with multimodal learning, focusing on multimodal generation and editing across domains such as image, video, 3D, and audio. The authors categorize the studies into LLM-based and CLIP/T5-based methods, highlighting notable advancements and milestone works, and discuss the critical technical components behind these methods as well as the multimodal datasets used. Additionally, the survey explores tool-augmented multimodal agents that leverage existing generative models for human-computer interaction, along with advances in generative AI safety. The work aims to advance the development of AI-Generated Content (AIGC) and world models, providing a systematic and insightful overview of multimodal generation and processing. The survey is structured into sections covering image, video, 3D, and audio generation and editing, as well as safety, emerging applications, and future prospects.