WorldGPT is a novel generalist world model designed to understand and predict state transitions across modalities. The model is built on a Multimodal Large Language Model (MLLM) and trained on millions of videos from diverse domains. To strengthen its performance in specialized scenarios and long-term tasks, WorldGPT incorporates a cognitive architecture that combines memory offloading, knowledge retrieval, and context reflection. This architecture consists of a working memory mechanism, a knowledge retrieval system, and a ContextReflector that extracts relevant information from retrieved contexts. WorldGPT is evaluated on WorldNet, a comprehensive dataset for multimodal state transition prediction that includes both raw internet videos (WorldNet-Wild) and high-quality, curated samples (WorldNet-Crafted). Experiments demonstrate WorldGPT's proficiency in modeling world dynamics and its effectiveness as a universal world simulator, capable of synthesizing dynamic scenes and transferring specialized knowledge to downstream agents through dream tuning. The project is available on GitHub.
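
To make the interplay of the three cognitive components more concrete, the following is a minimal toy sketch in Python, not WorldGPT's actual implementation. Every name in it (`Transition`, `CognitiveLoop`, `retrieve`, `reflect`, `step`, `echo_predictor`) is hypothetical, states and actions are plain strings rather than multimodal inputs, retrieval uses naive word overlap, and the ContextReflector is replaced by a simple formatting stub. It only illustrates one plausible control flow: retrieve related past transitions, distill them into context, predict the next state, then offload old working-memory entries into the knowledge base.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transition:
    """One observed state transition (strings stand in for multimodal states)."""
    state: str
    action: str
    next_state: str


@dataclass
class CognitiveLoop:
    """Hypothetical wrapper around a world-model predictor (an assumption, not the paper's API)."""
    predict: Callable[[str, str, List[str]], str]
    working_memory_size: int = 2
    working_memory: List[Transition] = field(default_factory=list)
    knowledge_base: List[Transition] = field(default_factory=list)

    def retrieve(self, state: str, action: str, k: int = 2) -> List[Transition]:
        # Toy knowledge retrieval: rank stored transitions by word overlap with the query.
        query = set(f"{state} {action}".lower().split())

        def score(t: Transition) -> int:
            return len(query & set(f"{t.state} {t.action}".lower().split()))

        ranked = sorted(self.knowledge_base, key=score, reverse=True)
        return [t for t in ranked[:k] if score(t) > 0]

    def reflect(self, retrieved: List[Transition]) -> List[str]:
        # Stand-in for context reflection: turn retrieved transitions into short hints.
        return [f"{t.state} + {t.action} -> {t.next_state}" for t in retrieved]

    def step(self, state: str, action: str) -> str:
        context = self.reflect(self.retrieve(state, action))
        next_state = self.predict(state, action, context)
        self.working_memory.append(Transition(state, action, next_state))
        # Memory offloading: move the oldest entries out of working memory.
        while len(self.working_memory) > self.working_memory_size:
            self.knowledge_base.append(self.working_memory.pop(0))
        return next_state


def echo_predictor(state: str, action: str, context: List[str]) -> str:
    # Placeholder predictor; a real system would call the world model here.
    return f"{state} after {action} (hints: {len(context)})"


loop = CognitiveLoop(predict=echo_predictor, working_memory_size=1)
first = loop.step("cup on table", "push cup")
second = loop.step("cup on floor", "pick up cup")
third = loop.step("cup on table", "push cup")
print(first, second, third, sep="\n")
```

In this sketch the third call sees one retrieved hint because the first transition has been offloaded to the knowledge base by then. In a real system the overlap-based lookup would presumably be replaced by learned retrieval and the formatting stub by a trained reflection module.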