VEnhancer is a generative space-time enhancement framework that improves text-to-video results by adding finer spatial details and synthesizing detailed motion in the temporal domain. Given a low-quality generated video, it increases the spatial and temporal resolution simultaneously, at arbitrary up-sampling scales, through a single unified video diffusion model, effectively removing spatial artifacts and temporal flickering. Built on a pretrained video diffusion model, VEnhancer trains a video ControlNet and injects the low-frame-rate, low-resolution video into the diffusion model as a condition through this branch. To train the video ControlNet, space-time data augmentation and video-aware conditioning are designed; these designs enable stable and efficient end-to-end training.
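To illustrate what such a training scheme can look like, below is a minimal sketch of space-time data augmentation in which the sampled degradation parameters are exposed as video-aware conditions. The function name, parameter ranges, and noise-augmentation scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of space-time data augmentation for training a video
# ControlNet. All names and ranges here are hypothetical.
import random
import torch
import torch.nn.functional as F

def make_training_pair(hr_video: torch.Tensor,
                       max_spatial_scale: float = 8.0,
                       max_frame_stride: int = 8,
                       max_noise_level: int = 200):
    """hr_video: (T, C, H, W) high-quality clip used as the target.

    Returns a degraded conditioning video plus the sampled augmentation
    parameters, which are fed to the model as video-aware conditions so a
    single network can cover varying up-sampling scales.
    """
    # Temporal augmentation: subsample frames with a random stride, so the
    # model learns temporal up-sampling at varying factors.
    stride = random.randint(1, max_frame_stride)
    lf_video = hr_video[::stride]

    # Spatial augmentation: downscale by a random factor, so the model
    # learns spatial up-sampling at varying factors.
    scale = random.uniform(1.0, max_spatial_scale)
    lr_video = F.interpolate(lf_video, scale_factor=1.0 / scale,
                             mode="bilinear", align_corners=False)

    # Noise augmentation: perturb the condition so the model also learns to
    # refine (not merely upscale) artifact-ridden inputs. The sampled level
    # is exposed to the model rather than hidden, which is what makes the
    # conditioning "video-aware" in this sketch.
    noise_level = random.randint(0, max_noise_level)
    lr_video = lr_video + (noise_level / max_noise_level) * torch.randn_like(lr_video)

    return lr_video, {"stride": stride, "scale": scale, "noise_level": noise_level}
```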
Extensive experiments show that VEnhancer surpasses existing state-of-the-art video super-resolution and space-time super-resolution methods in enhancing AI-generated videos across a range of metrics. Moreover, it improves the results of existing state-of-the-art text-to-video methods: with VEnhancer, the open-source method VideoCrafter-2 reaches the top rank on the video generation benchmark VBench. Because VEnhancer builds on a generative video prior, it addresses temporal super-resolution, spatial super-resolution, and refinement in one unified model, and it adapts to different up-sampling factors for space and time, providing flexible control for handling diverse video artifacts. VEnhancer nonetheless has limitations, such as longer inference time inherent to diffusion models and difficulty handling long AI-generated videos.
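As a purely hypothetical illustration of how one model can serve arbitrary up-sampling factors, a common design is to resample the conditioning video to the target space-time shape before it enters the ControlNet branch, so the factors become runtime inputs rather than architectural constants. The sketch below assumes that design and omits the diffusion sampling loop entirely; it is not the project's actual preprocessing code.

```python
# Hypothetical condition preparation for arbitrary space-time up-sampling.
import torch
import torch.nn.functional as F

def prepare_condition(lr_video: torch.Tensor,
                      spatial_scale: float,
                      temporal_scale: float) -> torch.Tensor:
    """lr_video: (C, T, H, W). Returns the condition resampled to the target
    (C, T * temporal_scale, H * spatial_scale, W * spatial_scale) shape."""
    c, t, h, w = lr_video.shape
    target = (round(t * temporal_scale),
              round(h * spatial_scale),
              round(w * spatial_scale))
    # Trilinear resampling over (T, H, W) requires a 5D (N, C, T, H, W)
    # tensor, hence the unsqueeze/squeeze around the call.
    return F.interpolate(lr_video.unsqueeze(0), size=target,
                         mode="trilinear", align_corners=False).squeeze(0)

# e.g. a 24-frame 320x512 input conditions a 48-frame 1280x2048 output
cond = prepare_condition(torch.randn(3, 24, 320, 512),
                         spatial_scale=4.0, temporal_scale=2.0)
```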