VEnhancer is a generative space-time enhancement framework that improves the quality of AI-generated videos by enhancing both spatial and temporal detail. It addresses three limitations of existing methods: the need for separate spatial and temporal super-resolution stages, fixed upscaling factors, and poor generalization. VEnhancer builds on a unified video diffusion model and trains a video ControlNet to condition the generation process, which allows flexible up-sampling scales in both space and time and effective artifact removal. Space-time data augmentation and video-aware conditioning are used to improve training stability and performance.
Extensive experiments show that VEnhancer outperforms state-of-the-art video super-resolution and space-time super-resolution methods, and that it significantly improves the results of the existing text-to-video method VideoCrafter-2 on the VBench benchmark. Its contributions are a unified framework for generative space-time super-resolution, a flexible and efficient training process, and superior performance in enhancing AI-generated videos.
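
The summary does not spell out how space-time data augmentation is carried out. The following is a minimal sketch of one plausible reading: a high-quality clip is degraded by a randomly chosen spatial factor and a randomly chosen frame-skip interval, so that a single model sees many up-sampling scales during training. The function names, the factor ranges, and the use of average pooling as the spatial degradation are all assumptions made for illustration, not details taken from VEnhancer.

```python
import numpy as np


def space_time_degrade(video, spatial_scale, frame_skip):
    """Return a low-resolution, low-frame-rate copy of `video`.

    video: float array of shape (T, H, W, C).
    spatial_scale: integer down-sampling factor (assumed; 1 = no change).
    frame_skip: keep every `frame_skip`-th frame (assumed; 1 = no change).
    """
    # Temporal degradation: keep only every frame_skip-th frame.
    keyframes = video[::frame_skip]

    # Spatial degradation: average pooling over non-overlapping blocks.
    # Average pooling is a stand-in for whatever resizing a real pipeline uses.
    t, h, w, c = keyframes.shape
    h_crop = h - h % spatial_scale  # drop rows that do not fill a block
    w_crop = w - w % spatial_scale  # drop columns that do not fill a block
    cropped = keyframes[:, :h_crop, :w_crop]
    blocks = cropped.reshape(
        t,
        h_crop // spatial_scale, spatial_scale,
        w_crop // spatial_scale, spatial_scale,
        c,
    )
    return blocks.mean(axis=(2, 4))


def sample_training_pair(video, rng, max_spatial_scale=8, max_frame_skip=8):
    """Draw random factors and build a (condition, target) pair.

    The upper bounds are hypothetical placeholders, not published settings.
    """
    s = int(rng.integers(1, max_spatial_scale + 1))
    k = int(rng.integers(1, max_frame_skip + 1))
    condition = space_time_degrade(video, s, k)
    return condition, video, s, k


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((16, 64, 64, 3))  # synthetic 16-frame clip
    condition, target, s, k = sample_training_pair(clip, rng)
    print("spatial factor:", s, "frame skip:", k)
    print("condition shape:", condition.shape, "target shape:", target.shape)
```

In a setup like this, the sampled factors `s` and `k` would also be passed to the model as conditioning, so that it knows how much spatial and temporal up-sampling is being requested. That corresponds loosely to the "video-aware conditioning" mentioned above, though the exact mechanism is not described in this summary.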