VideoMV is a method for generating consistent multi-view images by building on large pre-trained video generative models. Instead of relying on the multi-view attention modules used by previous approaches, it introduces a 3D-Aware Denoising Sampling strategy: a feed-forward reconstruction module produces an explicit global 3D model from the intermediate multi-view predictions, and this model is then used to guide subsequent denoising steps toward cross-view consistency. With this strategy, VideoMV generates 24 dense, high-quality, consistent views significantly faster than state-of-the-art methods, and it also offers a fast route to 3D assets represented by 3D Gaussians.

The method is evaluated on two tasks, text-based and image-based multi-view generation. On both, VideoMV outperforms existing approaches in quantitative metrics and visual quality while being more efficient, producing multi-view images with accurate camera control and faithful content alignment. It is also effective for downstream tasks such as dense-view reconstruction and distillation-based 3D generation. The paper concludes that VideoMV is a promising approach to consistent multi-view image generation, with potential applications across 3D generation and video-related tasks.
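The 3D-Aware Denoising Sampling idea lends itself to a short sketch: at each step, denoise all views jointly, fit an explicit global 3D model to the current clean-view estimates, re-render that model from every camera, and pull the views toward the consistent renderings before re-noising. The code below is a minimal, hypothetical illustration of that loop, not the paper's implementation; `denoiser`, `reconstructor`, `renderer`, the scheduler interface, and the blend weight `lam` are all assumptions introduced here for clarity.

```python
import torch

@torch.no_grad()
def sample_3d_aware(denoiser, reconstructor, renderer, scheduler,
                    cameras, latents, lam=0.5):
    """Hypothetical 3D-aware denoising sampling loop.

    denoiser:      multi-view diffusion model; predicts noise for all views.
    reconstructor: feed-forward module mapping clean-view estimates to an
                   explicit global 3D model (e.g. 3D Gaussians).
    renderer:      renders the 3D model from the given cameras.
    scheduler:     diffusion scheduler exposing `timesteps`, a `step` that
                   yields a clean-sample estimate, and `add_noise` (assumed).
    cameras:       camera parameters for the 24 target views.
    latents:       (num_views, C, H, W) noisy latents, one per view.
    lam:           blend weight between renderings and denoised views.
    """
    x0, gaussians = latents, None
    for i, t in enumerate(scheduler.timesteps):
        # 1. Ordinary denoising step, jointly over all views.
        eps = denoiser(latents, t, cameras)
        out = scheduler.step(eps, t, latents)
        x0 = out.pred_original_sample  # clean estimate of all views

        # 2. Feed-forward reconstruction of one shared, explicit 3D model
        #    from the current clean-view estimates.
        gaussians = reconstructor(x0, cameras)

        # 3. Re-render the shared model from every camera and pull each
        #    view toward the globally consistent renderings.
        x0 = lam * renderer(gaussians, cameras) + (1.0 - lam) * x0

        # 4. Re-noise the corrected estimate to the next timestep and
        #    continue sampling.
        if i + 1 < len(scheduler.timesteps):
            t_next = scheduler.timesteps[i + 1]
            latents = scheduler.add_noise(x0, torch.randn_like(x0), t_next)
    return x0, gaussians
```

The key design point this sketch captures is that every denoising step is anchored to a single explicit 3D model, so cross-view consistency is enforced globally rather than approximated by pairwise attention between views.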