The paper introduces VideoMV, a novel framework for generating multi-view images from text or single-image prompts. Unlike previous approaches that rely on 2D diffusion models, VideoMV leverages off-the-shelf video generative models fine-tuned on object-centric videos. This addresses two key challenges: sourcing appropriate training data and ensuring multi-view consistency. VideoMV introduces a dense consistent multi-view generation model and a 3D-Aware Denoising Sampling strategy to further enhance multi-view consistency. The method outperforms state-of-the-art approaches in efficiency, generating 24 dense views with much faster training convergence while achieving comparable visual quality and consistency. The project page is available at [age3d.github.io/VideoMV](https://age3d.github.io/VideoMV).
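
To make the 3D-Aware Denoising Sampling idea concrete, below is a minimal, self-contained sketch of one plausible form of such a strategy: at each denoising step, the per-view clean estimates are projected through a 3D-consistent reconstruct-and-render operation, and the result is blended back into the sampling trajectory. Every name here (`denoiser`, `reconstruct_and_render`, the blending schedule) is a hypothetical placeholder for illustration, not the authors' actual model or API; VideoMV's real procedure may differ in its reconstruction backbone and guidance schedule.

```python
import torch

T = 50  # number of denoising steps (illustrative value)
NUM_VIEWS, C, H, W = 24, 3, 64, 64  # 24 dense views, as in the paper

def denoiser(x_t: torch.Tensor, t: int) -> torch.Tensor:
    """Hypothetical stand-in for the fine-tuned video diffusion model:
    predicts the clean multi-view images x_0 from the noisy views x_t."""
    return x_t * (1.0 - t / T)  # toy prediction, not a real network

def reconstruct_and_render(x0_views: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a fast 3D reconstruction followed by
    re-rendering from the same camera poses. Here we simply average the
    views and broadcast the average back, which trivially enforces
    cross-view agreement, purely for illustration."""
    mean_view = x0_views.mean(dim=0, keepdim=True)
    return mean_view.expand_as(x0_views)

# Start sampling from pure Gaussian noise over all views.
x = torch.randn(NUM_VIEWS, C, H, W)

for t in reversed(range(1, T + 1)):
    x0_pred = denoiser(x, t)                 # per-view clean estimates
    x0_3d = reconstruct_and_render(x0_pred)  # 3D-consistent re-rendering
    lam = t / T                              # stronger 3D guidance early on
    x0_guided = lam * x0_3d + (1 - lam) * x0_pred  # blend toward consistency
    noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
    alpha = 1 - t / (T + 1)
    x = alpha * x0_guided + (1 - alpha) * noise    # toy re-noising to step t-1
```

The design intent the sketch captures is that consistency is enforced inside the sampling loop rather than only learned by the network: each step's estimate is pulled toward a rendering that is 3D-consistent by construction, with the guidance weight annealed as denoising proceeds.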