The Sora model has demonstrated remarkable capabilities in video generation, sparking discussions about its ability to simulate real-world phenomena. However, there are no established metrics for quantitatively evaluating its fidelity to real-world physics. This paper introduces a new benchmark that assesses the quality of generated videos based on their adherence to real-world physics principles. The method transforms generated videos into 3D models, on the premise that the accuracy of 3D reconstruction depends heavily on video quality. From this 3D reconstruction perspective, the degree to which the reconstructed 3D models satisfy geometric constraints serves as a proxy for how well the generated videos conform to real-world physics.
The paper compares Sora with other video generation models such as Pika and Gen2. The benchmark uses metrics derived from 3D reconstruction, including the number of matching points, inlier ratios, and geometric reprojection errors. These metrics are calculated by extracting frames from the videos and performing 3D reconstruction using Structure-from-Motion (SfM) and Gaussian Splatting. On these metrics, Sora outperforms the other models, generating videos with higher geometric consistency and better 3D reconstruction quality.
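To make the three metric families concrete, the following is a minimal sketch of how a match count, an inlier ratio, and a reprojection error could be computed for a single frame once 3D points have been reconstructed. It is not the paper's implementation: the pinhole camera model, the function names, the 2-pixel inlier threshold, and the toy data are all assumptions chosen for illustration, and a real pipeline would take its points and camera parameters from an SfM tool.

```python
import math


def project(point3d, focal, cx, cy):
    """Project a camera-frame 3D point to pixels with an assumed pinhole model."""
    x, y, z = point3d
    return (focal * x / z + cx, focal * y / z + cy)


def reprojection_errors(points3d, observations, focal, cx, cy):
    """Pixel distance between each projected 3D point and its observed 2D match."""
    errors = []
    for point, (u, v) in zip(points3d, observations):
        pu, pv = project(point, focal, cx, cy)
        errors.append(math.hypot(pu - u, pv - v))
    return errors


def consistency_metrics(points3d, observations, focal, cx, cy, threshold_px=2.0):
    """Summarise geometric consistency for one frame.

    threshold_px is a hypothetical inlier threshold, not a value from the paper.
    The mean error is taken over all matches, inliers and outliers alike.
    """
    errors = reprojection_errors(points3d, observations, focal, cx, cy)
    if not errors:
        return {"num_matches": 0, "inlier_ratio": 0.0, "mean_reproj_error": float("nan")}
    inliers = sum(1 for e in errors if e <= threshold_px)
    return {
        "num_matches": len(errors),
        "inlier_ratio": inliers / len(errors),
        "mean_reproj_error": sum(errors) / len(errors),
    }


# Toy data (made up): four reconstructed points and their observed pixel matches.
points3d = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0), (1.0, 1.0, 4.0)]
observations = [(50.0, 50.0), (100.0, 51.0), (50.0, 75.0), (80.0, 75.0)]
metrics = consistency_metrics(points3d, observations, focal=100.0, cx=50.0, cy=50.0)
print(metrics)
```

In this toy frame, three of the four matches fall within the threshold. Under the premise described above, a geometrically consistent video would tend to yield more matches, a higher inlier ratio, and a lower reprojection error than an inconsistent one.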
The paper also discusses the importance of physical accuracy in video generation and proposes using 3D reconstruction metrics to evaluate the geometric properties of generated videos. The results show that Sora's videos have higher geometric consistency and better 3D reconstruction quality compared to other models. The paper concludes that the proposed benchmark provides a new way to evaluate the geometric consistency of video generation models.