7 May 2024 | Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, Kui Ren
The paper introduces Sora Detector, a unified framework designed to detect hallucinations in large text-to-video (T2V) models, including the cutting-edge Sora model. Hallucinations, i.e., generated video content that contradicts the input text prompt, pose a significant challenge to the reliability and practical deployment of these models. Sora Detector leverages keyframe extraction, object detection, knowledge graph construction, and multimodal large language models to evaluate the consistency between video content summaries and textual prompts. It constructs static and dynamic knowledge graphs from the extracted keyframes to detect hallucinations both within and across frames, yielding a robust measure of prompt consistency together with static and dynamic hallucination detection.
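For intuition, the pipeline described above can be pictured roughly as follows. This is a minimal sketch, not the authors' released implementation: all function names, the `KnowledgeGraph` representation, and the `mllm` interface (`summarize`, `score_consistency`, `check_static`, `check_dynamic`) are hypothetical placeholders, with the heavy components stubbed out so only the control flow is visible.

```python
# Minimal sketch of a SoraDetector-style pipeline (hypothetical names, not the
# authors' API). The object detector and multimodal LLM are stubbed/assumed.

from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # (subject, relation, object) triples extracted from keyframes
    triples: list = field(default_factory=list)

@dataclass
class HallucinationReport:
    consistency_score: float   # agreement between prompt and video summary
    static_issues: list        # contradictions within individual keyframes
    dynamic_issues: list       # contradictions across consecutive keyframes

def extract_keyframes(video_frames, stride=8):
    """Stand-in keyframe extractor: uniform temporal sampling."""
    return video_frames[::stride]

def detect_objects(frame):
    """Stub for an object detector; a real pipeline would return boxes + labels."""
    return []

def build_static_kg(frame, objects):
    """Stub: turn detections into per-frame (entity, relation, value) triples."""
    return KnowledgeGraph(triples=[(obj, "appears_in", "frame") for obj in objects])

def build_dynamic_kg(static_kgs):
    """Stub: link entities across frames to capture motion/appearance changes."""
    return KnowledgeGraph(triples=[t for kg in static_kgs for t in kg.triples])

def detect_hallucinations(video_frames, prompt, mllm):
    keyframes = extract_keyframes(video_frames)
    detections = [detect_objects(f) for f in keyframes]
    static_kgs = [build_static_kg(f, d) for f, d in zip(keyframes, detections)]
    dynamic_kg = build_dynamic_kg(static_kgs)

    # The multimodal LLM summarizes the keyframes, scores prompt/summary
    # consistency, and flags intra-frame and cross-frame contradictions.
    summary = mllm.summarize(keyframes)
    return HallucinationReport(
        consistency_score=mllm.score_consistency(prompt, summary),
        static_issues=[i for kg in static_kgs for i in mllm.check_static(kg, prompt)],
        dynamic_issues=mllm.check_dynamic(dynamic_kg, prompt),
    )
```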
Additionally, the authors developed the Sora Detector Agent, an automated system that integrates Sora Detector with large language models to streamline the hallucination detection process and generate comprehensive video quality reports. They also introduced T2VHaluBench, a benchmark designed to facilitate the evaluation of advancements in T2V hallucination detection.
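A lightweight way to picture the agent layer is shown below: run the detector, then hand its structured findings to an LLM that writes the quality report. The `report_llm.complete` call and the prompt template are assumptions for illustration, not the released Sora Detector Agent.

```python
# Hypothetical agent-style wrapper around the detector sketch above.
# The report_llm interface and template are assumptions, not the authors' code.

REPORT_TEMPLATE = """You are a video quality auditor.
Prompt given to the T2V model: {prompt}
Detected consistency score: {score:.2f}
Static (intra-frame) issues: {static}
Dynamic (cross-frame) issues: {dynamic}
Write a concise video quality report summarizing these findings."""

def run_detector_agent(video_frames, prompt, mllm, report_llm):
    report = detect_hallucinations(video_frames, prompt, mllm)  # from the sketch above
    return report_llm.complete(REPORT_TEMPLATE.format(
        prompt=prompt,
        score=report.consistency_score,
        static=report.static_issues or "none",
        dynamic=report.dynamic_issues or "none",
    ))
```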
The experimental results demonstrate the effectiveness of Sora Detector in detecting hallucinations across various video generation tasks, outperforming baseline methods in precision, recall, and F1 score. The code and dataset are available at https://github.com/TruthAI-Lab/SoraDetector.