Abductive Ego-View Accident Video Understanding for Safe Driving Perception

1 Mar 2024 | Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, and Tat-Seng Chua
The paper introduces MM-AU, a large-scale dataset for multi-modal accident video understanding, containing 11,727 in-the-wild ego-view accident videos with temporally aligned text descriptions. The dataset includes over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. To facilitate various accident understanding tasks, particularly multimodal video diffusion for cause-effect chain analysis, the authors propose AdVersa-SD, an abductive accident video understanding framework for safe driving perception. AdVersa-SD uses an Object-Centric Video Diffusion (OVD) method driven by an abductive CLIP model, which learns the co-occurrence of normal, near-accident, and accident frames with their corresponding text descriptions. OVD enforces causal region learning while keeping the background content of the original frame fixed, enabling the generation of dominant cause-effect chains for specific accidents. Extensive experiments validate the abductive ability of AdVersa-SD and the superiority of OVD over state-of-the-art diffusion models. The dataset and code are available at www.lotvsmmau.net.
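To make the idea of learning frame-text co-occurrence more concrete, the following is a minimal sketch of a generic CLIP-style symmetric contrastive loss over paired frame and text embeddings. It is an illustration only and not the authors' abductive CLIP: the function names, the temperature value, and the use of plain NumPy arrays as stand-ins for encoder outputs are all assumptions made here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(frame_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for N paired (frame, text) embeddings.

    Row i of frame_emb is assumed to be paired with row i of text_emb.
    The temperature of 0.07 is a hypothetical default, not a value
    taken from the paper.
    """
    f = l2_normalize(np.asarray(frame_emb, dtype=float))
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = f @ t.T / temperature            # (N, N) similarity matrix
    idx = np.arange(logits.shape[0])          # matched pairs sit on the diagonal
    loss_frame_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_frame = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_frame_to_text + loss_text_to_frame)


if __name__ == "__main__":
    # Toy stand-ins for embeddings of normal, near-accident, and accident frames.
    frames = np.eye(3)
    matched_text = np.eye(3)
    shuffled_text = matched_text[[1, 2, 0]]
    print("matched loss: ", clip_style_loss(frames, matched_text))
    print("shuffled loss:", clip_style_loss(frames, shuffled_text))
```

In this toy setup the loss is small when each frame embedding aligns with its own description and large when the descriptions are shuffled, which is the behaviour a contrastive objective of this kind rewards.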