V²A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection

2024-08-10 | Xuanyu Zhang¹,²,*, Youmin Xu¹,*, Runyi Li¹, Jiwen Yu¹, Weiqi Li¹, Zhipei Xu¹, Jian Zhang¹,²
V²A-Mark is a versatile deep visual-audio watermarking framework for manipulation localization and copyright protection in AI-generated videos. It embeds invisible cross-modal watermarks into video frames and audio, enabling precise tamper detection and copyright verification. To embed both localization and copyright watermarks, it combines the fragility of video-into-video steganography with deep robust watermarking. A temporal alignment and fusion module (TAFM) and degradation prompt learning (DPL) improve localization accuracy and decoding robustness, while a sample-level audio localization method and a cross-modal copyright extraction mechanism couple the audio and video information.

The framework was verified on a visual-audio tampering dataset, where it detects tampered visual areas and audio periods and recovers the copyright information. It is evaluated across different video editing scenarios, remains robust against common video and audio degradations, and shows high accuracy in both visual and audio tamper localization. It is also applied to real-world scenarios such as deepfake detection, where it identifies tampered areas and audio alterations. According to the reported results, V²A-Mark outperforms existing methods in localization accuracy, copyright recovery, and robustness to common degradations, and the method is presented as effective against all forms of local visual-audio manipulation.
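
The idea of sample-level localization with a fragile watermark can be illustrated with a toy sketch. This is not the V²A-Mark method, which uses learned networks; it swaps in plain least-significant-bit (LSB) substitution on integer audio samples, and every name in it (`pattern_bits`, `embed`, `localize`, the `seed`, `window`, and `threshold` parameters) is a hypothetical choice made for illustration. The detector flags samples whose neighborhood no longer matches the shared reference pattern.

```python
import random

def pattern_bits(n, seed=0):
    """Pseudo-random reference bits shared by embedder and detector (hypothetical key)."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]

def embed(samples, seed=0):
    """Overwrite the least significant bit of each integer sample with a reference bit."""
    bits = pattern_bits(len(samples), seed)
    return [(s & ~1) | b for s, b in zip(samples, bits)]

def localize(samples, seed=0, window=16, threshold=0.25):
    """Flag each sample whose surrounding window has too many LSB mismatches."""
    bits = pattern_bits(len(samples), seed)
    mismatch = [(s & 1) != b for s, b in zip(samples, bits)]
    n = len(samples)
    flags = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        rate = sum(mismatch[lo:hi]) / (hi - lo)
        flags.append(rate > threshold)
    return flags

# Toy demo: watermark 400 synthetic samples, then overwrite samples 150..249.
rng = random.Random(1)
clean = [rng.randint(-1000, 1000) for _ in range(400)]
marked = embed(clean)
tampered = list(marked)
for i in range(150, 250):
    tampered[i] = rng.randint(-1000, 1000)  # simulated edit destroys the fragile bits
flags = localize(tampered)
```

Untouched audio reproduces the reference bits exactly, so it is never flagged, while replaced samples match the pattern only by chance. The sliding window trades localization sharpness at the boundaries for fewer missed samples inside the edited period. Unlike the framework described above, this toy watermark does not survive compression or other degradations.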