AutoAD III: The Prequel – Back to the Pixels

22 Apr 2024 | Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
AutoAD III: The Prequel – Back to the Pixels introduces two new movie Audio Description (AD) datasets, CMD-AD and HowTo-AD, both built so that AD can be generated from raw video pixels. CMD-AD is constructed from audio descriptions and movie clips, while HowTo-AD is derived from the HowTo100M dataset by transforming instruction-based captions into AD. The paper proposes a new Q-former-based architecture for AD generation, which uses frozen pre-trained visual encoders and large language models to generate character-aware descriptions. It also introduces new evaluation metrics, including CRITIC, which assesses the accuracy of character naming in AD, and an LLM-based method for assessing the semantic quality of AD. The proposed architecture achieves state-of-the-art results on AD generation, outperforming previous methods on both the standard MAD dataset and the new test set. The new datasets and evaluation methods provide a more accurate and comprehensive benchmark for AD generation.
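
To make the idea of scoring character naming concrete, here is a minimal toy sketch. It is not the CRITIC metric from the paper, whose exact procedure this summary does not describe. It only assumes that a known cast list is available and compares which cast names appear in a predicted AD against those in a reference AD, using set overlap (intersection over union). The function names `mentioned_names` and `name_iou`, and the `cast` parameter, are hypothetical and chosen for illustration.

```python
import re


def mentioned_names(text, cast):
    """Return the cast names that occur as whole words in `text` (case-insensitive)."""
    found = set()
    for name in cast:
        # Whole-word match so that "Rose" does not match inside "Rosemary".
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            found.add(name)
    return found


def name_iou(predicted_ad, reference_ad, cast):
    """Toy character-naming score: IoU of cast names in prediction vs. reference.

    Assumption (ours, not the paper's): if neither text names any cast
    member, the two are treated as being in full agreement (score 1.0).
    """
    pred = mentioned_names(predicted_ad, cast)
    ref = mentioned_names(reference_ad, cast)
    if not pred and not ref:
        return 1.0
    return len(pred & ref) / len(pred | ref)


if __name__ == "__main__":
    cast = ["Jack", "Rose"]
    score = name_iou(
        "Jack opens the door.",
        "Jack opens the door as Rose watches.",
        cast,
    )
    print(score)  # 0.5: one shared name out of two names mentioned overall
```

A plain string match like this ignores pronouns and other indirect references to characters, so it should be read only as an illustration of comparing named characters between a predicted and a reference description.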