HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

28 Jul 2024 | Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin
HumanVid is a large-scale, high-quality dataset for human image animation that combines real-world and synthetic data. It addresses two gaps in existing work: the lack of accessible, high-quality datasets and the neglect of camera motion. The dataset includes 20,000 human-centric videos in 1080p resolution, annotated with human and camera motions. The synthetic data includes 2,300 copyright-free 3D avatar assets, along with a rule-based camera trajectory generation method that increases the diversity of camera motion. The dataset is used to train CamAnimate, a baseline model that takes both human and camera motions into account for controllable animation. Experiments show that CamAnimate achieves state-of-the-art performance in controlling human pose and camera motion. The dataset and model are publicly available for research and development in human image animation.