HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

28 Jul 2024 | Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin
**Affiliations:** CUHK; Shanghai Artificial Intelligence Laboratory
**URL:** https://humanvid.github.io/

**Abstract:** Human image animation generates videos from a single character photo under user control, unlocking potential in video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook camera motion, leading to limited control and unstable video generation. To address these issues, we present HumanVid, the first large-scale, high-quality dataset tailored for human image animation, combining curated real-world and synthetic data. For the real-world portion, we compile a large collection of copyright-free videos and ensure high quality through a rule-based filtering strategy, resulting in 20K human-centric videos at 1080p resolution. Human and camera motions are annotated with a 2D pose estimator and a SLAM-based method, respectively. For the synthetic portion, we gather 2,300 copyright-free 3D avatar assets and introduce a rule-based camera trajectory generation method. To validate the effectiveness of HumanVid, we establish a baseline model named CamAnimate, which conditions on both human and camera motion. Extensive experiments demonstrate that training on HumanVid achieves state-of-the-art control of human pose and camera motion, setting a new benchmark. Code and data are publicly available at https://github.com/zhenzhiwang/HumanVid/.

**Summary:** High-quality, highly controllable human image animation has significant potential in video and movie production, but progress is held back by the lack of high-quality public datasets and by existing methods that neglect camera motion. The proposed dataset combines real-world and synthetic data to improve both visual quality and controllability, and extensive experiments show superior control of human pose and camera motion. These contributions aim to foster more transparent and comprehensive evaluation in the field.

**Paper outline:**
- Introduction to human image animation and the challenges posed by camera motion.
- Overview of existing datasets and their limitations.
- Construction of the synthetic data, including character creation, motion retargeting, and camera trajectory design (see the illustrative trajectory sketch at the end of this page).
- Curation of real-world data from copyright-free internet platforms, with human and camera motion annotation (see the illustrative filtering and annotation sketches at the end of this page).
- Evaluation of the dataset and baseline model, including user studies and qualitative comparisons.
- Limitations and broader impacts of the proposed approach.

**Contributions:**
- HumanVid: a large-scale dataset for camera-controllable human image animation.
- CamAnimate: a baseline model conditioned on both human and camera motion.
- Extensive experiments and user studies demonstrating state-of-the-art performance.
- Public release of code and data.
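The abstract states that the real-world videos are kept at high quality via a rule-based filtering strategy, but the specific rules are not listed here. The Python sketch below illustrates what such a filter might look like; the thresholds and the `is_human_centric` hook (e.g., a person detector) are assumptions for illustration, not the authors' pipeline.

```python
import cv2

# Hypothetical thresholds; the paper's actual filtering rules are not
# specified in this summary.
MIN_WIDTH, MIN_HEIGHT = 1920, 1080
MIN_SECONDS = 4.0


def passes_basic_rules(video_path, is_human_centric):
    """Keep a clip only if it meets resolution/duration rules and a
    caller-supplied human-centric check (e.g., an off-the-shelf person detector)."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        return False
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    cap.release()

    duration = frames / fps if fps > 0 else 0.0
    # Accept both landscape and portrait 1080p clips (assumed rule).
    if min(width, height) < MIN_HEIGHT or max(width, height) < MIN_WIDTH:
        return False
    if duration < MIN_SECONDS:
        return False
    return is_human_centric(video_path)


if __name__ == "__main__":
    # Placeholder check; swap in a real person detector in practice.
    detect_single_person = lambda path: True
    print(passes_basic_rules("candidate_clip.mp4", detect_single_person))
```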
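The abstract notes that camera motion in the real-world videos is annotated with a SLAM-based method (and human motion with a 2D pose estimator), without naming the specific tools here. As a simplified illustration only, the sketch below estimates frame-to-frame relative camera rotation and translation direction with two-view epipolar geometry in OpenCV; this is a stand-in for a full SLAM pipeline, and the intrinsics `K` and video path are placeholders.

```python
import cv2
import numpy as np


def relative_camera_motion(video_path, K):
    """Estimate frame-to-frame relative camera poses (R, unit-scale t) by
    tracking sparse features and decomposing the essential matrix.
    A two-view simplification, not a full SLAM system."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise IOError(f"cannot read {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    poses = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Track corners from the previous frame into the current one.
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                     qualityLevel=0.01, minDistance=7)
        if p0 is None or len(p0) < 8:
            prev_gray = gray
            continue
        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
        good0 = p0[status.ravel() == 1].reshape(-1, 2)
        good1 = p1[status.ravel() == 1].reshape(-1, 2)
        if len(good0) < 8:
            prev_gray = gray
            continue
        # Essential matrix + pose recovery gives R and a unit-length t.
        E, mask = cv2.findEssentialMat(good0, good1, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        if E is None or E.shape[0] != 3:
            prev_gray = gray
            continue
        _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=mask)
        poses.append((R, t))
        prev_gray = gray
    cap.release()
    return poses


if __name__ == "__main__":
    # Placeholder intrinsics for a 1920x1080 clip; replace with calibrated values.
    K = np.array([[1000.0, 0.0, 960.0],
                  [0.0, 1000.0, 540.0],
                  [0.0, 0.0, 1.0]])
    poses = relative_camera_motion("example_video.mp4", K)
    print(f"estimated {len(poses)} relative poses")
```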
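For the synthetic data, the abstract mentions a rule-based camera trajectory generation method but does not specify the rules here. As a minimal illustrative sketch (not the authors' implementation), the NumPy snippet below generates a smooth orbit-plus-dolly trajectory of world-to-camera extrinsics around a character at the origin; the radius, height, arc length, and dolly amount are assumed parameters.

```python
import numpy as np


def look_at(cam_pos, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a world-to-camera rotation that points the camera at `target`."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    # Rows are the camera axes expressed in world coordinates.
    return np.stack([right, true_up, -forward], axis=0)


def orbit_dolly_trajectory(num_frames=120, radius=3.0, height=1.5,
                           arc_deg=60.0, dolly=0.5):
    """Illustrative rule-based trajectory: sweep an arc around the subject
    while slowly dollying in. Returns a list of (R, t) extrinsics."""
    extrinsics = []
    target = np.array([0.0, 1.0, 0.0])              # roughly chest height
    for i in range(num_frames):
        s = i / max(num_frames - 1, 1)              # progress in [0, 1]
        theta = np.deg2rad(arc_deg) * (s - 0.5)     # sweep the arc
        r = radius - dolly * s                      # dolly toward the subject
        cam_pos = np.array([r * np.sin(theta), height, r * np.cos(theta)])
        R = look_at(cam_pos, target)
        t = -R @ cam_pos                            # world-to-camera translation
        extrinsics.append((R, t))
    return extrinsics


if __name__ == "__main__":
    traj = orbit_dolly_trajectory()
    print(len(traj), traj[0][0].shape, traj[0][1].shape)  # 120 (3, 3) (3,)
```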