27 Mar 2024 | Li Siyao, Tianpei Gu, Zhengyu Lin, Zhitao Yang, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
The paper introduces a novel task in 3D dance generation called *dance accompaniment*, which involves generating responsive movements from a dance partner (the "follower") that synchronize with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, this duet dance scenario requires a higher degree of interaction between the two participants, necessitating delicate coordination in both pose and position. To support this task, the authors build a large-scale and diverse duet interactive dance dataset, *DD100*, by recording about 117 minutes of professional dancers' performances. They propose a GPT-based model, *Duolando*, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's, and the follower's movements. To enhance the model's capabilities in generating stable results on unseen conditions, they devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, they establish a benchmark with several carefully designed metrics to evaluate the quality of the generated movements, the interaction between partners, and the alignment with the background music.
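To make the conditioning scheme concrete, below is a minimal PyTorch sketch of how such a GPT can predict the follower's next motion token from three time-aligned streams: the follower's own past tokens, the leader's motion tokens, and music features. Everything here is an illustrative assumption (the class name `DuetGPT`, the dimensions, and the per-timestep summation fusion), not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the paper's code) of an autoregressive GPT
# that predicts the follower's next motion token conditioned on the leader's
# motion tokens and music features.
import torch
import torch.nn as nn

class DuetGPT(nn.Module):
    def __init__(self, vocab_size=512, music_dim=54, d_model=256,
                 n_layers=4, n_heads=4, max_len=256):
        super().__init__()
        self.follower_emb = nn.Embedding(vocab_size, d_model)  # past follower tokens
        self.leader_emb = nn.Embedding(vocab_size, d_model)    # leader motion tokens
        self.music_proj = nn.Linear(music_dim, d_model)        # continuous music features
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, follower_tokens, leader_tokens, music_feats):
        T = follower_tokens.shape[1]
        pos = torch.arange(T, device=follower_tokens.device)
        # Fuse the three time-aligned streams by summing their embeddings.
        x = (self.follower_emb(follower_tokens)
             + self.leader_emb(leader_tokens)
             + self.music_proj(music_feats)
             + self.pos_emb(pos))
        # Causal mask: each step attends only to past steps.
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(x.device)
        h = self.blocks(x, mask=mask)
        return self.head(h)  # (B, T, vocab_size) logits over the next follower token
```

At inference time the model would be rolled out autoregressively: sample a token from the last step's logits, append it to the follower stream, and repeat, decoding the token sequence back into 3D motion with whatever quantizer (e.g., a VQ-VAE) produced the tokens in the first place.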
The contributions of the paper are threefold: (1) introducing a novel multi-modal task, dance accompaniment, and providing a large-scale and diverse dataset for training and testing; (2) constructing a GPT-based network capable of generating motion sequences, taking into account the coordination between partners; (3) introducing an off-policy reinforcement learning strategy for GPT to address out-of-distribution challenges.
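As a rough illustration of the off-policy idea, the sketch below replays previously sampled (possibly out-of-distribution) trajectories, scores each logged action with a human-defined reward, and reweights the policy-gradient update by a clipped importance ratio between the current policy and the behavior policy that generated the data. The reward terms (for instance, penalizing foot sliding or implausible partner distance), the clipping threshold, and the batch layout are all assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch of an off-policy policy-gradient update for the GPT policy.
# `model` is assumed to be the DuetGPT-style network sketched above.
import torch
import torch.nn.functional as F

def off_policy_step(model, optimizer, batch):
    """One update from a replay batch of logged trajectories."""
    follower, leader, music, actions, rewards, logp_old = batch
    logits = model(follower, leader, music)                        # (B, T, V)
    logp = F.log_softmax(logits, dim=-1)
    logp_act = logp.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, T)
    # Importance ratio corrects for the stale behavior policy that logged the
    # data; clipping keeps the update stable when the two policies diverge.
    ratio = torch.exp(logp_act.detach() - logp_old).clamp(max=5.0)
    # REINFORCE-style objective: raise the likelihood of actions that earned
    # high human-defined rewards, lower it for penalized ones.
    loss = -(ratio * rewards * logp_act).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```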