Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

2024 | Li Siyao, Tianpei Gu, Zhengyu Lin, Zhitao Yang, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
Duolando is a novel method for dance accompaniment: generating responsive movements for a follower that are synchronized with both the leader's movements and the underlying musical rhythm. The method is based on a GPT model enhanced with off-policy reinforcement learning to improve its generalization ability.

To support this task, a large-scale dataset named DD100 was created, containing 10 genres of duet dance, each featuring strong interaction between the dancers. The dataset includes 10 distinct clips per genre, for a total duration of approximately 117 minutes. Body data for the two dancers were captured with 20 optical MoCap cameras at 120 FPS; the raw MoCap data consist of the 3D positions of 53 marker points on the body surface in each frame.

To convert these data into SMPL-X format, a subset of frames was first selected for each dancer to fit their body shape parameters. The fitted body shape was then used to assist pose-parameter regression in each frame, after filtering out invisible marker points whose confidence scores fell below a threshold.

Hand movement was not fitted in this procedure. Because the hands are prone to self-occlusion and to inter-occlusion between the two dancers, inertial motion-capture gloves were used instead. The raw glove data are stored in Biovision Hierarchy (BVH) format, which describes the hierarchical rotations of the finger joints as Euler angles. These Euler angles were converted to axis angles aligned with the MANO model, which is initialized with a "flat" gesture for both hands. Since MANO's "flat" pose differs from the BVH rest pose, in which the fingers are spread apart at specific angles, this difference was subtracted during the mapping from BVH to MANO. Finally, the body and hand parameters were combined, with the wrist rotations taken from the MoCap regression.
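To make the confidence filtering step above concrete, here is a minimal sketch. The threshold value and array shapes are illustrative assumptions; the paper does not specify them:

```python
import numpy as np

def filter_markers(positions, confidences, threshold=0.5):
    """Drop low-confidence (occluded/invisible) markers before pose regression.

    positions:   (53, 3) array of 3D marker positions for one frame.
    confidences: (53,) per-marker confidence scores.
    The 0.5 threshold is an illustrative assumption, not the paper's value.
    """
    mask = confidences >= threshold
    return positions[mask], mask

# Usage on synthetic data for one frame:
pos = np.random.randn(53, 3)
conf = np.random.rand(53)
kept, mask = filter_markers(pos, conf)
print(f"kept {mask.sum()} of 53 markers")
```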
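The Euler-to-axis-angle mapping with rest-pose correction could look like the sketch below. The joint rotation order and the offset value are assumptions for illustration; only the overall recipe follows the description above: convert the BVH Euler angles to a rotation, compose with the inverse of the rest-pose difference, and export as an axis angle.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Hypothetical rest-pose offset for one finger joint: the rotation taking
# MANO's "flat" pose to the BVH rest pose (fingers spread apart). The
# actual per-joint offsets depend on the glove rig and are not given here.
FLAT_OFFSET = R.from_euler("xyz", [0.0, 0.0, 8.0], degrees=True)

def bvh_euler_to_mano_axis_angle(euler_deg, order="zxy"):
    """Convert a finger joint's BVH Euler rotation to a MANO axis angle,
    subtracting the rest-pose difference between the two conventions."""
    r_bvh = R.from_euler(order, euler_deg, degrees=True)
    # "Subtracting" a rotation offset means composing with its inverse.
    r_mano = FLAT_OFFSET.inv() * r_bvh
    return r_mano.as_rotvec()  # axis * angle (radians), as MANO expects

print(bvh_euler_to_mano_axis_angle([10.0, 0.0, 25.0]))
```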
Each dance clip in DD100 consists of the SMPL-X sequences of both the leader and the follower, along with the corresponding music, whose tempo ranges from 72 to 163 beats per minute (BPM) with an average of 118 BPM.

The proposed method is a two-stage framework that generates follower motion in harmony with both the background music and the leader's movements. In the first stage, VQ-VAEs are trained to embed and quantize the dance movements of different body parts and the relative translation between the two dancers. In the second stage, an interaction-coordinated GPT autoregressively predicts the next token, conditioned on the combined information of the music signal, the leader's motion, and the preceding follower sequence.

Stability issues arise when the model is confronted with unheard music or unseen leader motion patterns; a common failure is lower-body movement that is incompatible with the global displacement, producing skating artifacts. To address this, an off-policy reinforcement learning strategy is introduced for fine-tuning the GPT, allowing it to learn from out-of-distribution samples and improving its generalization ability.
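As a sketch of the first stage, the core VQ-VAE quantization step maps each continuous motion feature to its nearest codebook entry. The dimensions below are illustrative, not the paper's values:

```python
import torch

def quantize(z, codebook):
    """Nearest-neighbor codebook lookup: the quantization step of a VQ-VAE.

    z:        (T, D) continuous motion features from the encoder.
    codebook: (K, D) learned code vectors.
    Returns the discrete token indices and their quantized embeddings.
    """
    dists = torch.cdist(z, codebook)   # (T, K) pairwise Euclidean distances
    tokens = dists.argmin(dim=1)       # (T,) discrete motion tokens
    return tokens, codebook[tokens]

# Usage with toy sizes (feature dim and codebook size are assumptions):
z = torch.randn(8, 256)
codebook = torch.randn(512, 256)
tokens, z_q = quantize(z, codebook)
```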
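For the second stage, greedy autoregressive decoding could be sketched as follows, assuming a hypothetical `gpt` callable that fuses the three conditioning streams and returns per-step logits; the paper's actual interaction-coordinated GPT interleaves these streams internally.

```python
import torch

def decode_follower(gpt, music_feats, leader_tokens, bos_token, num_steps):
    """Greedy autoregressive decoding of follower motion tokens.

    `gpt` is a hypothetical callable taking (music features, leader tokens,
    follower tokens so far) and returning logits of shape (T, vocab)."""
    follower = [bos_token]
    for t in range(num_steps):
        logits = gpt(music_feats[: t + 1],
                     leader_tokens[: t + 1],
                     torch.tensor(follower))
        follower.append(int(logits[-1].argmax()))  # next follower token
    return follower[1:]
```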
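Finally, a generic off-policy policy-gradient update with importance weighting gives a flavor of how sequences sampled by an older policy can be reused for fine-tuning; the paper's exact objective and reward design (e.g., penalizing skating artifacts) are not reproduced here.

```python
import torch

def off_policy_update(optimizer, log_probs_new, log_probs_behavior, rewards):
    """One importance-weighted policy-gradient step on replayed samples.

    log_probs_new:      log-probs of the taken actions under the current policy.
    log_probs_behavior: log-probs under the (older) policy that generated them.
    rewards:            per-sample scalar rewards.
    """
    # Importance weights correct for the mismatch between the current
    # policy and the behavior policy that generated the samples.
    ratio = (log_probs_new - log_probs_behavior).exp().detach()
    loss = -(ratio * rewards * log_probs_new).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```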