30 Apr 2024 | Jingbo Wang, Zhengyi Luo, Ye Yuan, Yixuan Li, Bo Dai
PACER+ is a simulation-based framework for generating diverse and natural pedestrian animations on demand in driving scenarios. The framework enables rich zero-shot control beyond trajectory following, allowing the creation of diverse animations in both manually synthesized and real-world scenarios. It supports fine-grained control over different body parts while following a given trajectory, achieved by selectively tracking specific body parts instead of rigidly tracking the entire body. This leaves room for more life-like animations, such as walking while making a phone call, while preserving smoothness of motion, compatibility with terrain, and adherence to the provided trajectory. The framework can also recreate real-world pedestrian motions in simulation environments without re-training or fine-tuning, automatically infilling missing body parts.
The key insight behind PACER+ lies in the synergy between the motion imitation and trajectory following tasks. While lower-body motion is largely determined by the trajectory and terrain, upper-body motion is free to take on a diverse range of motions. A joint training scheme therefore trains both tasks together, so that a single policy can track partial body motion and follow trajectories simultaneously in a physically plausible way.
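The idea of tracking only selected body parts while following a trajectory can be sketched as a reward that combines a masked imitation term with a trajectory term. This is a minimal illustration, not the authors' reward: the function names, the exponential form, the `scale` values, and the equal weights are all assumptions made for the example.

```python
import math

def masked_tracking_reward(sim_joints, ref_joints, mask, scale=2.0):
    """Imitation term computed only over joints whose mask entry is 1.

    sim_joints / ref_joints: lists of (x, y, z) positions, one per joint.
    mask: list of 0/1 flags, one per joint (hypothetical representation).
    """
    tracked = [i for i, m in enumerate(mask) if m == 1]
    if not tracked:
        # Assumption of this sketch: no tracked joints means no imitation signal.
        return 0.0
    sq_err = 0.0
    for i in tracked:
        sq_err += sum((s - r) ** 2 for s, r in zip(sim_joints[i], ref_joints[i]))
    # Mean squared error over tracked joints, mapped to the range (0, 1].
    return math.exp(-scale * sq_err / len(tracked))

def trajectory_reward(root_xy, target_xy, scale=2.0):
    """Trajectory term: high when the root is close to the target waypoint."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(root_xy, target_xy))
    return math.exp(-scale * dist_sq)

def joint_reward(sim_joints, ref_joints, mask, root_xy, target_xy,
                 w_track=0.5, w_traj=0.5):
    """Weighted sum of both terms (weights are illustrative assumptions)."""
    return (w_track * masked_tracking_reward(sim_joints, ref_joints, mask)
            + w_traj * trajectory_reward(root_xy, target_xy))
```

With a mask that selects only the upper-body joints, errors on the unmasked lower-body joints do not reduce the imitation term, so the policy is free to choose leg motion that suits the trajectory and terrain.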
The framework is designed for both manually synthesized and real-world scenarios. In real-world scenarios, frames with high-confidence pose estimates are tracked with the entire body to preserve as much motion content as possible. In low-confidence frames, a value of 1 (tracked) is assigned only to keypoints with high-confidence estimation scores, and the remaining keypoints are left for the policy to infill. This enables motion capture even when half of the body is occluded, without requiring additional optimization steps.
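The confidence-based selection described above can be sketched as a per-frame mask. This is only an illustration under stated assumptions: the function name, both threshold values, and the use of the mean keypoint score as the frame-level confidence are hypothetical and are not taken from the paper.

```python
def keypoint_mask(confidences, joint_threshold=0.5, frame_threshold=0.8):
    """Return a 0/1 tracking mask for the keypoints of one frame.

    confidences: per-keypoint estimation scores in [0, 1].
    Assumption: a frame counts as high-confidence when its mean score
    reaches frame_threshold; the thresholds themselves are illustrative.
    """
    if not confidences:
        return []
    frame_conf = sum(confidences) / len(confidences)
    if frame_conf >= frame_threshold:
        # High-confidence frame: track the entire body.
        return [1] * len(confidences)
    # Low-confidence frame: track only the reliably estimated keypoints.
    return [1 if c >= joint_threshold else 0 for c in confidences]
```

For a half-occluded frame such as `keypoint_mask([0.9, 0.1, 0.2, 0.7])`, only the first and last keypoints are marked as tracked, and the others are left for the policy to fill in.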
The framework is evaluated against PACER, a state-of-the-art controllable pedestrian animation approach. It achieves a lower FID and better diversity than PACER, indicating higher motion quality and variety. It also shows improved motion tracking performance, particularly in whole-body, upper-body, and left/right arm tracking. Finally, the framework supports zero-shot recreation of real-world pedestrian animations: it can simulate pedestrians that follow the motion content of real-world videos, which enhances the realism of the simulated motion.
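For reference, the FID (Fréchet distance) reported in the evaluation is commonly defined between two Gaussians fitted to feature vectors of real and generated samples. The formula below is the standard definition; which feature extractor is used for motion is not stated in this summary, and the subscripts $r$ (real) and $g$ (generated) are notation chosen here.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Here $\mu$ and $\Sigma$ are the mean and covariance of the features in each set, so a lower value means the generated motions are statistically closer to the real ones.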