EgoGen: An Egocentric Synthetic Data Generator

11 Apr 2024 | Gen Li¹, Kaifeng Zhao¹, Siwei Zhang¹, Xiaozhong Lyu¹, Mihai Dusmanu², Yan Zhang¹, Marc Pollefeys¹,², Siyu Tang¹
EgoGen is a novel synthetic data generator for egocentric perception tasks, capable of producing rich training data with accurate ground truth. Its core is a new human motion synthesis model that leverages egocentric visual inputs to sense the 3D environment, integrating deep reinforcement learning (RL) with these egocentric vision cues to synthesize human motion. Combined with collision-avoiding motion primitives and a two-stage RL approach, the model offers a closed-loop solution for simulating natural human movements and behaviors, in which virtual humans repeatedly perceive, move, and perceive again (sketched below). This lets virtual humans explore and avoid obstacles in dynamic environments, with no pre-defined paths required. The system also includes a scalable data generation pipeline that outfits virtual humans with clothing, automates cloth animation, and integrates 3D assets from various sources. EgoGen's synthetic data, with its precise annotations, improves the performance of state-of-the-art methods on mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. The system is fully open-sourced, providing a practical solution for creating realistic egocentric training data and serving as a useful tool for egocentric computer vision research.
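The closed-loop idea above can be summarized as: at each step the virtual human renders an egocentric observation (e.g., a depth map from the head pose), a learned policy maps that observation and the current body state to a collision-avoiding motion primitive, and executing the primitive changes the viewpoint for the next observation. The following is a minimal, hypothetical Python sketch of that loop; all class and function names (EgocentricSensor, MotionPrimitivePolicy, execute_primitive, etc.) are illustrative placeholders, not EgoGen's actual API.

```python
# Hypothetical sketch of a closed-loop, egocentric-vision-driven motion
# synthesis rollout. Names and shapes are assumptions, not EgoGen's real API.
from dataclasses import dataclass
import numpy as np

@dataclass
class BodyState:
    pose: np.ndarray         # joint rotations of the virtual human
    translation: np.ndarray  # root position in the scene

class EgocentricSensor:
    """Renders an egocentric depth map from the virtual human's head pose."""
    def observe(self, scene, state: BodyState) -> np.ndarray:
        # Placeholder: a real implementation would ray-cast or rasterize
        # the scene mesh from the head-mounted viewpoint.
        return np.zeros((64, 64), dtype=np.float32)

class MotionPrimitivePolicy:
    """RL policy mapping (egocentric depth, body state) to a motion primitive."""
    def act(self, depth: np.ndarray, state: BodyState) -> np.ndarray:
        # Placeholder for a trained policy network; returns a latent code
        # that decodes into a short, collision-avoiding motion primitive.
        return np.zeros(128, dtype=np.float32)

def execute_primitive(state: BodyState, primitive: np.ndarray) -> BodyState:
    # Placeholder: decode the primitive into body poses and advance the root.
    return state

def rollout(scene, state: BodyState, policy: MotionPrimitivePolicy,
            sensor: EgocentricSensor, steps: int = 100) -> list[BodyState]:
    """Closed loop: perceive -> act -> execute primitive -> perceive again."""
    trajectory = [state]
    for _ in range(steps):
        depth = sensor.observe(scene, state)         # egocentric visual cue
        primitive = policy.act(depth, state)         # choose next primitive
        state = execute_primitive(state, primitive)  # advance a few frames
        trajectory.append(state)
    return trajectory
```

Because the observation is re-rendered after every primitive, the policy reacts to whatever the scene currently contains, which is why this design needs no pre-planned path and transfers to dynamic, multi-agent scenes.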
EgoGen's motion synthesis model is closely related to previous work but distinguishes itself in three ways: virtual humans explore using egocentric visual cues rather than predefined paths; the synthesized behaviors extend beyond locomotion to broader egocentric perception behaviors; and dynamic environments and multi-agent behavior are handled without re-training. The system also simulates diverse head-mounted devices with different camera models, such as fisheye and pinhole cameras (see the sketch below), and renders photorealistic egocentric images with rich and accurate ground-truth annotations. Evaluations across the tasks above show that EgoGen's synthetic data improves state-of-the-art algorithms, and the pipeline is designed to be scalable and efficient, with the potential to significantly benefit egocentric perception research.
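For the camera models mentioned above, the key difference is how a 3D ray's angle θ from the optical axis maps to image radius r: a pinhole camera uses r = f·tan(θ), while an equidistant fisheye uses r = f·θ, which stays finite even at very wide fields of view. This is standard camera-model math, not EgoGen-specific code; the sketch below illustrates it under the assumption of an equidistant fisheye and a shared focal length f.

```python
import numpy as np

def project(point_cam: np.ndarray, f: float, model: str = "pinhole") -> np.ndarray:
    """Project a 3D point in camera coordinates (z forward) to image coordinates.

    Standard projection models (assumed, not EgoGen's implementation):
      pinhole:             r = f * tan(theta)
      equidistant fisheye: r = f * theta
    where theta is the angle between the ray and the optical axis.
    """
    x, y, z = point_cam
    theta = np.arctan2(np.hypot(x, y), z)  # angle from the optical axis
    phi = np.arctan2(y, x)                 # azimuth around the axis
    if model == "pinhole":
        r = f * np.tan(theta)              # diverges as theta -> 90 degrees
    elif model == "fisheye":
        r = f * theta                      # equidistant model: finite for wide FoV
    else:
        raise ValueError(f"unknown camera model: {model}")
    return np.array([r * np.cos(phi), r * np.sin(phi)])

# A point 30 degrees off-axis lands at different image radii under each model:
p = np.array([np.tan(np.radians(30.0)), 0.0, 1.0])
print(project(p, f=300.0, model="pinhole"))  # ~[173.2, 0.0]
print(project(p, f=300.0, model="fisheye"))  # ~[157.1, 0.0]
```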