**ReGenNet: Towards Human Action-Reaction Synthesis**
This paper addresses the challenge of generating human reactions conditioned on given human actions, a task far less explored than static scene and object interaction synthesis. The authors propose ReGenNet, a diffusion-based generative model with a Transformer decoder architecture, designed to synthesize plausible and instant human reactions in an online manner, where future actor motions are unavailable to reactors. The model is trained on annotated datasets (NTU120, InterHuman, and Chi3D) that include actor-reactor annotations, which are crucial for capturing the asymmetric, dynamic, synchronous, and detailed nature of human interactions. The proposed model is evaluated on multiple datasets and compared to state-of-the-art methods, demonstrating superior performance in terms of FID scores, action recognition accuracy, diversity, and multi-modality. Extensive experiments show that ReGenNet can generate realistic human reactions with low latency and generalize well to unseen actor motions and viewpoint changes. The model is modular and flexible, making it adaptable to practical applications such as multi-person interaction generation.
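The online setting described above (each reactor frame conditioned only on actor motion observed so far) can be sketched with a toy DDPM-style sampler. This is a hedged illustration, not ReGenNet's actual implementation: the stand-in `denoiser` function, the pose dimension, and the noise schedule are all assumptions chosen to keep the example self-contained; the paper's denoiser is a Transformer decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
T_DIFF = 50      # number of diffusion steps (assumed, not from the paper)
POSE_DIM = 6     # toy pose dimensionality (real models use full-body params)

# Linear beta schedule and derived alpha terms, as in standard DDPM.
betas = np.linspace(1e-4, 0.02, T_DIFF)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, actor_context):
    """Placeholder for the Transformer decoder: predicts the noise in x_t
    given the diffusion step t and the actor's motion observed so far."""
    return 0.1 * x_t + 0.05 * actor_context.mean(axis=0)

def sample_reaction_frame(actor_history):
    """Run the reverse diffusion chain for one reactor frame, conditioned
    only on past actor frames (no access to future actor motion)."""
    x = rng.standard_normal(POSE_DIM)           # start from Gaussian noise
    for t in reversed(range(T_DIFF)):
        eps_hat = denoiser(x, t, actor_history)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                               # no noise at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(POSE_DIM)
    return x

# Online generation: reactor frame i sees only actor frames 0..i.
actor_motion = rng.standard_normal((30, POSE_DIM))   # toy actor sequence
reaction = np.stack([sample_reaction_frame(actor_motion[: i + 1])
                     for i in range(len(actor_motion))])
print(reaction.shape)  # (30, 6)
```

The key point the sketch captures is the causal conditioning: the reverse diffusion chain for each reactor frame only ever reads the actor history up to that frame, which is what allows low-latency, online reaction synthesis.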