ReGenNet: Towards Human Action-Reaction Synthesis
This paper introduces ReGenNet, a diffusion-based generative model for human action-reaction synthesis: given one person's action, it generates the other person's reaction, accounting for the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions. The paper also proposes the first multi-setting human action-reaction synthesis benchmark, built on three annotated datasets: NTU120-AS, InterHuman-AS, and Chi3D-AS. Each dataset is annotated with actor-reactor order, so reactions can be generated conditioned on given actions.
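To make the conditioning structure concrete, here is one way to state the online constraint in symbols; the notation below is ours, introduced only for illustration, not taken from the paper:

```latex
% Illustrative notation (not from the paper):
% x^a_{1:T} = actor motion, \hat{x}^r_{1:T} = generated reaction.
% Online constraint: each reactor frame may depend only on actor
% frames observed so far, never on the actor's future states.
\hat{x}^r_t = f_\theta\bigl(x^a_{1:t}\bigr), \quad t = 1, \dots, T
```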
ReGenNet combines a diffusion model with a Transformer decoder architecture and an explicit distance-based interaction loss. It is trained on SMPL-X body representations, which include detailed articulated hand poses, and generates instant, plausible reactions even when the actor's future states are unavailable to the reactor. Evaluated on NTU120-AS, InterHuman-AS, and Chi3D-AS, ReGenNet produces realistic reactions with the lowest time delay among baselines and generalizes to unseen actor motions and viewpoint changes.
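The summary does not spell out the interaction loss, but a minimal sketch of a distance-based variant might look like the following; the function name, tensor shapes, and use of pairwise reactor-to-actor joint distances are all assumptions for illustration, not the paper's confirmed formulation:

```python
import torch
import torch.nn.functional as F

def interaction_loss(react_pred: torch.Tensor,
                     react_gt: torch.Tensor,
                     actor: torch.Tensor) -> torch.Tensor:
    """Illustrative distance-based interaction loss.

    All inputs are assumed to be joint positions of shape
    (batch, frames, joints, 3). The loss matches the pairwise
    reactor-to-actor joint distances of the generated reaction to
    those of the ground truth, encouraging the synthesized reactor
    to keep a plausible spatial relation to the actor.
    """
    d_pred = torch.cdist(react_pred, actor)  # (B, T, J, J) distances
    d_gt = torch.cdist(react_gt, actor)
    return F.l1_loss(d_pred, d_gt)
```

In training, such a term would typically be added, with a weighting coefficient, to the standard diffusion reconstruction loss.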
The model is modular and flexible, adapting to different interaction settings, and can be trimmed for other practical applications such as multi-person interaction generation. Evaluated across settings, including the online and unconstrained ones, it shows superior performance in generating realistic reactions and remains robust when actor motions are observed from changed viewpoints.
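One plausible mechanism behind the online setting is a causal attention mask, so reactor frames never attend to the actor's future; the sketch below assumes PyTorch's torch.nn.MultiheadAttention masking convention and is illustrative rather than the paper's exact implementation:

```python
import torch

def online_attention_mask(num_frames: int) -> torch.Tensor:
    """Boolean mask in MultiheadAttention's convention: True marks
    positions a query may NOT attend to. Reactor frame t can then
    only attend to actor frames 0..t, never to future states."""
    return torch.triu(
        torch.ones(num_frames, num_frames, dtype=torch.bool), diagonal=1
    )

mask = online_attention_mask(4)
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
# Offline variant: pass attn_mask=None so every reactor frame can
# attend to the full actor sequence.
```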
The paper also presents ablation studies on the model's components: the explicit interaction loss, the number of decoder layers, and the number of DDIM sampling timesteps, showing that each contributes significantly to performance. Further experiments in the offline and constrained settings confirm the model's flexibility and effectiveness.
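For context on the DDIM-timestep ablation, here is a minimal deterministic DDIM sampling loop; the x0-prediction parameterization and the model's signature are assumptions made for this sketch, not the paper's confirmed implementation:

```python
import torch

@torch.no_grad()
def ddim_sample(model, actor_motion, shape, alphas_cumprod, num_steps=5):
    """Deterministic DDIM sampling sketch (eta = 0), assuming the
    denoiser predicts the clean reaction x0 from the noisy sample,
    the actor motion, and the timestep.
    alphas_cumprod: 1-D tensor, the cumulative noise schedule."""
    T = alphas_cumprod.shape[0]
    steps = torch.linspace(T - 1, 0, num_steps).long()  # e.g. 5 of 1000 steps
    x = torch.randn(shape)  # start from Gaussian noise
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        a_prev = (alphas_cumprod[steps[i + 1]]
                  if i + 1 < num_steps else torch.tensor(1.0))
        x0 = model(x, actor_motion, t)                    # predicted clean reaction
        eps = (x - a_t.sqrt() * x0) / (1.0 - a_t).sqrt()  # noise implied by x0
        x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps
    return x
```

Fewer sampling timesteps trade generation quality for speed, which is what the ablation measures.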
The paper concludes that ReGenNet is a significant contribution to human action-reaction synthesis, providing both a new benchmark and a new model for generating realistic human reactions, with demonstrated effectiveness and generalization across datasets and settings. It also discusses limitations, including the complexity of real-world human-human interactions and the need for more high-quality human-human interaction datasets with actor-reactor annotations.