18 Jan 2024 | Jeonghwan Kim*, Jisoo Kim*, Jeonghyeon Na, Hanbyul Joo
The paper introduces ParaHome, a system designed to capture and parameterize the dynamic 3D movements of humans and objects in a common home environment. The system combines a multi-view setup of 70 synchronized RGB cameras with wearable motion-capture devices, including an IMU-based body suit and hand motion-capture gloves. Using ParaHome, the authors collect a large-scale dataset of human-object interactions that advances over existing datasets in three main ways: (1) it captures 3D body and dexterous hand manipulation motion together with 3D object movement in a contextual home environment during natural activities; (2) it covers human interaction with multiple objects across varied episodic scenarios, each paired with a text description; (3) it includes articulated objects whose multiple parts are expressed with parameterized articulations.

The dataset comprises 30 participants, 22 objects, 101 scenarios, and 440 minutes of recorded sequences. The authors also propose new research tasks aimed at building generative models that learn and synthesize human-object interactions in a real-world room setting. The paper details the system's hardware and software, the data-acquisition pipeline, and evaluations of the system's robustness and accuracy. The authors plan to expand the dataset with more participants and scenarios, and they acknowledge limitations of the current setup, such as its reliance on wearable capture devices rather than a markerless approach and the limited diversity of objects and room layouts.
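To make the idea of "parameterized articulations" concrete, below is a minimal sketch of how a multi-part object's state could be represented: a global 6-DoF base pose plus a scalar joint state per movable part (a revolute joint for a hinged door, a prismatic joint for a drawer). This is not the authors' actual data schema; all class names, field names, and the example values are hypothetical illustrations of the general technique.

```python
# Hypothetical sketch of a 1-DoF parameterized articulation; not ParaHome's schema.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ArticulatedPart:
    """One movable part of an object, parameterized by a single-DoF joint."""
    joint_type: str    # "revolute" (e.g., a cabinet door) or "prismatic" (e.g., a drawer)
    axis: np.ndarray   # joint axis in the object's base frame (unit 3-vector)
    origin: np.ndarray # a point on the joint axis, in the base frame
    state: float = 0.0 # joint angle in radians, or translation in meters

    def part_transform(self) -> np.ndarray:
        """4x4 rigid transform of this part relative to the object base."""
        T = np.eye(4)
        if self.joint_type == "prismatic":
            # Slide along the joint axis by `state` meters.
            T[:3, 3] = self.axis * self.state
        else:
            # Revolute: Rodrigues' rotation by `state` about the axis through `origin`.
            a = self.axis / np.linalg.norm(self.axis)
            K = np.array([[0.0, -a[2], a[1]],
                          [a[2], 0.0, -a[0]],
                          [-a[1], a[0], 0.0]])
            R = np.eye(3) + np.sin(self.state) * K + (1 - np.cos(self.state)) * (K @ K)
            T[:3, :3] = R
            T[:3, 3] = self.origin - R @ self.origin  # rotate about the offset axis
        return T


@dataclass
class ObjectFrame:
    """Per-frame object state: global 6-DoF base pose plus per-part joint states."""
    base_pose: np.ndarray  # 4x4 world-from-base transform
    parts: dict[str, ArticulatedPart] = field(default_factory=dict)

    def part_pose_world(self, name: str) -> np.ndarray:
        return self.base_pose @ self.parts[name].part_transform()


# Usage: a cabinet whose door is opened to 60 degrees about a vertical hinge.
cabinet = ObjectFrame(base_pose=np.eye(4))
cabinet.parts["door"] = ArticulatedPart(
    joint_type="revolute",
    axis=np.array([0.0, 0.0, 1.0]),    # hinge axis (vertical)
    origin=np.array([0.3, 0.0, 0.0]),  # hinge location, offset from the base origin
    state=np.deg2rad(60.0),
)
print(cabinet.part_pose_world("door"))
```

The appeal of this kind of parameterization, which the paper leverages for its articulated objects, is that a full interaction sequence reduces to one rigid pose plus a handful of scalar joint trajectories per object, which is far easier for a generative model to learn than unstructured per-part poses.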