2 Apr 2024 | Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang
HOI-M^3: Capturing Multiple Humans and Objects Interaction within Contextual Environment
This paper introduces HOI-M^3, a novel large-scale dataset for modeling interactions between multiple humans and multiple objects. It contains 199 sequences and 181 million video frames recorded from 42 diverse viewpoints, covering a wide range of daily scenarios, and provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs. The dataset is intended to facilitate various tasks related to human-object interaction perception and generation.
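To make the data layout concrete, below is a minimal sketch of how one per-sequence record combining the RGB and IMU modalities might be organized. The class and field names are hypothetical illustrations for this summary, not the dataset's published schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-sequence record; names are illustrative, not HOI-M^3's actual schema.
@dataclass
class HOIM3Sequence:
    sequence_id: str
    rgb_views: List[str]           # paths to the 42 synchronized 4K video streams
    object_imu_streams: List[str]  # one inertial stream per tracked object
    human_poses: str               # per-frame 3D human MoCap annotations
    object_poses: str              # per-frame 6-DoF object tracking labels
    num_frames: int = 0
```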
The HOI-M^3 dataset is the first of its kind, opening up the research direction of data-driven motion capture, and even synthesis, of multiple interacting humans and objects. Its rich annotations and multiple modalities also hold great potential for future work on HOI modeling and behavior analysis. Building on our novel HOI-M^3 dataset, we provide strong baseline methods for two novel downstream tasks: 1) monocular capture of multiple HOI; and 2) unstructured generation of multiple HOI. For the former, we introduce a novel single-shot learning-based method to estimate multi-person and multi-object 3D poses. For the latter, we tailor diffusion models to the generation of intricate social interactions.
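The paper summary does not spell out the generation baseline's implementation, so the following is only a minimal sketch of a standard DDPM-style denoising loop over joint human-object pose trajectories, under the assumption of a hypothetical `denoiser` network that predicts noise. All tensor shapes and names are illustrative.

```python
import torch

# Minimal DDPM-style sampler over joint human-object pose trajectories.
# `denoiser` is a hypothetical noise-prediction network; x has shape
# (batch, frames, dims), where dims stacks all person and object poses.
def sample_hoi_motion(denoiser, shape, timesteps=1000, device="cpu"):
    betas = torch.linspace(1e-4, 0.02, timesteps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # start from pure Gaussian noise
    for t in reversed(range(timesteps)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = denoiser(x, t_batch)  # predicted noise at step t
        # Posterior mean of the reverse process (standard DDPM update rule).
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # denoised multi-human, multi-object pose trajectories
```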
The HOI-M^3 dataset comprises 199 human-object interaction sequences, totaling 181 million frames, that cover 90 diverse 3D objects and 31 human subjects (20 males and 11 females) across various environments, with dense-view coverage at 4K resolution and 60 fps. Noteworthy features of our HOI-M^3 dataset include: 1) Multiple Humans and Objects: each sequence involves at least 2 persons and 5 objects, making it, to the best of our knowledge, the first real-world 3D multi-human, multi-object dataset with accurate 3D MoCap. 2) High Quality: sequences are recorded within daily-style rooms with 42 synchronized camera views, and inertial measurement units (IMUs) are embedded in each pre-scanned object to ensure accurate human-object tracking labels. 3) Large Size and Rich Modality: our dataset records over 20 hours of interactions with both RGB and inertial sensors, providing segmentation annotations, pre-scanned object geometry, and accurate HOI tracking labels.
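As a sanity check, the headline numbers reported above are mutually consistent: 42 synchronized views recording at 60 fps for the stated 20-plus hours yield roughly 181 million frames. The quick arithmetic below uses only figures given in the text; the exact per-sequence split is not reported.

```python
views = 42   # synchronized camera views
fps = 60     # frames per second per view
hours = 20   # "over 20 hours" of recorded interaction

total_frames = views * fps * 3600 * hours
print(f"{total_frames:,}")  # 181,440,000 -- consistent with the reported 181M frames
```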
In summary, the HOI-M^3 dataset is designed to capture multiple human-object interactions within a contextual environment, and its key advantages lie in its recording scale, annotation quality, and rich modality.