The paper introduces the HANDS23 challenge, which addresses 3D hand-object interaction estimation from egocentric views. The challenge is built on the AssemblyHands and ARCTIC datasets, which provide multi-view egocentric videos of hands interacting with objects. The paper outlines two main tasks: 3D hand pose estimation from a single-view image (AssemblyHands) and consistent motion reconstruction (ARCTIC). It presents a detailed analysis of the submitted methods, highlighting their performance and contributions. Key findings include the effectiveness of addressing egocentric camera distortion, the use of high-capacity transformers for complex interactions, and the importance of multi-view fusion techniques. The study also identifies challenging scenarios, such as fast hand motion, narrow egocentric views, and intricate hand-object interactions. The paper concludes with a discussion of future research directions, emphasizing the need for more efficient training, leveraging 3D priors, and exploring diverse interaction scenarios.