TAPVid-3D: A Benchmark for Tracking Any Point in 3D

8 Jul 2024 | Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch
The paper introduces TAPVid-3D, a new benchmark for evaluating long-range Tracking Any Point in 3D (TAP-3D). While 2D point tracking already has several benchmarks built on real-world videos, 3D point tracking has lacked one; the authors present TAPVid-3D as the first benchmark for 3D point tracking on real-world videos. The benchmark features over 4,000 real-world videos from three data sources, Aria Digital Twin, DriveTrack, and Panoptic Studio, covering a wide range of object types, motion patterns, and environments. To measure performance, the authors formulate new metrics that extend the Jaccard-based metric used in 2D tracking to handle complexities such as ambiguous depth scales, occlusions, and multi-track spatio-temporal smoothness.
The benchmark includes ground-truth 3D trajectories and occlusion information, and the authors manually verify a large sample of trajectories to ensure accuracy. They also construct competitive baselines using existing tracking models to assess the current state of TAP-3D. The paper discusses the limitations and ethical considerations of the benchmark and highlights its potential applications in various fields, including robotic manipulation, video generation, and visual odometry.
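
To make the idea of a Jaccard-based 3D tracking metric concrete, here is a minimal sketch in Python. It is not the paper's exact definition: the function names, the single fixed distance threshold, and the global median-depth rescaling are all assumptions chosen for illustration. The Jaccard counting follows the usual convention from 2D point tracking, where a prediction is a true positive only if the point is visible in the ground truth, predicted visible, and within the distance threshold.

```python
import numpy as np


def rescale_by_median_depth(pred_xyz, gt_xyz, gt_visible):
    """Scale predictions so their median depth matches the ground truth.

    Assumption: one global scale factor per video, estimated from points
    that are visible in the ground truth. This is one simple way to
    handle an unknown depth scale, not necessarily the paper's.
    """
    gt_z = gt_xyz[gt_visible][:, 2]
    pred_z = pred_xyz[gt_visible][:, 2]
    scale = np.median(gt_z) / np.median(pred_z)
    return pred_xyz * scale


def jaccard_3d(gt_xyz, gt_visible, pred_xyz, pred_visible, threshold):
    """Jaccard score TP / (TP + FP + FN) at one 3D distance threshold.

    gt_xyz, pred_xyz: float arrays of shape (frames, points, 3).
    gt_visible, pred_visible: bool arrays of shape (frames, points).
    threshold: hypothetical fixed distance in the same units as the tracks.
    """
    dist = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)
    within = dist < threshold
    # Correct: visible in both and close enough.
    tp = np.sum(gt_visible & pred_visible & within)
    # Predicted visible, but the point is occluded or too far away.
    fp = np.sum(pred_visible & ~(gt_visible & within))
    # Truly visible, but predicted occluded or too far away.
    fn = np.sum(gt_visible & ~(pred_visible & within))
    denom = tp + fp + fn
    return float(tp / denom) if denom > 0 else 0.0
```

A fuller version would average this score over several thresholds, as the 2D Average Jaccard does. The rescaling step illustrates why depth scale matters: a prediction that is correct up to a constant scale factor scores zero before rescaling and is only penalized for visibility errors afterwards.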