Understanding TAPVid-3D: A Benchmark for Tracking Any Point in 3D

TAPVid-3D is a new benchmark for evaluating long-range tracking of any point in 3D (TAP-3D). It consists of more than 4,000 real-world videos from three distinct data sources: Aria Digital Twin, DriveTrack, and Panoptic Studio. Together these sources span a variety of object types, motion patterns, and environments. The benchmark provides 3D point tracking annotations, metrics for measuring the accuracy of 3D track estimates, and an assessment of the current state of TAP-3D based on evaluations of existing tracking models. Dataset download, generation, and model evaluation are available through https://tapvid3d.github.io/.

The benchmark addresses the challenges of 3D point tracking, including ambiguous depth scales, occlusions, and multi-track spatio-temporal smoothness. Its metrics extend the Jaccard-based metric used in 2D tracking to handle the complexities of 3D tracking: APD (the average percentage of points within a given error threshold), OA (occlusion accuracy), and AJ (average Jaccard). Scale ambiguity is handled by re-scaling predictions to match the ground truth.

The benchmark also describes its data sources and the pipeline used to extract ground-truth 3D trajectories. It is designed to provide a more complete test of dynamic scene understanding than existing benchmarks. Its aims are to improve the ability to understand precise 3D motion and surface deformation from monocular video, and to accelerate research on TAP-3D.
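To make the metric definitions above concrete, the following is a minimal sketch in plain Python, not the benchmark's own evaluation code. Several details are assumptions made for illustration: the function names `rescale_to_ground_truth` and `evaluate` are hypothetical, the re-scaling uses a single global factor (the median ratio of ground-truth to predicted distance from the camera), the error thresholds are fixed placeholder values in scene units, and the Jaccard counting follows the usual 2D TAP convention. The benchmark may define its thresholds and scaling differently.

```python
import math
from statistics import median

ORIGIN = (0.0, 0.0, 0.0)  # camera centre, assuming camera-frame coordinates


def rescale_to_ground_truth(pred, gt, gt_visible):
    """Multiply all predictions by one global scale factor.

    Assumption: the factor is the median ratio of ground-truth to predicted
    distance from the camera, taken over ground-truth-visible points.
    """
    ratios = [
        math.dist(g, ORIGIN) / math.dist(p, ORIGIN)
        for p, g, v in zip(pred, gt, gt_visible)
        if v and math.dist(p, ORIGIN) > 0.0
    ]
    scale = median(ratios) if ratios else 1.0
    return [tuple(scale * c for c in p) for p in pred]


def evaluate(pred, pred_visible, gt, gt_visible, thresholds=(0.1, 0.5)):
    """Return APD, OA and AJ for flat lists of (track, frame) samples.

    The thresholds are placeholders chosen for this example only.
    """
    n_gt_visible = sum(gt_visible)
    if n_gt_visible == 0:
        raise ValueError("need at least one ground-truth-visible point")

    errors = [math.dist(p, g) for p, g in zip(pred, gt)]

    # OA: fraction of samples whose predicted visibility flag is correct.
    oa = sum(pv == gv for pv, gv in zip(pred_visible, gt_visible)) / len(gt)

    apd_terms, jaccard_terms = [], []
    for t in thresholds:
        within = [e < t for e in errors]
        # APD term: share of ground-truth-visible points within the threshold.
        apd_terms.append(
            sum(w for w, gv in zip(within, gt_visible) if gv) / n_gt_visible
        )
        # True positive: visible in both, and within the threshold.
        tp = sum(
            w and pv and gv
            for w, pv, gv in zip(within, pred_visible, gt_visible)
        )
        # False positive: predicted visible, but occluded in the ground
        # truth or too far from the ground-truth position.
        fp = sum(
            pv and (not gv or not w)
            for w, pv, gv in zip(within, pred_visible, gt_visible)
        )
        # n_gt_visible equals true positives plus false negatives.
        jaccard_terms.append(tp / (n_gt_visible + fp))

    return {
        "APD": sum(apd_terms) / len(apd_terms),
        "OA": oa,
        "AJ": sum(jaccard_terms) / len(jaccard_terms),
    }


# Toy example: predictions are roughly half the ground-truth scale.
gt = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
gt_visible = [True, True, True, False]
pred = [(0.0, 0.0, 0.5), (0.0, 0.0, 1.1), (0.5, 0.0, 1.0), (0.0, 0.5, 2.5)]
pred_visible = [True, True, False, True]

scaled = rescale_to_ground_truth(pred, gt, gt_visible)
scores = evaluate(scaled, pred_visible, gt, gt_visible)
print(scores)
```

On this toy input the global scale comes out at about 2, and the scores are approximately APD 0.83, OA 0.5, and AJ 0.35. A point that is visible in both prediction and ground truth but lies outside the threshold counts against the Jaccard score twice, once as a false positive and once as a false negative, which is why AJ is the strictest of the three.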

TAPVid-3D: A Benchmark for Tracking Any Point in 3D

8 Jul 2024 | Skanda Koppula, Ignacio Rocco, Yi Yang, Joe Heyward, João Carreira, Andrew Zisserman, Gabriel Brostow, Carl Doersch