SpatialTracker: Tracking Any 2D Pixels in 3D Space

5 Apr 2024 | Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
SpatialTracker is a method for tracking 2D pixels in 3D space. It lifts 2D pixels into 3D using off-the-shelf monocular depth estimators, encodes the 3D content of each frame with a triplane representation, and iteratively updates 3D trajectories with a transformer. Tracking in 3D allows the method to impose an as-rigid-as-possible (ARAP) constraint while learning a rigidity embedding that clusters pixels into distinct rigid parts. The network is trained with a combination of trajectory, visibility, and ARAP losses, and long videos are handled by tracking across overlapping temporal windows.

Evaluated on public tracking benchmarks including TAP-Vid, BADJA, and PointOdyssey, SpatialTracker achieves state-of-the-art performance both qualitatively and quantitatively, particularly in challenging scenarios such as out-of-plane rotation, fast complex motion, and extended occlusion. It is competitive in 2D tracking on BADJA, outperforms baselines in 3D tracking on PointOdyssey, and produces accurate long-range motion tracks even under fast movements and severe occlusion.
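The first stage, lifting 2D pixels into 3D with estimated depth, amounts to unprojecting each pixel through the camera intrinsics under a pinhole model. The sketch below is an illustrative assumption (the function name and intrinsics layout are not from the paper; the depth map would come from a monocular depth estimator):

```python
import numpy as np

def lift_pixels_to_3d(uv, depth, K):
    """Unproject 2D pixel coordinates into 3D camera space using a
    per-pixel depth map and 3x3 pinhole intrinsics K (illustrative)."""
    fx, fy = K[0, 0], K[1, 1]          # focal lengths
    cx, cy = K[0, 2], K[1, 2]          # principal point
    u, v = uv[:, 0], uv[:, 1]
    z = depth[v.astype(int), u.astype(int)]   # sample depth at each pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)       # (N, 3) camera-space points
```

A pixel at the principal point maps straight down the optical axis to `(0, 0, z)`; off-center pixels spread out proportionally to their depth.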
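The ARAP constraint can be illustrated as a loss that penalizes changes in pairwise 3D distances over time, soft-weighted by how likely two points belong to the same rigid part according to their rigidity embeddings. This is a minimal sketch of the idea, not the paper's exact formulation or weighting scheme:

```python
import numpy as np

def arap_loss(tracks, rigidity):
    """As-rigid-as-possible loss sketch.
    tracks:   (T, N, 3) estimated 3D trajectories over T frames
    rigidity: (N, D) learned per-point rigidity embeddings (assumed shape)
    Points in the same rigid part should keep their pairwise distances."""
    # affinity: large when two points likely share a rigid part
    sim = rigidity @ rigidity.T                  # (N, N)
    w = np.exp(sim - sim.max())                  # stable softmax-style weights
    w = w / w.sum()
    # pairwise 3D distances in every frame
    diff = tracks[:, :, None, :] - tracks[:, None, :, :]   # (T, N, N, 3)
    d = np.linalg.norm(diff, axis=-1)                      # (T, N, N)
    # penalize deviation of each frame's distances from the first frame
    return float((w * (d - d[0]) ** 2).sum(axis=(1, 2)).mean())
```

A rigid translation of all points leaves every pairwise distance unchanged, so the loss is zero; deforming one point relative to the others makes it positive.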
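Handling long videos with overlapping windows can be sketched as follows: each window is initialized from the tracks of the previous one so trajectories stay consistent across window boundaries. Here `track_window` is a hypothetical stand-in for the per-window tracker, and the window/overlap sizes are illustrative, not the paper's settings:

```python
def track_long_video(frames, track_window, window=8, overlap=4):
    """Track a long video by chaining overlapping temporal windows.
    `track_window(chunk, init=...)` is an assumed per-window tracker
    that accepts trajectories from the overlap region as initialization."""
    stride = window - overlap
    tracks = None
    results = []
    for start in range(0, max(1, len(frames) - overlap), stride):
        chunk = frames[start:start + window]
        # seed this window with the tracks covering the overlapping frames
        init = tracks[-overlap:] if tracks is not None else None
        tracks = track_window(chunk, init=init)
        # keep only the frames not already covered by the previous window
        results.append(tracks if start == 0 else tracks[overlap:])
    return results
```

Chaining through the overlap is what lets a fixed-length tracker produce a single consistent set of trajectories across an arbitrarily long video.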