16 Aug 2016 | David Held, Sebastian Thrun, Silvio Savarese
The paper "Learning to Track at 100 FPS with Deep Regression Networks" by David Held, Sebastian Thrun, and Silvio Savarese proposes a method for offline training of neural networks to track novel objects at 100 frames per second (FPS). The authors address the issue that most generic object trackers are trained from scratch online, missing out on the benefits of large offline training datasets. Their proposed method, called GOTURN (Generic Object Tracking Using Regression Networks), trains a neural network to track objects in a feed-forward manner without online fine-tuning, achieving real-time performance. The tracker learns a generic relationship between object motion and appearance, enabling it to track novel objects that do not appear in the training set. The paper demonstrates that GOTURN outperforms state-of-the-art trackers on a standard tracking benchmark and shows that performance improves with more training videos. The method is evaluated on the VOT 2014 Tracking Challenge, where GOTURN achieves the highest overall rank (average of accuracy and robustness ranks) while running at 100 FPS. The authors also analyze the generality and specificity of the tracker, showing that it can track novel objects and specialize in tracking specific classes of objects. The paper includes a detailed experimental setup, results, and ablation studies to support the effectiveness of the proposed method.The paper "Learning to Track at 100 FPS with Deep Regression Networks" by David Held, Sebastian Thrun, and Silvio Savarese proposes a method for offline training of neural networks to track novel objects at 100 frames per second (FPS). The authors address the issue that most generic object trackers are trained from scratch online, missing out on the benefits of large offline training datasets. Their proposed method, called GOTURN (Generic Object Tracking Using Regression Networks), trains a neural network to track objects in a feed-forward manner without online fine-tuning, achieving real-time performance. The tracker learns a generic relationship between object motion and appearance, enabling it to track novel objects that do not appear in the training set. The paper demonstrates that GOTURN outperforms state-of-the-art trackers on a standard tracking benchmark and shows that performance improves with more training videos. The method is evaluated on the VOT 2014 Tracking Challenge, where GOTURN achieves the highest overall rank (average of accuracy and robustness ranks) while running at 100 FPS. The authors also analyze the generality and specificity of the tracker, showing that it can track novel objects and specialize in tracking specific classes of objects. The paper includes a detailed experimental setup, results, and ablation studies to support the effectiveness of the proposed method.