GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild

20 Nov 2019 | Lianghua Huang, Xin Zhao, Member, IEEE, and Kaiqi Huang, Senior Member, IEEE
GOT-10k is a large, high-diversity benchmark for generic object tracking in the wild. It includes over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. Class population is guided by the semantic hierarchy of WordNet, ensuring comprehensive and unbiased coverage of diverse moving objects. The dataset covers 563 object classes and 87 motion classes, a much wider coverage of moving object classes than similar-scale counterparts. Each video is labeled along two dimensions, object class and motion class, and additional labels such as object visible ratios are provided to facilitate motion-aware and occlusion-aware tracking.
GOT-10k also provides a platform for the tracking community with full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The database, toolkits, evaluation server, and baseline results are available at http://got-10k.aitestunion.com.
Compared with other tracking datasets in terms of scale, diversity, and attribute annotations, GOT-10k is much larger and offers a much wider coverage of object classes. It is also the only benchmark that follows the one-shot protocol in tracker evaluation: training and test classes have zero overlap, which avoids results biased towards familiar (seen) classes and promotes generalization. The dataset is split into unified training, validation, and test sets to enable fair comparison of tracking approaches. The test set contains 420 videos, 84 classes of moving objects, and 31 forms of motion, a size at which tracker rankings are reasonably stable.
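The one-shot protocol described above amounts to splitting videos by object class rather than by video, so that no test class is ever seen in training. The following is a minimal sketch of that idea, not the authors' tooling: the function names `split_by_class` and `class_overlap` and the `(video_id, object_class)` record layout are hypothetical and chosen only for illustration.

```python
def split_by_class(videos, test_classes):
    """Split (video_id, object_class) records so that whole classes go to test.

    `videos` and `test_classes` are hypothetical inputs for illustration;
    they do not reflect the actual GOT-10k file layout.
    """
    test_classes = set(test_classes)
    train = [v for v in videos if v[1] not in test_classes]
    test = [v for v in videos if v[1] in test_classes]
    return train, test


def class_overlap(train, test):
    """Return the sorted object classes that appear in both subsets."""
    train_classes = {cls for _, cls in train}
    test_classes = {cls for _, cls in test}
    return sorted(train_classes & test_classes)


videos = [("v1", "dog"), ("v2", "cat"), ("v3", "zebra"), ("v4", "dog")]
train, test = split_by_class(videos, test_classes={"zebra"})

# Under the one-shot protocol this list must be empty.
assert class_overlap(train, test) == []
```

A split made per video instead (for example, a random 90/10 split) would typically leave classes such as "dog" on both sides, which is the evaluation bias the protocol is meant to avoid.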
Trackers are evaluated using average overlap (AO) and success rate (SR), together with the class-balanced variants mAO and mSR, so that results are not dominated by larger-scale object classes. Benchmarking 39 recent state-of-the-art tracking approaches and their variants shows that tracking in real-world unconstrained videos is still challenging. Results broken down by challenge (occlusion/truncation, scale variation, aspect ratio variation, fast motion, illumination variation, and low-resolution targets) show that tracking under fast changes in object state and appearance remains difficult. Results broken down by object and motion class show that small, thin, and fast-moving objects are harder to track than large or slow objects, that objects with large deformation lead to lower tracking performance, and that the person class represents moderate difficulty.
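To make the AO, SR, mAO, and mSR metrics mentioned above concrete, here is a minimal sketch in plain Python. It rests on assumptions not stated in the text: boxes are given as `(x, y, w, h)`, SR uses an overlap threshold of 0.5, and the class-balanced variants are taken as the mean of per-class means. The function names are hypothetical and this is not the official toolkit.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes in (x, y, w, h) form (assumed layout)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def sequence_scores(pred, gt, threshold=0.5):
    """AO: mean per-frame overlap. SR: fraction of frames with overlap above threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    ao = sum(overlaps) / len(overlaps)
    sr = sum(o > threshold for o in overlaps) / len(overlaps)
    return ao, sr


def class_balanced(scores_by_seq, class_of_seq):
    """mAO and mSR: average within each class first, then across classes."""
    per_class = defaultdict(list)
    for seq, score in scores_by_seq.items():
        per_class[class_of_seq[seq]].append(score)
    class_ao = [sum(s[0] for s in v) / len(v) for v in per_class.values()]
    class_sr = [sum(s[1] for s in v) / len(v) for v in per_class.values()]
    return sum(class_ao) / len(class_ao), sum(class_sr) / len(class_sr)


# Two well-tracked "cat" sequences and one failed "dog" sequence.
scores = {"s1": (1.0, 1.0), "s2": (1.0, 1.0), "s3": (0.0, 0.0)}
classes = {"s1": "cat", "s2": "cat", "s3": "dog"}
m_ao, m_sr = class_balanced(scores, classes)
```

In this toy example the plain mean of AO over sequences is 2/3, while the class-balanced value is 0.5, because the two "cat" sequences no longer outweigh the single "dog" sequence.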