7 Apr 2016 | Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele
The Cityscapes dataset is a large-scale benchmark for semantic urban scene understanding. It consists of stereo video sequences recorded in 50 cities, with 5,000 images carrying high-quality pixel-level annotations and a further 20,000 with coarse annotations. The scenes cover a diverse range of urban environments, including complex inner-city traffic, and the stereo setup provides depth information. Annotations span 30 visual classes grouped into eight categories and support both pixel-level and instance-level semantic labeling; vehicle odometry, temperature, and GPS data are also included. The data are split into training, validation, and test sets, with test-set annotations withheld for benchmarking.

Compared with previous datasets in terms of annotation volume, distribution of visual classes, and scene complexity, Cityscapes offers significantly more annotated images and higher annotation quality. The benchmark also introduces a novel instance-level metric, iIoU, for evaluating instance-level semantic labeling. Evaluations of several state-of-the-art approaches show that Cityscapes provides a more challenging and representative benchmark for semantic urban scene understanding. Cross-dataset experiments assessing the compatibility and complementarity of Cityscapes with other datasets show that it enables training models that perform as well as or better than methods trained on other benchmarks. Instance-level semantic labeling proves particularly challenging, with a low AP score. The dataset is expected to stimulate further research in semantic urban scene understanding.
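To make the iIoU metric concrete, here is a minimal sketch of how such an instance-weighted IoU can be computed alongside the standard IoU. It assumes the weighting described in the Cityscapes paper: each ground-truth pixel's true-positive/false-negative contribution is scaled by the ratio of the class's average instance size to the size of the instance that pixel belongs to, while false positives are counted unweighted. The function names, the toy label encoding, and the instance-id map layout are illustrative, not the official evaluation code.

```python
import numpy as np

def iou(pred, gt, cls):
    """Standard IoU = TP / (TP + FP + FN) for a single class."""
    p, g = (pred == cls), (gt == cls)
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    return tp / (tp + fp + fn)

def iiou(pred, gt, instances, cls):
    """Instance-weighted IoU (iIoU) sketch.

    Each ground-truth pixel's TP/FN contribution is scaled by
    avg_instance_size / size_of_its_instance, so that large instances
    (e.g. nearby vehicles) no longer dominate the score.
    FP pixels are counted unweighted, as in the standard IoU.
    `instances` is a map assigning a unique id to every instance.
    """
    p, g = (pred == cls), (gt == cls)
    # Per-instance sizes for this class, and the class-average size.
    ids = np.unique(instances[g])
    sizes = {i: (instances == i).sum() for i in ids}
    avg = np.mean(list(sizes.values()))
    # Per-pixel weight map over ground-truth pixels of this class.
    w = np.zeros(gt.shape)
    for i, s in sizes.items():
        w[instances == i] = avg / s
    itp = w[np.logical_and(p, g)].sum()
    ifn = w[np.logical_and(~p, g)].sum()
    fp = np.logical_and(p, ~g).sum()
    return itp / (itp + fp + ifn)
```

On a toy scene with one large and one small instance of the same class, a prediction that covers only the large instance scores noticeably lower under iIoU than under IoU, which is exactly the bias the metric is designed to correct.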