7 Aug 2016 | Stephan R. Richter, Vibhav Vineet, Stefan Roth, Vladlen Koltun
The paper "Playing for Data: Ground Truth from Computer Games" by Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun presents a method to rapidly create pixel-accurate semantic label maps for images extracted from modern computer games. The authors address the challenge of creating large datasets with pixel-level labels, which is traditionally costly and time-consuming due to the need for human annotation. By leveraging the communication between the game and the graphics hardware, they develop a technique called detouring to reconstruct associations between image patches. This allows for the rapid propagation of semantic labels within and across images, without access to the game's source code or content. The approach is validated by producing dense pixel-level semantic annotations for 25,000 images from the game Grand Theft Auto V, completing the labeling process in just 49 hours. Experiments on semantic segmentation datasets, such as CamVid and KITTI, show that using the acquired data significantly improves model accuracy and reduces the need for expensive real-world labeling. The paper also discusses the diversity of the collected data and the potential for extending the method to other dense prediction problems.The paper "Playing for Data: Ground Truth from Computer Games" by Stephan R. Richter, Vibhav Vineet, Stefan Roth, and Vladlen Koltun presents a method to rapidly create pixel-accurate semantic label maps for images extracted from modern computer games. The authors address the challenge of creating large datasets with pixel-level labels, which is traditionally costly and time-consuming due to the need for human annotation. By leveraging the communication between the game and the graphics hardware, they develop a technique called detouring to reconstruct associations between image patches. This allows for the rapid propagation of semantic labels within and across images, without access to the game's source code or content. The approach is validated by producing dense pixel-level semantic annotations for 25,000 images from the game Grand Theft Auto V, completing the labeling process in just 49 hours. Experiments on semantic segmentation datasets, such as CamVid and KITTI, show that using the acquired data significantly improves model accuracy and reduces the need for expensive real-world labeling. The paper also discusses the diversity of the collected data and the potential for extending the method to other dense prediction problems.