Understanding LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop

The paper "LSUN: Construction of a Large-Scale Image Dataset using Deep Learning with Humans in the Loop" by Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao from Princeton University addresses the challenge of building large-scale image datasets for visual recognition models. State-of-the-art recognition models are data-hungry, while existing labeled datasets are expensive to produce and are becoming outdated in size and density. To close this gap, the authors propose a semi-automated labeling scheme that combines deep learning with human-in-the-loop labeling to collect and label images efficiently.

The key contributions of the paper include:

1. **LSUN Dataset**: The authors construct a new image dataset called LSUN, containing around one million labeled images for each of 10 scene categories and 20 object categories.
2. **Labeling Pipeline**: They introduce an iterative labeling pipeline that starts from a large pool of candidate images for each category, asks humans to label a sampled subset, and uses a deep learning model trained on those labels to classify the remaining images. Images the model classifies with high confidence are accepted as positives or negatives, and the process repeats on the images that remain unlabeled, shrinking the set that still needs manual labeling.
3. **Quality Control**: The system includes mechanisms to ensure high-quality annotations, such as redundant labeling and strict instructions.
4. **Performance Evaluation**: Experiments show that popular convolutional networks achieve significant performance gains when trained on the LSUN dataset compared to datasets like ImageNet and Places.

The paper highlights the importance of dense and diverse datasets for improving the performance of deep learning models in visual recognition tasks. The LSUN dataset is designed to address the size and density limitations of existing datasets, providing a larger and denser training set for future research and development.
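The iterative labeling pipeline summarized above can be illustrated with a small toy sketch. This is not the authors' implementation: each "image" is reduced to a single score in [0, 1], `human_label` is a hypothetical stand-in for a crowd worker, `confident_band` is a deliberately simple substitute for a trained deep network, and the `sample_size` and `stop_at` parameters (including the stopping rule itself) are assumptions made for illustration.

```python
import math
import random


def human_label(score):
    """Hypothetical stand-in for a crowd worker: the true concept is score > 0.5."""
    return score > 0.5


def confident_band(labeled):
    """Toy 'classifier' (an assumption, not a deep network).

    Returns (lo, hi): scores <= lo are treated as confident negatives and
    scores >= hi as confident positives; anything in between stays unlabeled.
    """
    lo = max((s for s, y in labeled if not y), default=-math.inf)
    hi = min((s for s, y in labeled if y), default=math.inf)
    return lo, hi


def label_category(candidates, sample_size=50, stop_at=100, seed=0):
    """Label all candidates while asking the 'human' about as few as possible."""
    rng = random.Random(seed)
    unlabeled = list(range(len(candidates)))  # indices still without a label
    labels = {}                               # index -> bool
    labeled = []                              # (score, label) pairs from humans
    human_queries = 0

    while len(unlabeled) > stop_at:
        # 1. Sample a subset and ask humans to label it.
        rng.shuffle(unlabeled)
        sample, rest = unlabeled[:sample_size], unlabeled[sample_size:]
        for i in sample:
            labels[i] = human_label(candidates[i])
            labeled.append((candidates[i], labels[i]))
        human_queries += len(sample)

        # 2. Fit the toy classifier on every human label collected so far.
        lo, hi = confident_band(labeled)

        # 3. Split the rest into positives, negatives, and still-unlabeled.
        unlabeled = []
        for i in rest:
            if candidates[i] >= hi:
                labels[i] = True
            elif candidates[i] <= lo:
                labels[i] = False
            else:
                unlabeled.append(i)

    # 4. The small remainder is labeled by hand (assumed stopping rule).
    for i in unlabeled:
        labels[i] = human_label(candidates[i])
    human_queries += len(unlabeled)
    return labels, human_queries


candidates = [i / 9999 for i in range(10_000)]
labels, human_queries = label_category(candidates)
errors = sum(labels[i] != human_label(s) for i, s in enumerate(candidates))
print(f"{human_queries} human labels for {len(candidates)} images, {errors} errors")
```

In this toy setting the uncertain band between `lo` and `hi` narrows with every round, so most images are labeled by the model rather than by a person. A real pipeline would replace the one-dimensional score with the output of a trained network and would have to choose its confidence thresholds so that the automatically accepted labels stay accurate.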

LSUN: Construction of a Large-Scale Image Dataset using Deep Learning with Humans in the Loop

4 Jun 2016 | Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, Jianxiong Xiao