26 Jul 2016 | Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta
The paper introduces "Hollywood in Homes," an approach for collecting diverse and realistic data for activity understanding in computer vision. It crowdsources the entire video creation process, from script writing to video recording and annotation, which yields a wide range of activities and scenarios. The resulting dataset, Charades, contains 9,848 annotated videos that average 30 seconds in length and show activities performed by 267 people on three continents. Each video is annotated with multiple free-text descriptions, action labels, action intervals, and object classes. The goal is a realistic and diverse record of everyday activities in people's homes, the kind of data needed to develop and evaluate new computer vision techniques.
Using this dataset, the authors evaluate several tasks, including action recognition and automatic description generation, and report baseline results as a starting point for future research. The paper highlights the benefits of the Hollywood in Homes approach: greater diversity and realism, and control over the composition and length of the recorded scenes.
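The per-video annotations described above (free-text descriptions, object classes, and action labels tied to time intervals) can be pictured as a small data model. This is a minimal sketch under stated assumptions: the names `VideoAnnotation`, `ActionInterval`, and `parse_actions`, the class ids such as `c001`, and the semicolon-separated `"<class id> <start> <end>"` encoding of intervals are all hypothetical illustrations, not the paper's or the dataset release's actual format. It shows how keeping intervals supports two kinds of query: which actions are happening at a given moment, and which action classes occur anywhere in the video (a video-level, multi-label target).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActionInterval:
    """One labeled action occurrence: a class id plus start/end times in seconds."""
    label: str
    start: float
    end: float


@dataclass
class VideoAnnotation:
    """Hypothetical container for one annotated video (field names are assumptions)."""
    video_id: str
    length: float  # video duration in seconds
    descriptions: list[str] = field(default_factory=list)  # free-text descriptions
    objects: list[str] = field(default_factory=list)  # object classes
    actions: list[ActionInterval] = field(default_factory=list)  # labeled intervals

    def active_at(self, t: float) -> list[str]:
        """Sorted labels of all actions whose interval contains time t (inclusive)."""
        return sorted({a.label for a in self.actions if a.start <= t <= a.end})

    def label_set(self) -> set[str]:
        """Video-level multi-label target: every action class that occurs at all."""
        return {a.label for a in self.actions}


def parse_actions(field_text: str) -> list[ActionInterval]:
    """Parse an ASSUMED encoding such as 'c001 0.00 5.50;c002 4.00 12.00'.

    Empty input (a video with no action annotations) yields an empty list.
    """
    intervals = []
    for chunk in field_text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        label, start, end = chunk.split()
        intervals.append(ActionInterval(label, float(start), float(end)))
    return intervals


if __name__ == "__main__":
    # Made-up example record; ids, descriptions, and times are illustrative only.
    video = VideoAnnotation(
        video_id="VID01",
        length=30.0,
        descriptions=["A person opens a door and sits on a chair."],
        objects=["door", "chair"],
        actions=parse_actions("c001 0.00 5.50;c002 4.00 12.00"),
    )
    print(video.active_at(5.0))  # ['c001', 'c002']
    print(sorted(video.label_set()))  # ['c001', 'c002']
```

Storing intervals rather than only video-level labels lets the same record answer both moment-level and whole-video questions, and actions may overlap in time. The inclusive `start <= t <= end` test is a simplifying choice: a time exactly on a boundary counts as active.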