Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

26 Jul 2016 | Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta
The paper introduces the Charades dataset, collected through a crowdsourcing approach called "Hollywood in Homes," in which hundreds of people record videos of everyday activities in their own homes. The dataset comprises 9,848 annotated videos, each averaging 30 seconds, showing the activities of 267 people from three continents. It contains 27,847 video descriptions, 66,500 temporally localized intervals for 157 action classes, and 41,104 labels for 46 object classes.

The dataset is designed to provide realistic, diverse, and casual examples of daily activities, which are essential for training computer vision models on tasks such as action recognition and automatic description generation. To build it, the entire video-creation process, including script writing, video direction, and annotation, was distributed to workers on Amazon Mechanical Turk. This approach ensures diversity in scenarios and people while retaining control over video composition and length. Compared with other video datasets, Charades is found to be more diverse and realistic, with a balanced distribution of actions and objects.
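The temporally localized action intervals are the annotation type most benchmarks build on. As a minimal sketch of how such annotations might be loaded, the snippet below assumes a CSV in the style of the public Charades annotation files, where each video row carries an `actions` field of semicolon-separated `class start end` triples; the column names and the video id in the usage comment are assumptions, so consult the dataset's README for the exact schema.

```python
import csv
from collections import namedtuple

Interval = namedtuple("Interval", ["action_class", "start", "end"])

def parse_action_intervals(actions_field):
    """Parse a semicolon-separated list of 'cXXX start end' triples
    into Interval tuples with float start/end times in seconds."""
    intervals = []
    for triple in actions_field.split(";"):
        triple = triple.strip()
        if not triple:
            continue  # empty field: video has no labeled actions
        cls, start, end = triple.split()
        intervals.append(Interval(cls, float(start), float(end)))
    return intervals

def load_annotations(csv_path):
    """Load per-video action intervals from a Charades-style CSV.

    Assumes columns named 'id' and 'actions' (an assumption about
    the released schema; verify against the dataset README)."""
    annotations = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            annotations[row["id"]] = parse_action_intervals(row.get("actions", ""))
    return annotations

# Hypothetical usage:
# annotations = load_annotations("Charades_v1_train.csv")
# print(annotations["ABC12"])  # "ABC12" is a made-up video id
```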
The paper evaluates several state-of-the-art algorithms on Charades for action classification and sentence prediction. The results show that the dataset provides a challenging benchmark for action recognition, with some classes performing considerably better than others (see the evaluation sketch below). The video descriptions are used to evaluate sentence prediction models; the best of these, such as S2VT, produce coherent but sometimes irrelevant captions.

The Charades dataset is expected to open new opportunities for the computer vision community, particularly in action recognition, object-action interactions, and the understanding of daily activities. It is publicly available for benchmarking future algorithms and exploring new domains, and the paper concludes that it offers a unique and valuable resource for computer vision researchers.
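Action classification on Charades is multi-label, since a 30-second video typically contains several of the 157 actions. A standard way to score such a benchmark is mean average precision (mAP) over classes; the sketch below, using scikit-learn, is one plausible version of that computation rather than the paper's official evaluation code.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """Compute mAP over classes for multi-label classification.

    y_true:  (num_videos, num_classes) binary ground-truth matrix
    y_score: (num_videos, num_classes) real-valued prediction scores
    Classes with no positive example are skipped, since average
    precision is undefined without positives.
    """
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].sum() == 0:
            continue
        aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps))

# Toy example with 4 videos and 3 action classes.
y_true = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_score = np.random.rand(4, 3)  # stand-in for model outputs
print(f"mAP: {mean_average_precision(y_true, y_score):.3f}")
```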