Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

31 Jul 2018 | Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray
The EPIC-KITCHENS dataset is a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. It comprises 55 hours of video (11.5 million frames), densely labelled with 39,600 action segments and 454,300 object bounding boxes. Participants narrated their own recordings, so the labels reflect their true intentions, and these narrations were then refined into crowd-sourced ground truth covering action segments, object bounding boxes, and verb/noun classes.

The footage captures natural, often multi-tasking kitchen activity and spans diverse cooking styles from 10 nationalities across four cities. The dataset is significantly larger than existing egocentric benchmarks: 11.5M frames versus 1M in ADL, roughly 90x more action segments, and 4x more object bounding boxes.

The dataset defines three challenges: object detection, action recognition, and action anticipation. Baselines are evaluated on both seen and unseen kitchens, and baseline results show the potential for advancing fine-grained video understanding. The sequences, annotations, and online leaderboards for tracking progress are released for community use, offering rich, realistic data for first-person vision research on daily activities and object interactions.
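For readers who want to work with the annotations programmatically, the sketch below shows one way to load action-segment labels into a table and compute per-segment lengths. It is a minimal sketch in Python, assuming a CSV export with columns named video_id, start_frame, stop_frame, verb_class, noun_class, and narration; the file name and column names here are assumptions for illustration, so check the released annotation files for the exact schema.

```python
import pandas as pd

def load_action_segments(csv_path: str) -> pd.DataFrame:
    """Load action-segment annotations from a CSV file (assumed layout)."""
    segments = pd.read_csv(csv_path)

    # Each row is one action segment: a temporal span within a video
    # plus verb/noun class labels derived from the participant's narration.
    # These column names are assumptions, not the guaranteed release schema.
    required = ["video_id", "start_frame", "stop_frame",
                "verb_class", "noun_class", "narration"]
    missing = [c for c in required if c not in segments.columns]
    if missing:
        raise ValueError(f"Annotation file is missing columns: {missing}")

    # Segment length in frames, useful for filtering very short actions.
    segments["num_frames"] = segments["stop_frame"] - segments["start_frame"] + 1
    return segments

if __name__ == "__main__":
    # Hypothetical file name used for illustration only.
    df = load_action_segments("EPIC_train_action_labels.csv")
    print(df[["video_id", "narration", "verb_class",
              "noun_class", "num_frames"]].head())
```

A table like this is also a convenient starting point for the seen/unseen kitchen evaluation, since segments can be grouped by participant or video before splitting.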