PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition


31 Jan 2024 | Otto Brookes*, Majid Mirmehdi*, Colleen Stephens2, Samuel Angedakin2, Katherine Corogenes2, Dervla Dowd3, Paula Dieguez2, Thurston C. Hicks6, Sorrel Jones2, Kevin Lee2, Vera Leinert2,3, Juan Lapuente2, Maureen S. McCarthy2, Amelia Meier2, Mizuki Mura2, Emmanuelle Normand3, Virginie Vergnes3, Erin G. Wessling4, Roman M. Wittig2,7, Kevin Langergraber8, Nuria Maldonado2, Xinyu Yang1, Klaus Zuberbühler5, Christophe Boesch2,3, Mimi Arandjelovic2*, Hjalmar Kühl*, Tilo Burghardt*
The PanAf20K dataset is the largest and most diverse open-access video dataset of great apes in their natural environment, containing over 7 million frames across 20,000 camera trap videos of chimpanzees and gorillas collected at 14 field sites in tropical Africa. The dataset provides rich annotations and benchmarks, making it suitable for training and testing a range of computer vision tasks, including ape detection and behaviour recognition. Such resources are vital for AI-driven analysis of camera trap footage, particularly since all great ape species are classified as endangered or critically endangered by the IUCN. The dataset and code are available from the project website.

The data is released in two parts: PanAf20K, which contains 20,000 videos with multi-label behavioural annotations, and PanAf500, which includes 500 videos with fine-grained annotations. PanAf500 was manually annotated by community scientists and researchers, while the multi-label annotations in PanAf20K were contributed by community scientists. The annotations cover a wide range of behaviours, including sitting, standing, walking, running, climbing, and camera interaction. Behavioural actions are grouped into head classes (frequent) and tail classes (rare), with significant imbalance in the class distribution.

Benchmark experiments on PanAf500 and PanAf20K show that the best performing models for ape detection include MegaDetector, Swin Transformer, and ResNet-101. For behavioural action recognition, the X3D model achieves the best top-1 accuracy, while the 3D ResNet-50 model performs best on average per-class accuracy (the two metrics are contrasted in the sketch below). For multi-label behaviour recognition, the 3D ResNet-50 model trained with focal loss and logit adjustment achieves the best performance on tail classes (see the loss sketch below).

The dataset supports a variety of computer vision tasks, including animal detection, tracking, and behaviour recognition, and provides a comprehensive view of great ape populations and their behaviours, which is essential for conservation efforts. It is publicly available and can be used by researchers in the ecological, biological, and computer vision domains to benchmark and expand great ape monitoring capabilities. It is also valuable for community science projects, allowing non-expert scientists to contribute to the annotation of complex data. Data collection was non-invasive, with no animal contact or direct observation, and was approved by the relevant authorities.
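The distinction between the two reported recognition metrics matters because of the head/tail imbalance described above. The following minimal sketch (not the authors' evaluation code) illustrates it with toy data: top-1 accuracy weights every clip equally, whereas average per-class accuracy weights every behaviour class equally, so rare tail behaviours influence the score as much as frequent head behaviours.

```python
import numpy as np

def top1_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of clips whose predicted label matches the ground truth."""
    return float(np.mean(y_pred == y_true))

def average_per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean of per-class recalls; sensitive to performance on tail classes."""
    per_class = []
    for c in np.unique(y_true):
        mask = y_true == c
        per_class.append(np.mean(y_pred[mask] == c))
    return float(np.mean(per_class))

# Toy example: class 0 stands in for a frequent "head" behaviour (e.g. sitting),
# class 1 for a rare "tail" behaviour (e.g. camera interaction).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
print(top1_accuracy(y_true, y_pred))               # 0.9  (looks strong)
print(average_per_class_accuracy(y_true, y_pred))  # 0.75 (exposes the weak tail class)
```

A model can therefore lead on one metric and trail on the other, which is exactly the pattern reported for X3D versus the 3D ResNet-50.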
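For the multi-label benchmark, the long-tail handling named in the summary combines focal loss with logit adjustment. The PyTorch sketch below is a minimal illustration of that combination, assuming multi-hot behaviour labels and empirical class priors; the hyper-parameters (gamma, tau), the number of classes, and the prior estimates are placeholder assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def logit_adjusted_focal_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              class_priors: torch.Tensor,
                              gamma: float = 2.0,
                              tau: float = 1.0) -> torch.Tensor:
    """
    logits:       (batch, num_classes) raw scores from a video model (e.g. a 3D ResNet-50)
    targets:      (batch, num_classes) multi-hot behaviour labels
    class_priors: (num_classes,) empirical label frequencies from the training set
    """
    # Logit adjustment: shift each class logit by tau * log(prior) during training,
    # so rare (tail) behaviours are not drowned out by frequent (head) ones.
    adjusted = logits + tau * torch.log(class_priors + 1e-12)

    # Focal term: down-weight easy, well-classified examples so optimisation
    # focuses on hard cases, which are mostly tail-class clips.
    bce = F.binary_cross_entropy_with_logits(adjusted, targets, reduction="none")
    p_t = torch.exp(-bce)                      # probability assigned to the true label
    focal = (1.0 - p_t) ** gamma * bce
    return focal.mean()

# Toy usage with random tensors standing in for real PanAf20K clips and labels.
num_classes = 16                               # placeholder, not the dataset's class count
logits = torch.randn(4, num_classes)
targets = (torch.rand(4, num_classes) > 0.8).float()
counts = torch.rand(num_classes)
priors = counts / counts.sum()                 # placeholder class frequencies
print(logit_adjusted_focal_loss(logits, targets, priors).item())
```

The focal term and the prior-based logit shift address the imbalance from different directions: one reweights examples, the other recalibrates per-class decision thresholds.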