2018 | Mohammad Sadegh Norouzzadeh, Anh Nguyen, Margaret Kosmala, Alexandra Swanson, Meredith S. Palmer, Craig Packer, and Jeff Clune
This supplementary information document provides technical details and experimental results for the paper "Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning." It covers pre-processing steps, hyperparameter selection, and the experiments conducted to evaluate the performance of deep learning models on camera-trap images.
**Pre-processing and Training:**
- **Image Scaling and Normalization:** Original images (2,048×1,536 pixels) were scaled down to 256×256 pixels to reduce computational costs. Pixel intensities were normalized by subtracting the mean and dividing by the standard deviation.
- **Data Augmentation:** Random cropping, horizontal flipping, and brightness and contrast modifications were applied to enhance model robustness.
- **Training:** Networks were trained using Stochastic Gradient Descent (SGD) with momentum and weight decay. Hyperparameters were optimized, and models were trained for 55 epochs (a minimal sketch of this setup follows the list).
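The block below is a minimal PyTorch-style sketch of this pre-processing and optimizer setup. The crop size, jitter strengths, normalization statistics, learning rate, and the ResNet stand-in are illustrative assumptions, not the exact values and architectures reported in the paper.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torchvision.models import resnet18

# Illustrative pre-processing: scale down, augment, normalize.
train_transform = T.Compose([
    T.Resize((256, 256)),                          # scale the large original frames to 256x256
    T.RandomCrop(224),                             # random cropping (crop size is a placeholder)
    T.RandomHorizontalFlip(),                      # horizontal flipping
    T.ColorJitter(brightness=0.2, contrast=0.2),   # brightness/contrast modification (placeholder strengths)
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5],              # subtract mean, divide by standard deviation
                std=[0.25, 0.25, 0.25]),           # (placeholder statistics)
])

# One architecture as a stand-in for the several tried in the paper.
model = resnet18()
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,            # placeholder learning rate
    momentum=0.9,       # SGD with momentum
    weight_decay=5e-4,  # weight decay (L2 regularization)
)
criterion = nn.CrossEntropyLoss()
```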
**One-stage Identification:**
- The paper's main pipeline processes camera-trap images in two stages, but the authors also explored a one-stage approach in which empty images are treated as an additional class. This approach was found to have drawbacks, including class imbalance and reduced flexibility.
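To make the distinction concrete, the following hedged sketch composes a hypothetical two-stage pipeline (an empty-vs-animal network followed by a species classifier). The network and function names are invented for this example and are not the authors' code.

```python
import torch
import torch.nn.functional as F

def two_stage_predict(image, empty_vs_animal_net, species_net, empty_threshold=0.5):
    """Stage one decides whether the image contains an animal; stage two
    identifies the species only if it does. All names here are hypothetical."""
    p_animal = torch.sigmoid(empty_vs_animal_net(image))   # stage 1: binary animal score
    if p_animal.item() < empty_threshold:
        return "empty", None
    species_probs = F.softmax(species_net(image), dim=-1)  # stage 2: species distribution
    return "animal", species_probs

# A one-stage alternative would instead give species_net an extra "empty"
# class; because empty frames dominate camera-trap data, that single network
# then faces a much more imbalanced label distribution.
```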
**Results on Volunteer-Labeled Test Set:**
- The volunteer-labeled test set included 17,400 capture events with labels for species, counts, and additional attributes. Models achieved high accuracy on species identification and counting tasks, with the ensemble of models performing best.
**Comparing to Gomez et al. (2016):**
- The authors compared their results to those of Gomez et al. (2016), who applied transfer learning from the ImageNet dataset. The authors' models achieved substantially higher accuracy, particularly on the top-1 accuracy metric.
**Day vs. Night Accuracy:**
- The performance of the deep learning system was similar for day and night images, with slight improvements in counting accuracy at night.
**Transfer Learning:**
- Transfer learning from ImageNet did not significantly improve accuracy when training on the full Snapshot Serengeti (SS) dataset. However, it provided substantial performance improvements when training on smaller datasets with limited labeled examples.
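A minimal sketch of transfer learning from ImageNet is shown below, assuming a recent torchvision ResNet as a stand-in for the paper's architectures. The class count and the choice to freeze the backbone are illustrative assumptions, not the paper's recipe.

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

num_classes = 48  # placeholder count of species classes
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)   # start from ImageNet weights
model.fc = nn.Linear(model.fc.in_features, num_classes)    # new head for camera-trap classes

# With very little labeled data, one common option is to freeze the pretrained
# backbone and train only the new classification head.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```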
**Prediction Averaging:**
- Averaging the predicted class probabilities of multiple independently trained models was used to improve reliability, with the full ensemble of models performing best.
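The following is a minimal sketch of prediction averaging, assuming each model outputs unnormalized class scores for a batch of images; it is not the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(models, image_batch):
    """Average softmax class probabilities over several independently trained
    models (a sketch of prediction averaging)."""
    probs = [F.softmax(m(image_batch), dim=1) for m in models]
    mean_probs = torch.stack(probs).mean(dim=0)   # average the class distributions
    return mean_probs.argmax(dim=1), mean_probs   # predicted classes + confidences
```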
**Classifying Capture Events:**
- Aggregating the predictions for the multiple images within a capture event improved accuracy compared to classifying each image on its own.
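The hypothetical helper below illustrates one way to pool per-image predictions into a capture-event prediction by averaging class probabilities; the grouping-by-event-ID interface is an assumption made for this sketch.

```python
from collections import defaultdict
import numpy as np

def aggregate_by_event(image_probs, event_ids):
    """Average per-image class probabilities within each capture event (the
    burst of images recorded for one camera trigger). Hypothetical helper."""
    grouped = defaultdict(list)
    for probs, event in zip(image_probs, event_ids):
        grouped[event].append(probs)
    return {event: np.mean(members, axis=0) for event, members in grouped.items()}
```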
**Confidence Thresholding:**
- Predictions were thresholded on their confidence (the probability of the top class), so that automated labels are assigned only when the network is sufficiently confident; this trades coverage for improved reliability.
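A sketch of confidence thresholding follows, reporting accuracy on the confidently labeled subset together with the fraction of images retained. The threshold value and the array-based interface are assumptions for illustration.

```python
import numpy as np

def accuracy_at_threshold(probs, labels, threshold=0.9):
    """Keep only predictions whose top-class probability clears a threshold,
    then report accuracy on that subset and the fraction of images retained.
    The 0.9 threshold is illustrative, not a value from the paper."""
    confident = probs.max(axis=1) >= threshold
    preds = probs.argmax(axis=1)
    accuracy = (preds[confident] == labels[confident]).mean() if confident.any() else float("nan")
    coverage = confident.mean()   # share of images answered automatically
    return accuracy, coverage
```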
**Improving Accuracy for Rare Classes:**
- Methods such as weighted loss, oversampling, and emphasis sampling were applied to address class imbalance, with some methods showing improvements for rare classes.
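The sketch below illustrates two of these ideas, weighted loss and oversampling, in PyTorch. The per-class counts and the weighting scheme are placeholders, and the paper's emphasis sampling variant is not shown.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

# Illustrative per-class counts; the real Snapshot Serengeti counts differ.
class_counts = np.array([120_000, 30_000, 500, 40], dtype=np.float64)
class_weights = class_counts.sum() / (len(class_counts) * class_counts)

# Weighted loss: misclassifying a rare-class example costs more.
criterion = nn.CrossEntropyLoss(weight=torch.as_tensor(class_weights, dtype=torch.float))

# Oversampling: draw rare-class examples more often when building batches.
labels = np.random.randint(0, len(class_counts), size=1_000)   # placeholder training labels
sampler = WeightedRandomSampler(
    weights=torch.as_tensor(class_weights[labels], dtype=torch.double),
    num_samples=len(labels),
    replacement=True,
)
# The sampler would then be passed to torch.utils.data.DataLoader(..., sampler=sampler).
```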
Overall, the document provides a comprehensive overview of the technical details and experimental results, highlighting the effectiveness of deep learning models in identifying, counting, and describing wild animals in camera-trap images.