Selecting pseudo-absences for species distribution models: how, where and how many?

2012 | Morgane Barbet-Massin, Frédéric Jiguet, Cécile Hélène Albert, Wilfried Thuiller
This study investigates how best to select pseudo-absences (also called background data) when building species distribution models (SDMs). Using simulated species distributions, the authors ran a comparative analysis of how, where, and how many pseudo-absences should be generated. They report that the method of selecting pseudo-absences had the greatest impact on model accuracy, with randomly selected pseudo-absences yielding the most reliable models. For regression techniques (GLM, GAM, and MARS), they recommend a large number of pseudo-absences (e.g., 10,000), weighted so that presences and absences count equally. For classification and machine-learning techniques (MDA, CTA, BRT, and RF), the number of pseudo-absences had the greatest impact on model accuracy, and they suggest averaging several runs with fewer pseudo-absences (e.g., 100). As for where to draw them, the study recommends random selection for regression techniques and random selection of geographically and environmentally stratified pseudo-absences for classification and machine-learning techniques. Together, these guidelines are intended to improve the accuracy of SDMs.
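The two numeric recommendations above (one large, equally weighted draw versus several small draws averaged together) can be illustrated with a minimal Python sketch. It is not the authors' code: the grid, the presence cells, the function names, and the choice of 10 runs are all assumptions made for illustration, and only plain uniform random selection is shown, not the geographically and environmentally stratified variant.

```python
import random


def sample_pseudo_absences(background, presences, n, rng):
    """Draw n pseudo-absences uniformly at random from background cells
    that are not known presences (sampling without replacement)."""
    presence_set = set(presences)
    candidates = [cell for cell in background if cell not in presence_set]
    if n > len(candidates):
        raise ValueError("not enough candidate cells for the requested draw")
    return rng.sample(candidates, n)


def equal_total_weights(n_presences, n_pseudo_absences):
    """Per-record weights (presence, pseudo-absence) chosen so that the
    summed weight of presences equals the summed weight of pseudo-absences."""
    return 1.0, n_presences / n_pseudo_absences


def repeated_draws(background, presences, n, n_runs, seed=0):
    """Independent pseudo-absence sets, one per model run."""
    rng = random.Random(seed)
    return [sample_pseudo_absences(background, presences, n, rng)
            for _ in range(n_runs)]


def average_predictions(runs):
    """Cell-wise mean of the predictions produced by several runs."""
    return [sum(values) / len(values) for values in zip(*runs)]


# Hypothetical study area: a 120 x 120 grid with 25 made-up presence cells.
background = [(x, y) for x in range(120) for y in range(120)]
presences = [(x, x) for x in range(0, 50, 2)]

# Regression-style setup: one large draw, down-weighted to match presences.
large_draw = sample_pseudo_absences(background, presences, 10_000, random.Random(42))
w_presence, w_absence = equal_total_weights(len(presences), len(large_draw))

# Classification-style setup: several small draws (10 runs is an assumed value);
# a model would be fitted per draw and its predictions averaged afterwards.
small_draws = repeated_draws(background, presences, n=100, n_runs=10)
```

In practice each draw would be passed, together with the presences and the weights, to whatever modelling library is in use; `average_predictions` then combines the per-run prediction vectors into a single ensemble estimate.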