Logistic Regression in Rare Events Data

2001 | Gary King, Langche Zeng
This paper addresses the challenges of analyzing rare events data: binary dependent variables with far fewer ones (events) than zeros. Logistic regression, the standard method for such data, tends to underestimate the probability of rare events. In addition, the data collection strategies commonly used for rare events are inefficient, typically producing very large datasets that contain few meaningful explanatory variables. The authors propose corrections to logistic regression that improve estimates of both absolute and relative risks, and they recommend more efficient sampling designs that collect the events but only a small fraction of the non-events.
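The small-sample bias behind this underestimation can be illustrated with a short sketch. The Python/NumPy code below is an illustration under stated assumptions, not the authors' software: it fits an ordinary logistic regression by Newton-Raphson and then subtracts the standard first-order approximation to the bias of the maximum-likelihood coefficients, bias(β̂) ≈ (X'WX)⁻¹X'Wξ, with W = diag{π̂ᵢ(1 − π̂ᵢ)}, Q = X(X'WX)⁻¹X' and ξᵢ = ½Qᵢᵢ(2π̂ᵢ − 1). The function names `fit_logit` and `bias_corrected_logit` are hypothetical, only the unweighted case is covered, and the paper's full method (including its adjustment to predicted probabilities) may differ in detail.

```python
import numpy as np


def fit_logit(X, y, tol=1e-10, max_iter=100):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X is an (n, k) design matrix that already includes a column of ones;
    y is an (n,) array of 0/1 outcomes. (Hypothetical helper.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1 - p))[:, None])    # information matrix X'WX
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def bias_corrected_logit(X, y):
    """Return (corrected, mle) coefficients using a first-order bias estimate."""
    X = np.asarray(X, dtype=float)
    beta = fit_logit(X, y)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1 - p)                                   # diagonal of W
    cov = np.linalg.inv(X.T @ (X * w[:, None]))      # (X'WX)^{-1}
    q_diag = np.einsum("ij,jk,ik->i", X, cov, X)     # diagonal of Q = X (X'WX)^{-1} X'
    xi = 0.5 * q_diag * (2 * p - 1)
    bias = cov @ (X.T @ (w * xi))
    return beta - bias, beta


if __name__ == "__main__":
    # Intercept-only toy data: 5 events among 100 observations.
    X = np.ones((100, 1))
    y = np.r_[np.ones(5), np.zeros(95)]
    corrected, mle = bias_corrected_logit(X, y)
    # With rare events the estimated bias is negative, so the corrected
    # intercept (and the implied event probability) is larger than the MLE.
    print(mle, corrected)
```

For an intercept-only model the adjustment reduces to (0.5 − ȳ)/(nȳ(1 − ȳ)), which is positive whenever ȳ < 0.5, consistent with the point that uncorrected estimates tend to understate rare-event probabilities.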
Combined with efficient sampling, these corrections let researchers save substantial data collection costs or spend the same budget on collecting more meaningful explanatory variables. The paper also covers the statistical methods needed to analyze data sampled this way, including prior correction and weighting, which adjust for sampling on the dependent variable, and provides software that implements them. The authors show that the corrections can substantially improve the accuracy of probability estimates, and that careful sampling combined with these statistical adjustments lets researchers strike the best balance between the number of observations collected and the quality of the variables measured. The paper stresses that researchers analyzing rare events data should consider statistical and sampling strategies together.
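When events are oversampled, the two adjustments named above can be sketched as follows. This is a hypothetical Python/NumPy illustration that assumes the population event fraction τ is known: prior correction fits an ordinary logit to the sample and replaces the intercept β̂₀ with β̂₀ − ln[((1 − τ)/τ)(ȳ/(1 − ȳ))], where ȳ is the sample event fraction; weighting instead maximizes a log-likelihood in which events get weight τ/ȳ and non-events get weight (1 − τ)/(1 − ȳ). The helper names (`case_control_sample`, `prior_correct_intercept`, `case_control_weights`, `fit_weighted_logit`) are illustrative, and only point estimates are computed; weighted estimation generally needs robust standard errors, which this sketch omits.

```python
import numpy as np


def case_control_sample(y, frac0, rng):
    """Indices of all events plus a random fraction `frac0` of the non-events."""
    y = np.asarray(y)
    events = np.flatnonzero(y == 1)
    nonevents = np.flatnonzero(y == 0)
    n_keep = int(round(frac0 * nonevents.size))
    kept = rng.choice(nonevents, size=n_keep, replace=False)
    return np.concatenate([events, kept])


def prior_correct_intercept(b0, tau, ybar):
    """Map an intercept estimated on the sample back to the population."""
    return b0 - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))


def case_control_weights(y, tau):
    """Per-observation weights: tau/ybar for events, (1-tau)/(1-ybar) otherwise."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.where(y == 1, tau / ybar, (1 - tau) / (1 - ybar))


def fit_weighted_logit(X, y, w, tol=1e-10, max_iter=100):
    """Weighted maximum-likelihood logit via Newton-Raphson (point estimates only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                        # weighted score
        info = X.T @ (X * (w * p * (1 - p))[:, None])    # weighted information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tau = 0.01
    y_pop = np.r_[np.ones(100), np.zeros(9900)]          # population with 1% events
    idx = case_control_sample(y_pop, frac0=100 / 9900, rng=rng)
    y = y_pop[idx]                                        # balanced sample: 100 + 100
    X = np.ones((y.size, 1))                              # intercept-only model
    b_sample = fit_weighted_logit(X, y, np.ones(y.size))[0]
    b_prior = prior_correct_intercept(b_sample, tau, y.mean())
    b_weighted = fit_weighted_logit(X, y, case_control_weights(y, tau))[0]
    print(b_sample, b_prior, b_weighted, np.log(tau / (1 - tau)))
```

In this intercept-only example both adjustments recover ln(τ/(1 − τ)) from a sample that is half events; with covariates the two estimators generally differ.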