3 Nov 2017 | Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, Kilian Q. Weinberger
The paper investigates the tension between minimizing error disparity across different population groups and maintaining calibrated probability estimates in machine learning. It shows that calibration is compatible with only a single error constraint, such as equal false-negative rates across groups, and that any algorithm satisfying this relaxation performs no better than randomly withholding a fraction of an existing classifier's predictions. The authors demonstrate that calibration and error-rate fairness are inherently at odds, extending previous impossibility results.

They propose a post-processing algorithm that achieves a calibrated relaxation of Equalized Odds by withholding predictive information for randomly chosen inputs. The method is fundamentally unsatisfying, however: a non-trivial fraction of predictions must be withheld, which practitioners may object to in sensitive settings, and no lower-error solution exists once calibration is required. The paper further shows that satisfying multiple equal-cost constraints alongside calibration is infeasible. Empirical experiments on several datasets confirm these findings, demonstrating that calibration and error-rate fairness are often mutually incompatible in practice. The paper concludes that maintaining both cost parity and calibration, while desirable, is often difficult to achieve.
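The sketch below illustrates the withholding idea in the summary above, under the assumption of calibrated scores for two groups and a single equal-cost constraint. The cost here is an equal-weight mix of generalized false-positive and false-negative rates; the function names, weights, and helper structure are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def generalized_cost(scores, labels, w_fp=0.5, w_fn=0.5):
    """Weighted mix of generalized false-positive and false-negative rates."""
    c_fp = scores[labels == 0].mean()          # E[h(x) | y = 0]
    c_fn = (1.0 - scores[labels == 1]).mean()  # E[1 - h(x) | y = 1]
    return w_fp * c_fp + w_fn * c_fn

def equalize_by_withholding(scores_lo, labels_lo, target_cost,
                            w_fp=0.5, w_fn=0.5, seed=0):
    """For the lower-cost group, replace a random fraction of predictions with
    the group base rate so its cost rises to match the other group's cost.
    Predicting the base rate is itself calibrated at the group level, so the
    mixture stays calibrated while the costs equalize."""
    mu = labels_lo.mean()                                  # group base rate
    cost_model = generalized_cost(scores_lo, labels_lo, w_fp, w_fn)
    cost_trivial = w_fp * mu + w_fn * (1.0 - mu)           # cost of always predicting mu
    # Mixing fraction alpha solves:
    #   (1 - alpha) * cost_model + alpha * cost_trivial = target_cost
    if np.isclose(cost_trivial, cost_model):
        alpha = 0.0
    else:
        alpha = (target_cost - cost_model) / (cost_trivial - cost_model)
    alpha = float(np.clip(alpha, 0.0, 1.0))
    rng = np.random.default_rng(seed)
    withhold = rng.random(len(scores_lo)) < alpha          # randomly chosen inputs to withhold
    adjusted = np.where(withhold, mu, scores_lo)
    return adjusted, alpha
```

In use, one would first compute the higher-cost group's cost with `generalized_cost` and pass it as `target_cost` for the lower-cost group; the returned `alpha` is the fraction of that group's predictions that end up withheld, which is the quantity the summary flags as potentially objectionable in sensitive settings.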