3 Nov 2017 | Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, Kilian Q. Weinberger
The paper investigates the tension between minimizing error disparity across different population groups and maintaining calibrated probability estimates in machine learning models. It shows that calibration is compatible only with a single relaxed error constraint (equal false-negative rates across groups), and that any algorithm satisfying this relaxation does no better than randomizing a fraction of the predictions of an existing classifier. Empirical results on several datasets confirm these findings, indicating that calibration and error-rate fairness are inherently incompatible in most cases. The authors propose a simple post-processing method that achieves the relaxed constraint while preserving calibration, but find it unsatisfactory because it randomly withholds predictive information. The paper concludes that maintaining both cost parity and calibration is desirable but often difficult in practice, and that when calibration is required, no lower-error solution exists.
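As a rough illustration of the randomized-withholding idea described above, the sketch below (not the authors' exact implementation; the function `withhold_to_equalize_fnr` and its interface are assumptions for this example) takes calibrated scores for two groups and replaces a randomly chosen fraction of the better-off group's scores with that group's base rate, which degrades its generalized false-negative rate toward the other group's while keeping the scores calibrated on average.

```python
import numpy as np

def withhold_to_equalize_fnr(scores_a, labels_a, scores_b, labels_b, rng=None):
    """Hypothetical sketch: equalize generalized false-negative rates across
    two groups by withholding predictive information for a random fraction of
    the group with the lower FNR (its scores are replaced by the group base
    rate, which preserves calibration on average)."""
    rng = np.random.default_rng() if rng is None else rng

    def gen_fnr(scores, labels):
        # Generalized false-negative rate of calibrated scores:
        # average probability mass placed on the negative outcome
        # among truly positive examples.
        pos = labels == 1
        return np.mean(1.0 - scores[pos])

    fnr_a = gen_fnr(scores_a, labels_a)
    fnr_b = gen_fnr(scores_b, labels_b)

    # Only the group with the lower FNR is modified; its error is raised
    # to match the other group's.
    if fnr_a < fnr_b:
        scores, labels, target = scores_a.copy(), labels_a, fnr_b
    else:
        scores, labels, target = scores_b.copy(), labels_b, fnr_a

    base_rate = labels.mean()          # predicting the base rate keeps calibration
    current = gen_fnr(scores, labels)

    # Withholding a fraction alpha mixes the current FNR with (1 - base_rate):
    #   new_fnr = (1 - alpha) * current + alpha * (1 - base_rate)
    # Solve new_fnr = target for alpha.
    alpha = (target - current) / ((1.0 - base_rate) - current)
    alpha = float(np.clip(alpha, 0.0, 1.0))

    withhold = rng.random(len(scores)) < alpha
    scores[withhold] = base_rate
    return scores, alpha
```

The fraction `alpha` follows from solving the mixing equation in the comments; as the summary notes, this matches the paper's negative result in spirit: the equalized classifier is just the original one with predictive information discarded at random for some inputs.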