Data preprocessing techniques for classification without discrimination

Received: 23 November 2010 / Revised: 23 August 2011 / Accepted: 16 November 2011 / Published online: 3 December 2011 | Faisal Kamiran · Toon Calders
The paper "Data Preprocessing Techniques for Classification Without Discrimination" by Faisal Kamiran and Toon Calders addresses the problem of learning classifiers that optimize accuracy while avoiding discrimination based on sensitive attributes such as gender or ethnicity. The authors focus on binary classification problems with a single binary sensitive attribute and present theoretical and algorithmic solutions for preprocessing data to remove discrimination before classifier learning.

The paper begins by motivating the need for discrimination-aware classification, highlighting scenarios where historical data is biased due to unfair treatment or where sensitive attributes serve as proxies for unobserved features. It also discusses relevant anti-discrimination laws, emphasizing the legal and ethical implications of learning from discriminatory data.

The authors define discrimination as the difference in the probability of being assigned the positive class between the groups defined by the sensitive attribute. They then introduce the Discrimination-Aware Classification Problem: learn a classifier that minimizes discrimination while maintaining high accuracy. A theoretical analysis of the trade-off between accuracy and discrimination shows that reducing discrimination can decrease accuracy; the trade-off is linear for perfect classifiers and can be non-linear for imperfect ones, depending on the classifier's scoring mechanism.

Three data preprocessing techniques are proposed to remove discrimination from the training dataset:

1. **Massaging the data**: changes class labels to reduce discrimination while maintaining the overall class distribution.
2. **Reweighing**: assigns weights to data objects to make the dataset discrimination-free without changing any labels.
3. **Sampling**: resamples the dataset to remove discrimination without relabeling instances.
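The discrimination measure described above can be sketched in a few lines. This is an illustrative implementation under assumed data (the labels, group values `"m"`/`"f"`, and dataset are invented for the example, not taken from the paper):

```python
def discrimination(labels, sensitive, favored="m", positive=1):
    """Difference in positive-class probability between the favored
    group and the deprived group, as defined in the paper."""
    fav = [y for y, s in zip(labels, sensitive) if s == favored]
    dep = [y for y, s in zip(labels, sensitive) if s != favored]
    p_fav = sum(1 for y in fav if y == positive) / len(fav)
    p_dep = sum(1 for y in dep if y == positive) / len(dep)
    return p_fav - p_dep

# Toy dataset: 3 of 4 "m" instances positive, 1 of 4 "f" instances positive.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
sensitive = ["m", "m", "m", "m", "f", "f", "f", "f"]
print(discrimination(labels, sensitive))  # 0.75 - 0.25 = 0.5
```

A value of 0 means both groups receive the positive class at the same rate; the preprocessing techniques below aim to drive this measure toward 0 in the training data.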
The effectiveness of these techniques is evaluated through experiments on real-life datasets, demonstrating that they can effectively reduce discrimination with minimal loss in accuracy. The paper concludes by discussing limitations and future directions for improving discrimination-aware classification.
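The reweighing technique can be sketched as follows: each (group, label) combination gets a weight equal to its expected frequency if group and label were independent, divided by its observed frequency. This is a minimal illustration of that idea using an invented toy dataset, not the paper's own code or data:

```python
from collections import Counter

def reweigh(sensitive, labels):
    """Weight per (group, label) pair: expected frequency under
    independence of group and label, divided by observed frequency.
    Underrepresented pairs (e.g. deprived group with positive label)
    get weights above 1; overrepresented pairs get weights below 1."""
    n = len(labels)
    s_counts = Counter(sensitive)
    y_counts = Counter(labels)
    sy_counts = Counter(zip(sensitive, labels))
    return {
        (s, y): (s_counts[s] / n) * (y_counts[y] / n) / (sy_counts[(s, y)] / n)
        for (s, y) in sy_counts
    }

labels = [1, 1, 0, 1, 0, 0, 1, 0]
sensitive = ["m", "m", "m", "m", "f", "f", "f", "f"]
weights = reweigh(sensitive, labels)
# (m, 1) is overrepresented -> weight 2/3; (f, 1) is underrepresented -> weight 2.
```

With these weights applied, the weighted positive-class rate is identical in both groups, so a classifier trained on the weighted data sees a discrimination-free dataset without any labels being changed.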