This paper introduces data preprocessing techniques for classification without discrimination. The authors address the Discrimination-Aware Classification Problem, where the goal is to learn a classifier that is accurate but does not exhibit discrimination in its predictions. The problem is relevant in scenarios where data are generated by biased processes or when sensitive attributes act as proxies for unobserved features. The paper focuses on the case with one binary sensitive attribute and a two-class classification problem.
The authors first study the theoretically optimal trade-off between accuracy and non-discrimination for pure classifiers. They then present algorithmic solutions that preprocess the data to remove discrimination before learning a classifier. The techniques include suppression of the sensitive attribute, massaging the dataset by changing class labels, and reweighing or resampling the data to remove discrimination without relabeling instances. These techniques have been implemented in a modified version of Weka, and the results of experiments on real-life data are presented.
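To make the reweighing idea concrete, below is a minimal Python sketch (not the authors' Weka-based implementation; the toy data and the function name are illustrative assumptions). Each instance in group (s, c) receives the weight P(S=s)·P(C=c)/P(S=s, C=c), so that in the weighted training set the sensitive attribute and the class label are statistically independent:

```python
from collections import Counter

def reweigh(sensitive, labels):
    """Assign each instance the weight P(S=s) * P(C=c) / P(S=s, C=c),
    making the sensitive attribute and the class label independent
    in the weighted data."""
    n = len(labels)
    n_s = Counter(sensitive)                # counts per sensitive value
    n_c = Counter(labels)                   # counts per class label
    n_sc = Counter(zip(sensitive, labels))  # joint counts
    return [
        (n_s[s] / n) * (n_c[c] / n) / (n_sc[(s, c)] / n)
        for s, c in zip(sensitive, labels)
    ]

# Toy example: sex is the sensitive attribute, '+' the desirable label.
sex   = ["m", "m", "m", "f", "f", "f", "f", "m"]
label = ["+", "+", "-", "-", "-", "+", "-", "+"]
weights = reweigh(sex, label)  # deprived positives get weight > 1, etc.
```

A weight-aware learner trained on these weights (or a dataset resampled according to them) then sees, in expectation, discrimination-free data.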
The paper discusses three scenarios where discrimination-aware classification is needed: historical discrimination, multiple data sources, and sensitive attributes as proxies. It also discusses anti-discrimination legislation, such as the Australian Sex Discrimination Act, the US Equal Pay Act, and the US Equal Credit Opportunity Act, which prohibit discrimination based on sensitive attributes.
The paper formally defines a discrimination measure, the difference in positive-outcome probability between the favored and the deprived group, and introduces the Discrimination-Aware Classification Problem. It then analyzes the accuracy/discrimination trade-off theoretically, showing that pushing discrimination below the level present in the labels generally comes at a cost in accuracy. The paper also discusses how rankers and the choice of classification method affect this trade-off.
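The following short sketch computes this group-difference measure on toy data (the data and function name are illustrative, not taken from the paper):

```python
def discrimination(sensitive, labels, deprived="f", positive="+"):
    """Positive-label rate of the favored group minus that of the
    deprived group: P(C=+ | S=favored) - P(C=+ | S=deprived)."""
    pos_dep = sum(1 for s, c in zip(sensitive, labels)
                  if s == deprived and c == positive)
    n_dep   = sum(1 for s in sensitive if s == deprived)
    pos_fav = sum(1 for s, c in zip(sensitive, labels)
                  if s != deprived and c == positive)
    n_fav   = sum(1 for s in sensitive if s != deprived)
    return pos_fav / n_fav - pos_dep / n_dep

sex   = ["m", "m", "m", "f", "f", "f", "f", "m"]
label = ["+", "+", "-", "-", "-", "+", "-", "+"]
print(discrimination(sex, label))  # 0.75 - 0.25 = 0.5
```

A value of zero means both groups receive the positive label at the same rate; the preprocessing techniques aim to drive this value toward zero in the training data.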
The paper proposes three data preprocessing techniques: massaging the data, reweighing, and sampling. Each removes discrimination from the training data before a classifier is learned on it. Experiments on real-life data show that these techniques can substantially reduce discrimination while largely maintaining accuracy. The paper concludes that such preprocessing offers a practical, classifier-agnostic way to learn models whose predictions do not discriminate on the sensitive attribute.
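As a hedged sketch of the massaging step, the code below ranks instances with a learned scorer and flips the labels closest to the decision boundary until the positive-rate gap in the training labels is approximately zero. It assumes numeric features and 0/1 labels, and uses a logistic-regression ranker for illustration; the paper's implementation sits inside Weka and may use a different ranker:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def massage(X, y, sensitive, deprived=0):
    """Relabel the instances closest to the decision boundary until the
    positive-rate gap of the training labels is (approximately) zero.
    Promotions: deprived-group negatives with the highest predicted
    probability of the positive class; demotions: favored-group
    positives with the lowest such probability."""
    y = y.copy()
    scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

    dep, fav = sensitive == deprived, sensitive != deprived
    disc = y[fav].mean() - y[dep].mean()                 # positive-rate gap
    m = max(0, int(round(disc * dep.sum() * fav.sum() / len(y))))  # flips per side

    promote = np.where(dep & (y == 0))[0]
    promote = promote[np.argsort(-scores[promote])][:m]  # best deprived negatives
    demote = np.where(fav & (y == 1))[0]
    demote = demote[np.argsort(scores[demote])][:m]      # worst favored positives

    y[promote], y[demote] = 1, 0
    return y

# Toy usage: two numeric features, sensitive value 0 marks the deprived group.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
sensitive = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.8 * sensitive + rng.normal(scale=0.5, size=200) > 0).astype(int)
y_fair = massage(X, y, sensitive, deprived=0)
```

A classifier trained on (X, y_fair) then learns from labels whose group-wise positive rates have been equalized, which is the intended effect of massaging.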