Big Data’s Disparate Impact
Solon Barocas and Andrew D. Selbst argue that while algorithmic techniques like data mining are often praised for eliminating human biases, they can inherit the prejudices of prior decision-makers, reflect widespread societal biases, or simply surface existing patterns of exclusion and inequality that no one explicitly encoded. Unintentional discrimination of this kind is hard to identify or explain, because it is an emergent property of the data mining process rather than a conscious choice by programmers. The essay examines these issues through the lens of Title VII of the Civil Rights Act of 1964, which prohibits employment discrimination. Disparate impact doctrine is the most promising tool for addressing data mining's effects, but employers may be able to defend data-driven practices as business necessities whenever they accurately predict future employment outcomes, even where the underlying correlations reflect historical prejudice or flaws in the data. In this way data mining can entrench or even exacerbate inequality, suggesting that historically disadvantaged groups deserve less favorable treatment, without anyone manually programming it to do so. Because the discrimination is an artifact of the process itself rather than of programmers assigning inappropriate weight to particular factors, it has been largely overlooked by scholars and policymakers, who tend to look for concealed intentions or human bias; for the same reason, even honest attempts to certify the absence of prejudice may wrongly confer a stamp of impartiality on such decisions. The Podesta Report highlights the discriminatory potential of big data but does not detail how it might occur, and existing law largely fails to address it. Remedying these problems is technically, legally, and politically difficult.
The essay explores how data mining can produce discriminatory outcomes at each step of the process: defining the target variable, labeling training examples, collecting data, selecting features, and relying on proxies for protected-class membership. Any of these steps can yield disproportionately adverse outcomes for protected classes, and the same mechanisms (biased labeling, skewed data collection, proxy features that correlate with protected attributes) can also be used to mask intentional discrimination. The essay concludes that data mining poses significant challenges to antidiscrimination law, requiring a reexamination of what "discrimination" and "fairness" mean.
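The proxy mechanism described above can be made concrete with a small sketch. The data, the `zip_code` feature, and the scoring rule here are invented for illustration and do not come from the essay; the point is only that a model which never sees group membership can still reproduce a historical disparity through a correlated feature.

```python
from collections import defaultdict

# Synthetic historical hiring records: (zip_code, group, hired).
# Group B happens to be concentrated in zip2, where past hiring rates were lower.
data = [
    ("zip1", "A", 1), ("zip1", "A", 1), ("zip1", "A", 1), ("zip1", "A", 0),
    ("zip2", "B", 1), ("zip2", "B", 0), ("zip2", "B", 0), ("zip2", "B", 0),
]

# "Training": the per-zip historical hire rate becomes the score
# for new applicants from that zip. Group is never used as an input.
by_zip = defaultdict(list)
for zip_code, _, hired in data:
    by_zip[zip_code].append(hired)
score = {z: sum(v) / len(v) for z, v in by_zip.items()}

def interview(zip_code):
    # Decision rule: interview anyone whose zip-based score clears 0.5.
    return score[zip_code] >= 0.5

# Selection rate per group: the proxy carries the historical
# disparity straight through to the new decisions.
rate_a = sum(interview(z) for z, g, _ in data if g == "A") / 4
rate_b = sum(interview(z) for z, g, _ in data if g == "B") / 4
print(rate_a, rate_b)  # 1.0 0.0 — group B is never selected
```

Under the four-fifths rule of thumb used in disparate impact analysis, a selection-rate ratio of 0.0 between groups would flag this facially neutral rule immediately, yet nothing in the model references the protected attribute.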