October 12–16, 2015, Denver, Colorado, USA | Matt Fredrikson, Somesh Jha, Thomas Ristenpart
This paper explores model inversion attacks that exploit the confidence values many machine learning (ML) models reveal alongside their predictions. The authors develop a new class of attacks applicable to a variety of settings, focusing on decision trees trained on lifestyle surveys and neural networks used for facial recognition. They demonstrate that these attacks can estimate sensitive attributes, such as marital infidelity or pornographic viewing habits, and can recover recognizable images of people's faces given only their names and access to the model. The paper also investigates countermeasures, showing that simple mechanisms, such as accounting for sensitive features when training decision trees and rounding reported confidence values, can significantly reduce the attacks' effectiveness. The findings highlight the need for further research on countermeasures that make ML models resistant to model inversion.
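To make the facial-recognition attack concrete, the sketch below shows a minimal, illustrative gradient-ascent inversion against a softmax-regression face classifier. It is not the authors' released code: the weight matrix `W`, bias `b`, image size, step count, and learning rate are all assumptions for the example, and white-box access to the model parameters is presumed. The attacker repeatedly nudges a blank input image in the direction that increases the model's confidence for the target person, then reads off the resulting image.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

def invert_model(W, b, target, steps=5000, lr=0.1):
    """Gradient ascent on the input pixels to maximize the model's
    confidence for `target`, recovering an image the model associates
    with that label. W has shape (n_pixels, n_classes), b (n_classes,)."""
    n_pixels = W.shape[0]
    x = np.zeros(n_pixels)                         # start from a blank image
    for _ in range(steps):
        p = softmax(W.T @ x + b)                   # class confidences for x
        grad = p[target] * (W[:, target] - W @ p)  # d p[target] / d x
        x = np.clip(x + lr * grad, 0.0, 1.0)       # keep pixels in [0, 1]
    return x

# Hypothetical usage: 32x32 grayscale faces, 40 identities, attacking class 7.
# W = np.random.randn(1024, 40); b = np.zeros(40)
# face = invert_model(W, b, target=7).reshape(32, 32)
```

The rounding countermeasure discussed in the paper works against this kind of attack by coarsening the confidence signal the attacker can observe: when reported confidences are quantized (or gradients are otherwise unavailable), the step-by-step improvement that the ascent relies on becomes much harder to measure.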