Calibrating Noise to Sensitivity in Private Data Analysis


2006 | Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith
This paper extends research on privacy-preserving statistical databases to general query functions by calibrating the magnitude of the added noise to the sensitivity of the function. The authors introduce a notion of privacy leakage, $\epsilon$-indistinguishability, which requires that the distribution of the mechanism's transcripts change only slightly (by a factor of roughly $e^{\epsilon}$) when a single database entry changes. They prove that for any function $f$, adding noise drawn from a Laplace distribution with scale $S(f)/\epsilon$ ensures $\epsilon$-indistinguishability, where the sensitivity $S(f)$ is the maximum amount by which changing a single entry of the database can change the value of $f$. This calibration permits significantly less noise than previous work on noisy sums. The paper gives examples of functions with low sensitivity, such as histograms and the distance from a database to a set, and demonstrates the effectiveness of the approach in these cases. The analysis further extends to vector-valued functions and to adaptively chosen query functions, for which the mechanism remains $\epsilon$-indistinguishable.

Finally, the authors establish a strong separation between interactive and non-interactive mechanisms: non-interactive mechanisms require very large databases to approximate even certain low-sensitivity functions accurately.
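The core mechanism is simple enough to sketch in code. The following is a minimal Python illustration of the Laplace mechanism described above, not the paper's own implementation; the counting query, the toy data, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Release true_answer plus Laplace noise with scale sensitivity / epsilon.

    For vector-valued queries, independent noise with the same scale is added
    to every coordinate, with sensitivity measured in the L1 norm.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale, size=np.shape(true_answer))

# Hypothetical example: a counting query ("how many people are 40 or older?").
# Changing a single database entry changes the count by at most 1, so S(f) = 1.
ages = np.array([23, 35, 47, 52, 61, 29, 44])
true_count = float(np.sum(ages >= 40))
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)
```

A histogram query can be handled the same way: regardless of the number of bins, changing one database entry affects at most two cells by one each, so the L1 sensitivity stays small and the same per-coordinate noise scale protects every cell at once.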