Limiting Privacy Breaches in Privacy Preserving Data Mining

Limiting Privacy Breaches in Privacy Preserving Data Mining

June 9-12, 2003, San Diego, CA | Alexandre Evfimievski, Johannes Gehrke, Ramakrishnan Srikant
This paper addresses the problem of privacy breaches in privacy-preserving data mining, where the goal is to build accurate data mining models over aggregate data while protecting individual privacy. The authors propose a new approach to limit privacy breaches, called "amplification," which allows for privacy guarantees without knowledge of the original data distribution. They apply this methodology to the problem of mining association rules, modifying the algorithm to limit privacy breaches without knowing the data distribution. They also address the issue of long transactions resulting from randomization, proposing a method to compress randomized transactions using pseudorandom generators and carefully chosen seeds. This approach significantly reduces communication and storage costs. Finally, they define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization. The paper also introduces a formal definition of privacy breaches and shows how amplification can be used to limit them. The authors demonstrate that privacy breaches can occur even when mutual information is small, and propose new information-theoretical measures called "worst-case information" that bound privacy breaches. The paper concludes with a summary and directions for future work.This paper addresses the problem of privacy breaches in privacy-preserving data mining, where the goal is to build accurate data mining models over aggregate data while protecting individual privacy. The authors propose a new approach to limit privacy breaches, called "amplification," which allows for privacy guarantees without knowledge of the original data distribution. They apply this methodology to the problem of mining association rules, modifying the algorithm to limit privacy breaches without knowing the data distribution. They also address the issue of long transactions resulting from randomization, proposing a method to compress randomized transactions using pseudorandom generators and carefully chosen seeds. This approach significantly reduces communication and storage costs. Finally, they define new information measures that take privacy breaches into account when quantifying the amount of privacy preserved by randomization. The paper also introduces a formal definition of privacy breaches and shows how amplification can be used to limit them. The authors demonstrate that privacy breaches can occur even when mutual information is small, and propose new information-theoretical measures called "worst-case information" that bound privacy breaches. The paper concludes with a summary and directions for future work.
Reach us at info@study.space