June 9-12, 2003, San Diego, CA | Alexandre Evfimievski, Johannes Gehrke, Ramakrishnan Srikant
The paper addresses the problem of privacy breaches in privacy-preserving data mining, particularly when randomization is used to protect individual records. The authors propose a new formulation of privacy breaches and introduce a methodology called "amplification" to limit these breaches without needing to know the distribution of the original data. They apply this methodology to the problem of mining association rules and modify an existing algorithm to limit privacy breaches. To reduce the communication and storage costs associated with long randomized transactions, they use pseudorandom generators to compress the transactions by sending only a seed instead of the full transaction. Finally, they define new information measures that account for privacy breaches when quantifying the amount of privacy preserved by randomization. The paper includes a detailed introduction, definitions, and proofs, as well as practical examples and trade-off charts to illustrate the effectiveness of their approach.The paper addresses the problem of privacy breaches in privacy-preserving data mining, particularly when randomization is used to protect individual records. The authors propose a new formulation of privacy breaches and introduce a methodology called "amplification" to limit these breaches without needing to know the distribution of the original data. They apply this methodology to the problem of mining association rules and modify an existing algorithm to limit privacy breaches. To reduce the communication and storage costs associated with long randomized transactions, they use pseudorandom generators to compress the transactions by sending only a seed instead of the full transaction. Finally, they define new information measures that account for privacy breaches when quantifying the amount of privacy preserved by randomization. The paper includes a detailed introduction, definitions, and proofs, as well as practical examples and trade-off charts to illustrate the effectiveness of their approach.