This thesis by Benjamin C. M. Fung, titled "Privacy-Preserving Data Publishing," focuses on the challenges and solutions for protecting individual privacy while ensuring the usefulness of released data for data mining. The author identifies privacy threats in real-life data publishing scenarios and proposes a unified anonymization algorithm to address these threats. The thesis is structured into several chapters, each addressing different aspects of privacy-preserving data publishing:
1. **Introduction**: Discusses the importance of privacy in data mining and the need for effective information sharing. It outlines the privacy threats and the goal of the thesis, which is to prevent linking attacks while preserving useful information.
2. **Related Works**: Reviews existing privacy models in privacy-preserving data publishing (PPDP) and compares them with privacy-preserving data mining (PPDM). It highlights the differences between PPDP and PPDM, emphasizing that PPDP focuses on data publication rather than data mining results.
3. **The Preliminaries**: Defines key notations and operations, such as masking and refinement operations, which are used to transform data to meet privacy requirements.
4. **Anonymizing Classification Data**: Introduces a top-down refinement (TDR) algorithmic framework to efficiently identify a k-anonymous solution that preserves classification structures. The framework is designed to handle different types of attributes and minimize distortion to the data.
5. **Confidence Bounding**: Proposes a new privacy notion and an anonymization algorithm that bound the confidence of sensitive inferences derivable from the released data, even when an attacker applies data mining algorithms to it.
6. **Anonymizing Sequential Releases**: Addresses the challenge of anonymizing multiple releases to prevent linking individual record holders across different releases while maintaining the usefulness of each release for its specific purpose.
7. **Secure Data Integration**: Discusses the problem of integrating private databases from multiple publishers to achieve a common goal while satisfying privacy requirements.
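
To make the k-anonymity requirement mentioned under Chapter 4 concrete, the following is a minimal sketch of the property itself, not of the thesis's TDR algorithm. It assumes the standard definition: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The toy table, the attribute names, and the `generalize_age` helper are hypothetical and chosen only for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record):
    """Mask the exact age with a 10-year interval (a hypothetical generalization step)."""
    low = record["age"] // 10 * 10
    return {**record, "age": f"[{low}-{low + 10})"}


# Hypothetical toy table: "job" and "age" act as quasi-identifiers,
# "disease" is the sensitive attribute.
records = [
    {"job": "Engineer", "age": 34, "disease": "Flu"},
    {"job": "Engineer", "age": 36, "disease": "HIV"},
    {"job": "Lawyer", "age": 35, "disease": "Flu"},
    {"job": "Lawyer", "age": 38, "disease": "Asthma"},
]

# Every (job, age) pair is unique in the raw table, so it fails for k = 2.
raw_ok = is_k_anonymous(records, ["job", "age"], k=2)

# After masking ages into intervals, each (job, age) group holds two records.
masked = [generalize_age(r) for r in records]
masked_ok = is_k_anonymous(masked, ["job", "age"], k=2)
```

In this sketch the masked table satisfies 2-anonymity while the raw table does not, which shows the trade-off the summary describes: masking values protects against linking attacks at the cost of less precise data.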
The thesis evaluates the proposed algorithms in terms of privacy protection, data quality, applicability to real-life databases, efficiency, and scalability. It provides a comprehensive framework for privacy-preserving data publishing, aiming to bridge the gap between data privacy and data utility.