2008 September | DM Roden, JM Pulley, MA Basford, GR Bernard, EW Clayton, JR Balser, DR Masys
The article describes the development of a large-scale, de-identified DNA biobank linked to electronic medical record (EMR) data to support personalized medicine. The biobank is built from blood samples that would otherwise be discarded after clinical testing; the samples are de-identified and linked to a synthetic derivative (SD) of the EMR. The project uses an "opt-out" model: patients are not asked for explicit consent but can decline to participate. Surveys indicated general acceptance of the concept, with only a small minority opposed. The design drew on extensive ethical and community input, and algorithms were developed to de-identify records accurately with low error rates. Samples accrue at a rate of 700–900 per week, and more than 33,000 had been collected by April 2008. The SD contains de-identified EMR data, including diagnoses, medications, and procedures, and together with the samples it provides a large, diverse dataset for studying genotype-phenotype relationships. A data use agreement governs ethical use of the data. The resource is treated as a limited data set under HIPAA regulations, and research access is restricted to approved queries. The article also discusses the advantages and limitations of the opt-out model, including the residual potential for re-identification and the need for ongoing ethical oversight. Overall, the project shows that de-identification can protect patient privacy while still enabling large-scale genomic research, and it offers a framework for future work in personalized medicine.