A data-driven approach to preprocessing Illumina 450K methylation array data

2013 | Ruth Pidsley, Chloe C Y Wong, Manuela Volta, Katie Lunnon, Jonathan Mill, Leonard C Schalkwyk
This article presents a data-driven approach to preprocessing Illumina 450K methylation array data. The array combines two assay types (Type I and Type II), and processing them together can produce inconsistent results. To evaluate preprocessing and normalization methods, the authors developed three metrics: two based on known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), and one based on the SNP genotyping assays included on the array.

DNA methylation levels are represented by the standard β value, calculated as β = M/(M + U + 100), where M and U are the methylated and unmethylated signal intensities. The study found that quantile normalization (QN) significantly improves the performance of β values, even in highly consistent data. It also found that normalizing M and U separately is more effective than normalizing β directly, and that normalizing Type I and Type II assays separately is beneficial. Across 11 datasets (total n=696), the best-performing method was 'dasen', which applies background adjustment followed by separate quantile normalization of the methylated and unmethylated intensities for Type I and Type II probes. This method reduces variance and improves the detection of small DNA methylation changes, which are likely to be associated with complex disease phenotypes.

The authors developed an R package, wateRmelon, compatible with the existing methylumi, minfi, and IMA packages, that provides these normalization methods and data quality tests for 450K data. The study highlights the importance of careful preprocessing to minimize variance and improve statistical power, especially for detecting subtle DNA methylation changes. The results suggest that quantile normalization combined with background adjustment is optimal for processing 450K methylation data.
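The β formula and the idea of normalizing M and U separately within each probe type can be illustrated with a minimal Python sketch. This is not the wateRmelon implementation and not the authors' 'dasen' code: the background adjustment step is omitted, ties are broken by position, and the function names (`beta_value`, `quantile_normalize`, `normalize_by_type`) and the toy intensities are hypothetical, chosen only for illustration.

```python
from statistics import mean


def beta_value(m, u, offset=100.0):
    """Standard beta value from the summary: M / (M + U + 100)."""
    return m / (m + u + offset)


def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean(s[k] for s in sorted_samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def normalize_by_type(meth, unmeth, probe_types):
    """Normalize M and U separately within each probe type, then compute beta.

    meth, unmeth: one list of per-probe intensities per sample.
    probe_types: "I" or "II" for each probe (hypothetical labels).
    Background adjustment is NOT performed here.
    """
    n_samples = len(meth)
    betas = [[0.0] * len(probe_types) for _ in range(n_samples)]
    for ptype in sorted(set(probe_types)):
        idx = [i for i, t in enumerate(probe_types) if t == ptype]
        m_sub = quantile_normalize([[s[i] for i in idx] for s in meth])
        u_sub = quantile_normalize([[s[i] for i in idx] for s in unmeth])
        for s in range(n_samples):
            for j, i in enumerate(idx):
                betas[s][i] = beta_value(m_sub[s][j], u_sub[s][j])
    return betas


# Toy data: two samples, four probes (two Type I, two Type II).
probe_types = ["I", "I", "II", "II"]
meth = [[1000, 3000, 500, 2500], [1200, 2800, 700, 2300]]
unmeth = [[3000, 1000, 2500, 500], [2600, 1400, 2100, 900]]
betas = normalize_by_type(meth, unmeth, probe_types)
```

In this toy input both samples rank their probes in the same order within each type, so after normalization they share the same intensity distributions and therefore the same β values. Real data would differ between samples wherever the probe rank order differs.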