A data-driven approach to preprocessing Illumina 450K methylation array data

2013 | Ruth Pidsley, Chloe C Y Wong, Manuela Volta, Katie Lunnon, Jonathan Mill and Leonard C Schalkwyk
This paper presents a data-driven approach to preprocessing Illumina 450K methylation array data, aimed at improving the detection of the small absolute DNA methylation changes associated with complex diseases. The authors use known DNA methylation patterns, such as those associated with genomic imprinting and X-chromosome inactivation, together with the SNP genotyping assays present on the array, to derive three independent metrics for evaluating alternative correction and normalization schemes. They find that quantile normalization methods, particularly those that normalize methylated (M) and unmethylated (U) intensities separately, significantly improve performance on these metrics, even in highly consistent datasets. The commonly used procedure of normalizing beta values is found to be inferior.
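The difference between normalizing M and U intensities separately and normalizing betas can be sketched in a few lines of Python/NumPy. This is a minimal illustration under stated assumptions, not the wateRmelon implementation: it assumes raw M and U intensity matrices (probes x samples), uses a plain rank-based quantile normalization, and computes beta as M / (M + U + 100), the offset conventionally used by Illumina. It ignores background correction and the differences between the array's two probe designs, which a real pipeline has to handle. The function names `quantile_normalize` and `beta_values` and the simulated intensities are hypothetical.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalize the columns (samples) of a probes-by-samples matrix.

    Every column is given the same distribution: the mean of the sorted
    columns, assigned back according to each column's own rank order.
    """
    order = np.argsort(x, axis=0)                # row indices that sort each column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    out = np.empty(x.shape, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference          # k-th smallest value -> reference[k]
    return out


def beta_values(m: np.ndarray, u: np.ndarray, offset: float = 100.0) -> np.ndarray:
    """Beta = M / (M + U + offset); the offset stabilizes low-intensity probes."""
    return m / (m + u + offset)


# Hypothetical data: 1000 probes x 4 samples with a per-sample intensity scale shift.
rng = np.random.default_rng(0)
scale = np.array([1.0, 1.3, 0.8, 1.1])
m_raw = rng.gamma(shape=2.0, scale=1500.0, size=(1000, 4)) * scale
u_raw = rng.gamma(shape=2.0, scale=1500.0, size=(1000, 4)) * scale

# Strategy A: normalize M and U separately, then compute betas.
m_qn = quantile_normalize(m_raw)
u_qn = quantile_normalize(u_raw)
beta_mu = beta_values(m_qn, u_qn)

# Strategy B: compute betas first, then quantile-normalize the betas.
beta_bq = quantile_normalize(beta_values(m_raw, u_raw))
```

After strategy A, every sample's M intensities (and separately its U intensities) share one distribution while each probe keeps its within-sample rank; strategy B equalizes only the final beta distributions. Choosing between such alternatives is what the three data-driven metrics are for.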
The authors also introduce a user-friendly R package, wateRmelon, which applies these normalization methods and data quality tests to 450K data. The study concludes that careful choice of preprocessing steps can minimize variance and increase statistical power, making subtle differences in DNA methylation easier to detect.
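Why lower variance translates into more statistical power can be illustrated with the textbook normal approximation for the number of samples per group needed to detect a mean difference Δ between two groups with common standard deviation σ: n ≈ 2 (z_{1-α/2} + z_{1-β})^2 σ^2 / Δ^2, where 1 - β is the desired power. This generic formula is not taken from the paper, and the effect size and standard deviations below are hypothetical values chosen only to show how the requirement scales with σ^2.

```python
import math
from statistics import NormalDist


def samples_per_group(delta: float, sigma: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2,
    rounded up to the next whole sample.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical: a 2 percentage-point difference in beta at two noise levels.
# Prints 99 per group for sigma=0.05 and 63 per group for sigma=0.04.
for sigma in (0.05, 0.04):
    print(f"sigma={sigma}: {samples_per_group(delta=0.02, sigma=sigma)} per group")
```

Because n scales with σ^2, trimming the standard deviation by a fifth cuts the required sample size by roughly a third, which is the sense in which variance-reducing preprocessing helps detect small methylation differences.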