VOLUME 11 | OCTOBER 2010 | Jeffrey T. Leek, Robert B. Scharpf, Héctor Corrada Bravo, David Simcha, Benjamin Langmead, W. Evan Johnson, Donald Geman, Keith Baggerly and Rafael A. Irizarry
The article discusses the widespread and critical impact of batch effects in high-throughput data, which are often overlooked but can lead to incorrect conclusions. Batch effects arise when measurements are influenced by non-biological factors such as laboratory conditions, reagent lots, and personnel differences. When these factors are confounded with the biological variables of interest, they can produce misleading results. The authors review experimental and computational approaches to addressing batch effects, emphasizing careful study design, the recording of batch information, and statistical methods such as surrogate variable analysis (SVA) and ComBat. They present examples and analyses from a range of high-throughput studies to illustrate how prevalent and consequential batch effects are, and they highlight the need for consistent reporting and for collaboration between laboratory biologists and data analysts to minimize their impact.
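To make the point about study design concrete, the following is a minimal sketch, not the authors' method. It uses per-batch mean centering as a deliberately simple stand-in for batch adjustment; it is not SVA or ComBat. The function names (`center_by_batch`, `group_difference`) and all measurements are hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Subtract each batch's mean from the measurements in that batch.

    This is a simple illustrative adjustment, not SVA or ComBat.
    """
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in by_batch.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]


def group_difference(values, groups, treated="treated", control="control"):
    """Return mean(treated) - mean(control) for the given measurements."""
    treated_vals = [v for v, g in zip(values, groups) if g == treated]
    control_vals = [v for v, g in zip(values, groups) if g == control]
    return mean(treated_vals) - mean(control_vals)


# Hypothetical balanced design: both groups are measured in both batches.
# Built-in assumptions: true group effect of about 1.0, batch shift of about 5.0.
balanced_values = [10.0, 10.2, 11.0, 11.2, 15.0, 15.2, 16.0, 16.2]
balanced_batches = ["A", "A", "A", "A", "B", "B", "B", "B"]
balanced_groups = ["control", "control", "treated", "treated"] * 2

# Hypothetical confounded design: every control is in batch A and every
# treated sample is in batch B. There is no true group effect here, only
# the batch shift of about 5.0.
confounded_values = [10.0, 10.2, 9.8, 10.0, 15.0, 15.2, 14.8, 15.0]
confounded_batches = ["A"] * 4 + ["B"] * 4
confounded_groups = ["control"] * 4 + ["treated"] * 4

balanced_raw = group_difference(balanced_values, balanced_groups)
balanced_adjusted = group_difference(
    center_by_batch(balanced_values, balanced_batches), balanced_groups
)
confounded_raw = group_difference(confounded_values, confounded_groups)
confounded_adjusted = group_difference(
    center_by_batch(confounded_values, confounded_batches), confounded_groups
)

print("balanced design, raw difference:", round(balanced_raw, 3))
print("balanced design, batch-centered difference:", round(balanced_adjusted, 3))
print("confounded design, raw difference:", round(confounded_raw, 3))
print("confounded design, batch-centered difference:", round(confounded_adjusted, 3))
```

In the balanced design the group difference survives the adjustment, because each batch contains both groups. In the confounded design the raw difference is entirely a batch artifact, and centering removes it; the same step would also remove any real group effect, since batch and group cannot be separated. This is why recording batch information and balancing groups across batches matters before any statistical correction is attempted.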