[slides] Summarizing and correcting the GC content bias in high-throughput sequencing

This supplementary information provides detailed analysis and methods to address GC-content bias in high-throughput sequencing data. The authors, Yuval Benjamini and Terence P. Speed, explore various aspects of GC bias, including its impact on sequencing quality, strand-specific effects, and correction methods.

1. **GC Content Curves and Library Comparison**: Figure 1 shows GC curves for different chromosomes, highlighting that while the curves are similar, they differ between libraries, indicating potential biases in library preparation.
2. **GC Effect on Strands**: Figure 2 illustrates the TV scores of GC windows on both forward and reverse strands, demonstrating that the GC effect is more pronounced on the reverse strand after inversion.
3. **Corrections at Different Resolutions**: Figure 3 presents corrections at different resolutions, showing how fragment GC models are used to predict and correct CN estimates.
4. **Sequencing Quality and GC Bias**: The mapping quality score is used to assess sequencing biases. Low-quality fragments are strongly associated with high GC content, suggesting challenges in sequencing such regions.
5. **ChIP-seq Data Analysis**: Figures S5 and S6 re-analyze two datasets to compare PCR-free and optimized PCR protocols for reducing GC bias. The results show that both methods improve GC correction.
6. **ChIP-seq Technical Replicates**: Figures S7 and S8 analyze background counts from technical replicates of ChIP-seq samples, demonstrating the importance of single-sample correction for GC biases.
7. **Comparison with BEADS Method**: Figures S9 and S10 compare the fragment model with BEADS, a bias correction method. The fragment model and BEADS produce similar corrected counts in high-mappability bins but show more variability in bins with lower mappability.
8. **Supplementary Methods**: Detailed descriptions of two alternative models (two-ends and fragmentation models) are provided, explaining how they predict fragment counts based on GC content and position-specific biases.

Overall, the supplementary information provides a comprehensive overview of the methods and results used to address and correct GC-content bias in high-throughput sequencing data.
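The GC curves, TV scores, and fragment-model predictions mentioned in the summary can be made concrete with a small sketch. It is not the authors' code. It assumes a simplified setup: each candidate fragment start is assigned the GC count of a fixed-length window beginning at that position, the rate for each GC stratum is the number of fragments divided by the number of positions in that stratum, and the TV score is the position-weighted mean absolute deviation of the stratum rates from the global rate, divided by twice the global rate. The function names (`gc_counts`, `gc_rates`, `tv_score`, `predicted_bin_counts`) and the `window` and `bin_size` parameters are hypothetical, and the sketch ignores mappability filtering, strand handling, and any offset between the fragment start and the GC window.

```python
from collections import Counter


def gc_counts(seq, window):
    """Return the GC count of the window starting at each position of seq."""
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    n_windows = len(is_gc) - window + 1
    if n_windows <= 0:
        return []
    counts = [sum(is_gc[:window])]
    for i in range(1, n_windows):
        # Slide the window by one base: drop the left base, add the new right base.
        counts.append(counts[-1] - is_gc[i - 1] + is_gc[i + window - 1])
    return counts


def gc_rates(seq, starts, window):
    """Estimate a fragment rate per GC stratum (the empirical "GC curve").

    starts holds 0-based fragment start positions; repeats are allowed.
    Returns (rates, positions), where positions[g] is the number of
    reference positions whose window has GC count g.
    """
    gc = gc_counts(seq, window)
    positions = Counter(gc)
    fragments = Counter(gc[s] for s in starts if 0 <= s < len(gc))
    rates = {g: fragments[g] / positions[g] for g in positions}
    return rates, positions


def tv_score(rates, positions):
    """Total variation distance between the GC curve and a flat (no-bias) curve."""
    total_positions = sum(positions.values())
    total_fragments = sum(rates[g] * positions[g] for g in positions)
    mean_rate = total_fragments / total_positions
    if mean_rate == 0:
        return 0.0
    deviation = sum(
        positions[g] / total_positions * abs(rates[g] - mean_rate)
        for g in positions
    )
    return deviation / (2 * mean_rate)


def predicted_bin_counts(seq, rates, window, bin_size):
    """Predict fragment counts per bin by summing stratum rates over positions."""
    gc = gc_counts(seq, window)
    return [
        sum(rates[g] for g in gc[i:i + bin_size])
        for i in range(0, len(gc), bin_size)
    ]
```

Under these assumptions, dividing an observed bin count by its predicted count (scaled by the global rate) gives a GC-corrected depth that could feed a copy-number estimate. A TV score near 0 indicates a flat GC curve, while larger values indicate that fragment counts depend more strongly on GC content.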

Supplementary Information: Estimation and correction for GC-content bias in high throughput sequencing

November 18, 2011 | Yuval Benjamini and Terence P. Speed