[slides] The ENCODE Blacklist: Identification of Problematic Regions of the Genome

The ENCODE Blacklist identifies problematic genomic regions that produce anomalous or high signal in next-generation sequencing experiments, regardless of cell line or experiment. These regions, which include repetitive sequences, low-mappability areas, and mitochondrial DNA segments, should be filtered out to avoid biased results in functional genomics analyses. The ENCODE project developed an automated method that systematically flags such regions using input data from ChIP-seq experiments. The method identifies regions with high read depth or high multi-mapping rates, which are likely artifacts. It requires extensive input sequencing data and considers multiple cell types to avoid over-filtering. Blacklists were created for the human, mouse, worm, and fly genomes, and they include regions that were previously manually curated.

Each blacklist is specific to a genome assembly and should not be lifted over from older assemblies. Removing blacklisted regions is an essential quality-control step in ChIP-seq analysis: it removes background noise, reduces spurious correlations, and leads to more accurate, stable, and biologically relevant results. The blacklist is not a universal solution for all NGS assays, but it is effective for ChIP-seq, DNase-seq, and ATAC-seq. The ENCODE project filters all of its ChIP-seq data with these blacklists, and other established pipelines use them as well. The blacklists are available for download.
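As an illustration of the filtering step described above, the following is a minimal sketch of removing peaks that overlap blacklisted regions. It is not the ENCODE pipeline's implementation. It assumes that both the blacklist and the peaks are given as BED-style `(chrom, start, end)` tuples with 0-based, half-open coordinates, and that a peak is dropped if it overlaps a blacklisted region by at least one base. The names `build_index`, `overlaps_blacklist`, and `filter_peaks` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(blacklist):
    """Group blacklist intervals by chromosome, then sort and merge them.

    Returns {chrom: (starts, ends)} where the merged intervals are disjoint
    and sorted, so both lists are in ascending order.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_blacklist(index, chrom, start, end):
    """Return True if [start, end) overlaps any blacklisted interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # First merged interval whose end lies strictly after the peak start.
    i = bisect_right(ends, start)
    return i < len(starts) and starts[i] < end


def filter_peaks(peaks, blacklist):
    """Keep only the peaks that do not overlap the blacklist."""
    index = build_index(blacklist)
    return [p for p in peaks if not overlaps_blacklist(index, *p)]


if __name__ == "__main__":
    # Made-up coordinates for illustration only, not real blacklist entries.
    blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
    peaks = [("chr1", 50, 100), ("chr1", 90, 110), ("chr3", 10, 20)]
    print(filter_peaks(peaks, blacklist))
```

Because the blacklist is specific to a genome assembly, the peaks and the blacklist passed to such a filter need to use the same assembly and the same chromosome naming. In practice, established interval tools are typically used for this step rather than hand-written code.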

The ENCODE Blacklist: Identification of Problematic Regions of the Genome

27 June 2019 | Haley M. Amemiya¹,², Anshul Kundaje³ & Alan P. Boyle¹,²,⁴