2013 January 1 | Todd J. Treangen and Steven L. Salzberg
The article discusses the challenges and computational solutions related to repetitive DNA sequences in next-generation sequencing (NGS) projects. Repetitive DNA, which is abundant in genomes from bacteria to mammals, poses significant technical difficulties for sequence alignment and assembly. NGS technologies, with their short read lengths and high data volumes, exacerbate these challenges. The authors highlight that ignoring repeats is not a viable solution as it can lead to the loss of important biological information. They review the computational problems associated with repeats and describe strategies used by current bioinformatics systems to address them. These strategies include focusing on uniquely mapped reads, using paired-end information, and developing sophisticated models for gene expression estimation.
The article also discusses the impact of repeats on genome resequencing projects, *de novo* genome assembly, and RNA-seq analysis. It emphasizes the importance of longer read lengths and advanced computational methods to improve the accuracy of repeat handling in NGS data.
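
The point about read length can be illustrated with a toy example. The sketch below is not taken from the article: the genome string, the repeat, and the helper `find_placements` are all hypothetical, and exact string matching stands in for a real aligner. It shows how a short read that falls entirely inside a repeat has more than one equally good placement, while a longer read that reaches into the unique flanking sequence has only one.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: two copies of one repeat separated by unique sequence.
REPEAT = "CTAGGCATTC"
genome = "TTGCA" + REPEAT + "GGATCGTA" + REPEAT + "CAGTT"

# A short read drawn from inside the repeat matches both copies.
short_read = REPEAT[2:8]
# A longer read that spans the first copy plus its unique flanks matches once.
long_read = "GCA" + REPEAT + "GGA"

short_hits = find_placements(genome, short_read)
long_hits = find_placements(genome, long_read)

print(f"short read {short_read!r}: {len(short_hits)} placements at {short_hits}")
print(f"long read  {long_read!r}: {len(long_hits)} placement at {long_hits}")
```

In this toy setting, a strategy that keeps only uniquely mapped reads would discard the short read and keep the long one, which is why such filtering loses information inside repeats unless reads (or read pairs) extend into unique sequence.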