Accurate detection of complex structural variations using single molecule sequencing

Nature Methods. 2018 June; 15(6): 461–468. doi:10.1038/s41592-018-0001-7 | Fritz J. Sedlazeck, Philipp Rescheneder, Moritz Smolka, Han Fang, Maria Nattestad, Arndt von Haeseler, and Michael C. Schatz
The paper introduces two open-source methods, NGMLR and Sniffles, for long-read alignment and structural variation (SV) identification, respectively. They address the limitations of short-read sequencing for SV detection: short-read approaches can miss up to 90% of SVs and have high false-positive rates. NGMLR is a fast and accurate long-read aligner that uses a convex gap-cost scoring model to align reads spanning SV breakpoints. Sniffles, used in conjunction with NGMLR, identifies all types of SVs (indels, duplications, inversions, translocations, and nested events) by scanning and clustering alignments. Evaluated on simulated and real datasets, the methods outperform existing tools in SV detection.
NGMLR and Sniffles detect thousands of novel variants in healthy and cancerous human genomes, including complex nested events that are poorly studied but associated with disease. The paper also highlights systematic errors in short-read approaches, such as false translocations caused by mis-mapped reads. Finally, the authors show that high accuracy can be achieved with only 15x to 30x coverage, making long-read sequencing more feasible for large-scale applications.
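
To make the idea of a convex gap-cost model more concrete, here is a minimal Python sketch that contrasts it with the common affine model. It is an illustration only, not NGMLR's implementation: the function names and every penalty value (`gap_open`, `extend_max`, `extend_min`, `decay`) are assumptions chosen for the example. The point it shows is that when the per-base extension penalty shrinks as a gap grows, short gaps (typical of sequencing errors) stay expensive per base while a long gap (typical of an SV) stays affordable.

```python
def affine_gap_cost(length, gap_open=5.0, gap_extend=2.0):
    """Affine model: every additional gap base costs the same amount."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


def convex_gap_cost(length, gap_open=5.0, extend_max=2.0,
                    extend_min=0.5, decay=0.15):
    """Convex-style model (illustrative values, not NGMLR defaults).

    The penalty for the i-th gap base starts at extend_max and decreases
    by `decay` per base until it reaches the floor extend_min, so the
    marginal cost of extending a gap never increases.
    """
    if length <= 0:
        return 0.0
    total = gap_open
    for i in range(length):
        total += max(extend_min, extend_max - decay * i)
    return total


if __name__ == "__main__":
    # Compare the two models for a short gap and for SV-sized gaps.
    for gap_length in (1, 10, 100, 1000):
        print(gap_length,
              affine_gap_cost(gap_length),
              convex_gap_cost(gap_length))
```

With these assumed values the two models agree for a single-base gap, and the convex-style cost grows far more slowly for long gaps, which is what lets an aligner keep a read in one piece across a large insertion or deletion instead of splitting or clipping it.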