2 January 2024 | Moritz Smolka, Luis F. Paulin, Christopher M. Grochowski, Dominic W. Horner, Medhat Mahmoud, Sairam Behera, Ester Kalef-Ezra, Mira Gandhi, Karl Hong, Davut Pehlivan, Sonja W. Scholz, Claudia M. B. Carvalho, Christos Proukakis & Fritz J. Sedlazeck
Sniffles2 is a new tool for detecting structural variants (SVs) in long-read sequencing data that improves on existing methods in both accuracy and speed. It combines repeat-aware clustering, fast consensus sequence generation, and coverage-adaptive filtering to improve SV detection. Across coverage levels, sequencing technologies, and SV types, Sniffles2 is 11.8 times faster and 29% more accurate than state-of-the-art SV callers. It detects mosaic and population-level SVs and produces fully genotyped VCF files. Applied to 11 probands, Sniffles2 accurately identified causative SVs around the MECP2 gene, including complex alleles with overlapping SVs. It also detected mosaic SVs in brain tissue from a patient with multiple system atrophy, revealing SV diversity in the cingulate cortex that affects genes involved in neuron function as well as repetitive elements.
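The idea behind repeat-aware clustering and coverage-adaptive filtering can be illustrated with a toy sketch. This is not Sniffles2's implementation. The `SvSignal` record, the fixed merge tolerances (100 bp outside repeats, 500 bp inside), the 10% support fraction, and the use of the median position and length in place of a real consensus sequence are all hypothetical choices made for illustration. The `repeats` intervals stand in for a tandem-repeat annotation.

```python
from dataclasses import dataclass


@dataclass
class SvSignal:
    pos: int     # breakpoint position on the reference (from one read)
    length: int  # hypothetical convention: positive = insertion, negative = deletion


def in_repeat(pos, repeats):
    """True if pos falls inside any half-open [start, end) repeat interval."""
    return any(start <= pos < end for start, end in repeats)


def cluster_signals(signals, repeats, base_tol=100, repeat_tol=500):
    """Greedy 1-D clustering of read-level SV signals by position.

    Inside repeats, aligners tend to place the same SV at scattered positions,
    so the merge distance is widened there (hypothetical tolerances).
    """
    clusters = []
    for sig in sorted(signals, key=lambda s: s.pos):
        tol = repeat_tol if in_repeat(sig.pos, repeats) else base_tol
        if clusters and sig.pos - clusters[-1][-1].pos <= tol:
            clusters[-1].append(sig)
        else:
            clusters.append([sig])
    return clusters


def min_support(coverage, fraction=0.1, floor=2):
    """Coverage-adaptive support threshold: scales with depth, never below floor."""
    return max(floor, round(coverage * fraction))


def call_svs(signals, repeats, coverage):
    """Return (position, length, support) for clusters that pass the threshold."""
    threshold = min_support(coverage)
    calls = []
    for cluster in cluster_signals(signals, repeats):
        if len(cluster) >= threshold:
            positions = sorted(s.pos for s in cluster)
            lengths = sorted(s.length for s in cluster)
            mid = len(cluster) // 2
            # Median position/length as a simple stand-in for a consensus call
            calls.append((positions[mid], lengths[mid], len(cluster)))
    return calls


if __name__ == "__main__":
    signals = [SvSignal(1000, 300), SvSignal(1040, 310), SvSignal(1090, 295),
               SvSignal(5000, -80), SvSignal(20000, 500), SvSignal(20350, 520),
               SvSignal(20400, 505)]
    repeats = [(19900, 21000)]  # hypothetical tandem-repeat interval
    print(call_svs(signals, repeats, coverage=30))  # [(1040, 300, 3), (20350, 505, 3)]
    print(call_svs(signals, [], coverage=30))       # [(1040, 300, 3)]
```

Without the repeat annotation, the three reads supporting the insertion near position 20,000 split into clusters of one and two reads, and at 30x coverage neither reaches the three-read threshold, so the call is lost.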
SVs are genomic alterations of 50 base pairs or more, including insertions, deletions, duplications, inversions, and translocations. They play a significant role in speciation, plant biology, and human disease, including Mendelian disorders, complex diseases, and cancer. Despite their importance, detecting germline and somatic SVs remains challenging, especially for insertions, which account for about half of all SVs in the human genome. Long-read sequencing has advanced significantly, with falling error rates and growing use in medical applications, which creates a need for efficient software that can detect SVs, merge them across samples, and produce fully genotyped VCF files. Sniffles2 addresses this need and enables population-scale SV calling. It is open source and available at https://github.com/fritzsedlazeck/Sniffles.
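The 50 bp size cutoff can be made concrete for sequence-resolved insertions and deletions. The sketch below, a hypothetical helper `sequence_sv_type` that is not part of Sniffles2, classifies a VCF-style REF/ALT pair by its net length change. Balanced events such as inversions and translocations do not change length, so they are deliberately out of scope here.

```python
from typing import Optional

SV_MIN_LENGTH = 50  # bp, the size cutoff that separates SVs from small variants


def sequence_sv_type(ref: str, alt: str, min_len: int = SV_MIN_LENGTH) -> Optional[str]:
    """Classify a sequence-resolved REF/ALT pair by its net length change.

    Returns "INS" or "DEL" when the length change is at least min_len bp,
    otherwise None (a small variant). Inversions and translocations keep the
    sequence length unchanged and need breakpoint evidence instead.
    """
    delta = len(alt) - len(ref)
    if abs(delta) < min_len:
        return None
    return "INS" if delta > 0 else "DEL"


if __name__ == "__main__":
    print(sequence_sv_type("A", "A" + "T" * 49))   # None
    print(sequence_sv_type("A", "A" + "T" * 50))   # INS
    print(sequence_sv_type("A" + "G" * 120, "A"))  # DEL
```

A 49 bp insertion falls just below the cutoff, while a 50 bp insertion qualifies as an SV.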
Sniffles2 was benchmarked against other SV callers on several benchmark sets and showed superior speed and accuracy. It can detect complex SVs, including those with multiple breakpoints, as well as mosaic SVs in bulk long-read data. Applied to 31 ONT datasets representing Mendelian disorders, it detected SVs with high accuracy. Sniffles2 also identified mosaic SVs in a patient with multiple system atrophy, which were validated by PCR and Sanger sequencing. It outperformed other methods at detecting low-frequency SVs, and its calls were accurate when compared with Illumina sequencing and Bionano optical genome mapping. Sniffles2 is versatile: it can detect SVs in contexts ranging from cancer to neurodegenerative diseases and is suitable for diploid, haploid, and polyploid organisms. It also addresses open challenges in SV calling, including the n + 1 problem (adding new samples to an existing population callset without reprocessing every sample) and the accurate reporting of complex alleles. Overall, Sniffles2 represents a significant advance in SV detection, offering improved accuracy and efficiency for long-read sequencing data.
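Telling mosaic SVs apart from germline SVs in bulk data depends on the fraction of reads at a locus that support the variant. The following is a minimal sketch, assuming a diploid sample and hypothetical cutoffs (5%, 30%, and 80%) that are not Sniffles2's defaults. A heterozygous germline SV is expected near an allele fraction of 0.5 and a homozygous one near 1.0, so a clearly lower fraction that is still above a noise floor suggests a subclonal (mosaic) event. Haploid or polyploid samples would need different expected fractions.

```python
def classify_call(support: int, coverage: int,
                  mosaic_min: float = 0.05,
                  het_min: float = 0.30,
                  hom_min: float = 0.80) -> str:
    """Label an SV call from its variant allele fraction (VAF).

    All cutoffs are hypothetical and assume a diploid genome.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    if not 0 <= support <= coverage:
        raise ValueError("support must lie between 0 and coverage")
    vaf = support / coverage
    if vaf >= hom_min:
        return "germline 1/1"
    if vaf >= het_min:
        return "germline 0/1"
    if vaf >= mosaic_min:
        return "mosaic"
    return "no call"


if __name__ == "__main__":
    for support in (30, 14, 4, 1):
        print(support, classify_call(support, coverage=30))
```

At 30x coverage, 30, 14, 4, and 1 supporting reads map to germline 1/1, germline 0/1, mosaic, and no call, respectively. A real caller would also need to account for sampling noise, since at low depth a heterozygous SV can show a VAF well below 0.5 by chance.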