Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications

2016 | Xiaoyu Chen¹, Ole Schulz-Trießlaff², Richard Shaw², Bret Barnes¹, Felix Schlesinger¹, Morten Källberg², Anthony J. Cox², Semyon Kruglyak¹ and Christopher T. Saunders¹,*
Manta is a method for rapid detection of structural variants (SVs) and indels from next-generation sequencing data, optimized for germline and somatic analysis. It can identify SVs, medium-sized indels, and large insertions in less than a tenth of the time required by comparable methods. Manta uses paired and split-read evidence to discover and score variants, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance.

Manta is released under the open-source GPLv3 license, with source code, documentation, and Linux binaries available at https://github.com/illumina/manta.

Manta's workflow is designed for high parallelization on individual or small sets of samples. It operates in two phases: first, a graph of all breakend associations within the genome is built, then the components of this graph are processed for variant hypothesis generation, assembly, scoring, and VCF reporting. The breakend graph contains edges between any genomic regions where evidence of a long-range adjacency exists, and indel assembly regions are denoted as self-edges. The graph does not express specific variant hypotheses, so it is very compact and can be constructed from segments of the genome in parallel. Following graph construction, individual edges (or larger subgraphs) are analyzed for variants in parallel. Each edge is analyzed to find imprecise variant hypotheses, for which variant reads are assembled and aligned back to the genome. Assembly is attempted for all cases, but is not required to report a variant.
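The two-phase design described above can be illustrated with a small sketch. This is not Manta's implementation: the bin size, the function names, and the evidence threshold are all hypothetical, and real evidence would come from read alignments rather than hand-written tuples. The sketch only shows why a graph that stores regions and evidence counts, rather than specific variant hypotheses, can be built per genome segment and then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

BIN_SIZE = 1000  # hypothetical region granularity, not a Manta parameter


def region_of(chrom, pos):
    """Map a position to a coarse genomic region (chrom, bin index)."""
    return (chrom, pos // BIN_SIZE)


def build_segment_graph(observations):
    """Count evidence per edge for one genome segment.

    Each observation is (chrom_a, pos_a, chrom_b, pos_b): two loci that a
    read pair or split read suggests are adjacent. Edges are unordered, so
    the two regions are sorted. An observation whose ends fall in the same
    region becomes a self-edge (an indel assembly region).
    """
    graph = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in observations:
        edge = tuple(sorted((region_of(chrom_a, pos_a), region_of(chrom_b, pos_b))))
        graph[edge] += 1
    return graph


def build_graph(segments, workers=4):
    """Phase 1: build per-segment graphs in parallel, then merge the counts."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(build_segment_graph, segments):
            merged.update(partial)  # Counter.update adds counts per edge
    return merged


def candidate_edges(graph, min_evidence=2):
    """Phase 2 input: edges with enough evidence to analyze independently."""
    return sorted(edge for edge, count in graph.items() if count >= min_evidence)
```

Because each edge carries only a pair of regions and a count, partial graphs from different segments merge by simple addition, and every edge returned by `candidate_edges` could then be handed to a separate worker for hypothesis generation, assembly, and scoring.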
All paired and split-read evidence is consolidated to a quality score under either a germline or somatic variant model, and filtration metrics complement this quality score to improve call precision. Manta's approach is sufficiently flexible to support several types of sequencing assays. The primary focus for rapid analysis and large-scale SV calling has been whole genome sequencing, but Manta is routinely used to analyze exome and other enrichment-based targeted sequencing assays. The method is not designed for targeted amplicon sequencing but successful results have been reported. Manta has been extensively optimized to handle the shorter fragment lengths and higher chimera rates found in highly degraded FFPE samples as part of an ongoing focus on clinical sequencing workflows.
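To make the idea of consolidating read evidence into a quality score under a diploid germline model more concrete, here is a generic genotype-likelihood sketch. It is an illustration under stated assumptions, not Manta's published scoring model: the genotype priors, the per-read likelihoods, the likelihood floor, and the quality cap are all placeholder choices.

```python
import math

# Expected fraction of variant-supporting reads for each diploid genotype.
GENOTYPE_ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}
MIN_LIKELIHOOD = 1e-300  # floor to keep log() defined (illustrative choice)


def genotype_posteriors(read_likelihoods, priors=None):
    """Return posterior probabilities for the three diploid genotypes.

    read_likelihoods: list of (p_read_given_ref, p_read_given_alt), one entry
    per paired or split read overlapping the candidate breakpoint.
    """
    if priors is None:
        # Placeholder priors, assumed for illustration only.
        priors = {"0/0": 0.998, "0/1": 0.001, "1/1": 0.001}
    log_post = {}
    for genotype, alt_frac in GENOTYPE_ALT_FRACTION.items():
        total = math.log(priors[genotype])
        for p_ref, p_alt in read_likelihoods:
            mixed = (1.0 - alt_frac) * p_ref + alt_frac * p_alt
            total += math.log(max(mixed, MIN_LIKELIHOOD))
        log_post[genotype] = total
    # Normalize in log space to avoid underflow with many reads.
    peak = max(log_post.values())
    norm = peak + math.log(sum(math.exp(v - peak) for v in log_post.values()))
    return {genotype: math.exp(v - norm) for genotype, v in log_post.items()}


def variant_quality(posteriors, cap=999):
    """Phred-scaled confidence that the site is not homozygous reference."""
    p_hom_ref = posteriors["0/0"]
    if p_hom_ref <= 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_hom_ref))))
```

A somatic model for tumor-normal pairs would need a different hypothesis space (for example, allowing arbitrary variant allele fractions in the tumor), which this sketch does not attempt.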