LUMPY: A probabilistic framework for structural variant discovery

Ryan M Layer, Aaron R Quinlan*, Ira M Hall*
LUMPY is a novel probabilistic framework for structural variant (SV) discovery that integrates multiple alignment signals, including read-pair, split-read, and read-depth, to improve sensitivity. The framework maps each SV signal to a common abstract representation, the breakpoint probability distribution, and performs SV prediction at this higher level. This approach allows for straightforward signal integration, produces probabilistic breakpoint position estimates, and can be extended to new signals as sequencing technologies evolve. The framework is implemented in an open-source C++ software package and can handle multiple samples by tracking the sample origin of each probability distribution.

LUMPY demonstrates superior sensitivity compared to existing methods, especially in low-coverage datasets or heterogeneous tumor samples, where variants may be present in only a subset of cells. Performance comparisons with other tools (GASVPro, DELLY, and PINDEL) show that LUMPY consistently outperforms them on sensitivity and false discovery rate (FDR), particularly in low-coverage and heterogeneous tumor scenarios. The benefit of integrating multiple signals is evident: combining read-pair and split-read signals within LUMPY has a super-additive effect, improving detection beyond what either signal achieves alone. LUMPY's ability to integrate various types of evidence from multiple sources, together with its use of probability distributions for more accurate breakpoint prediction, makes it a powerful tool for SV discovery, especially in cancer genomics.
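The idea of mapping every signal to a breakpoint probability distribution and integrating at that level can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `BreakpointEvidence` class, the `overlaps` and `integrate` helpers, the toy coordinates, and the choice to combine evidence by summing per-position weights and renormalizing. It is not taken from the LUMPY C++ code base and may differ from the rule LUMPY actually applies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BreakpointEvidence:
    """Hypothetical container: one SV signal as a discrete distribution."""
    start: int          # first genomic position covered (inclusive)
    probs: List[float]  # probs[i] is the probability at position start + i
    signal: str         # e.g. "read-pair" or "split-read"
    sample: str         # sample of origin, kept so evidence stays traceable

    @property
    def end(self) -> int:
        # Last genomic position covered (inclusive).
        return self.start + len(self.probs) - 1


def overlaps(a: BreakpointEvidence, b: BreakpointEvidence) -> bool:
    """True when the two position intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def integrate(evidence: List[BreakpointEvidence]) -> Tuple[int, List[float], List[str]]:
    """Combine evidence by summing per-position weights, then renormalizing.

    Summation is an illustrative assumption, not necessarily LUMPY's rule.
    Returns the first position, the combined distribution, and the samples.
    """
    lo = min(e.start for e in evidence)
    hi = max(e.end for e in evidence)
    total = [0.0] * (hi - lo + 1)
    for e in evidence:
        for i, p in enumerate(e.probs):
            total[e.start - lo + i] += p
    norm = sum(total)
    samples = sorted({e.sample for e in evidence})
    return lo, [p / norm for p in total], samples


# Toy data: a broad read-pair interval and a narrow split-read interval.
read_pair = BreakpointEvidence(100, [0.1] * 10, "read-pair", "tumor")
split_read = BreakpointEvidence(104, [0.25, 0.5, 0.25], "split-read", "normal")

if overlaps(read_pair, split_read):
    lo, combined, samples = integrate([read_pair, split_read])
    best = lo + combined.index(max(combined))
    print(best, samples)
```

In this toy case the broad read-pair interval supplies support across positions 100 to 109, while the narrow split-read distribution concentrates the combined estimate around one position. Because each piece of evidence carries its sample label, the combined call can still report which samples contributed.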