Streaming fragment assignment for real-time analysis of sequencing experiments

2013 January | Adam Roberts and Lior Pachter
The paper introduces eXpress, a software package for efficiently assigning sequenced fragments to their targets of origin in real time. It implements a streaming algorithm whose run time is linear in the number of fragments and whose memory use is constant in sequencing depth, scaling only with the size of the target set (for RNA-seq, the transcriptome). This enables real-time abundance estimation for sequenced molecules in applications such as RNA-seq, ChIP-seq, and metagenomics.

High-throughput sequencing generates large volumes of data that are costly to store and process. Fragment assignment, the problem of determining the origin of ambiguously mapped fragments, is a major computational challenge. Traditional batch EM algorithms are not easily parallelizable and scale poorly with sequencing depth. eXpress instead uses an online algorithm that processes data one fragment at a time, updating its parameter estimates as each fragment arrives, which allows it to adapt to new information and converge to the global maximum likelihood solution.

The algorithm uses a probabilistic model to estimate abundances and incorporates parameters for the fragment length distribution, sequence bias, and sequencing errors.

The paper compares eXpress with RSEM and Cufflinks on simulated RNA-seq data. eXpress outperforms both in speed and accuracy, especially at high sequencing depths, and uses less memory. Because fragments are processed incrementally, the algorithm is compatible with single-molecule sequencing technologies and can estimate abundances in real time as fragments are sequenced.

eXpress is implemented as open-source software for applications that require probabilistic fragment assignment.
The software is freely available and can be used for a wide range of sequencing applications.
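
The one-fragment-at-a-time update described in the summary can be illustrated with a small sketch. This is a generic online EM for mixture proportions, not the eXpress implementation: the function name `stream_assign`, the input format (each fragment as a list of `(target_index, likelihood)` pairs), the uniform starting mass, and the step-size exponent `c` are all assumptions made for illustration, and the sketch ignores fragment length, sequence bias, and sequencing error parameters.

```python
import random


def stream_assign(fragments, n_targets, c=0.85):
    """Generic online EM sketch for fragment assignment (illustrative only).

    fragments: iterable of fragments, each a list of
               (target_index, conditional_likelihood) pairs (hypothetical format).
    Returns estimated relative abundances, which sum to 1.
    """
    mass = [1.0] * n_targets   # assumed uniform starting mass per target
    total = float(n_targets)   # running sum of all mass
    for i, frag in enumerate(fragments, start=1):
        # E-step for this fragment only: posterior is proportional to
        # (current abundance) * (likelihood); the shared 1/total cancels.
        weights = [mass[t] * lik for t, lik in frag]
        norm = sum(weights)
        if norm <= 0.0:
            continue  # fragment is incompatible with every listed target
        # Step size gamma decays with i (assumed schedule). Adding mass m
        # is equivalent to tau <- (1 - gamma) * tau + gamma * posterior.
        gamma = (i + 1) ** -c
        m = gamma * total / (1.0 - gamma)
        for (t, _), w in zip(frag, weights):
            mass[t] += m * w / norm
        total += m
    return [x / total for x in mass]


# Toy stream: two targets, some fragments map ambiguously to both.
frags = [[(0, 1.0)]] * 600 + [[(1, 1.0)]] * 200 + [[(0, 1.0), (1, 1.0)]] * 200
random.Random(0).shuffle(frags)
print(stream_assign(frags, n_targets=2))
```

Each fragment touches only the targets it aligns to, so the work per fragment does not depend on how many fragments came before, and the only state kept is one number per target. That matches the scaling behaviour the summary describes: run time linear in the number of fragments and memory proportional to the number of targets.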