StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

2015 March | Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell, and Steven L Salzberg
StringTie is a computational method that uses a network flow algorithm and optional de novo assembly to improve transcriptome reconstruction from RNA-seq reads. It outperforms other transcript assembly programs, including Cufflinks, IsoLasso, Scripture, and Traph, in the completeness and accuracy of the transcripts it produces. On both simulated and real data, StringTie correctly assembled more transcripts than these programs. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, compared with 7,187 by Cufflinks. On a simulated dataset, it correctly assembled 7,559 transcripts, 20% more than Cufflinks. StringTie also runs faster than other assembly software.

The study highlights the growing complexity of transcriptomes in higher eukaryotes, where many genes are alternatively spliced into multiple transcripts. RNA-seq has revolutionized gene discovery by enabling high-throughput sequencing of transcribed genes. However, assembling short reads into full-length transcripts remains challenging because sequence coverage is uneven and because transcripts from the same gene share exons.

StringTie addresses these challenges with a genome-guided approach that borrows concepts from de novo assembly. It groups aligned reads into clusters, builds a splice graph for each cluster, and estimates expression levels with a maximum flow algorithm. Because it uses a network flow formulation, StringTie assembles transcripts and estimates their expression levels simultaneously, which improves its accuracy. It can also incorporate aligned fragments from de novo assembly, which further improves accuracy.

StringTie outperforms the other programs on both simulated and real data, with higher sensitivity and precision. On real data, it correctly predicted more transcripts than Cufflinks with a lower false-positive rate. It is also faster than the other programs and has a smaller memory footprint. StringTie is implemented in C++ and is freely available as open-source software.
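The clustering and maximum-flow steps summarized above can be illustrated with a short Python sketch. This is not StringTie's implementation, which is written in C++ and builds a separate flow network for each candidate transcript. It only shows the two generic ideas under simplifying assumptions. `cluster_reads` groups reads whose alignments overlap, with each read reduced to a hypothetical (start, end) interval on a single strand. `max_flow` is a textbook Edmonds-Karp maximum-flow routine. The toy network, with exons A, B, and C and made-up coverage values as edge capacities, is hypothetical and is not data from the study.

```python
from collections import deque


def cluster_reads(alignments):
    """Group aligned reads into clusters of overlapping genomic intervals.

    Simplification (assumption): each read is a hypothetical (start, end)
    pair of 0-based, half-open coordinates on one strand of one chromosome.
    """
    clusters = []
    for start, end in sorted(alignments):
        # Start a new cluster when this read does not overlap the current one.
        if not clusters or start >= clusters[-1]["end"]:
            clusters.append({"start": start, "end": end, "reads": []})
        cluster = clusters[-1]
        cluster["end"] = max(cluster["end"], end)
        cluster["reads"].append((start, end))
    return clusters


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Copy capacities into a residual graph and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck (smallest residual capacity) along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push flow along the path and update residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


if __name__ == "__main__":
    reads = [(100, 200), (150, 250), (300, 400), (390, 450)]
    print([(c["start"], c["end"], len(c["reads"])) for c in cluster_reads(reads)])
    # prints [(100, 250, 2), (300, 450, 2)]

    # Hypothetical splice-graph-like network: capacities are made-up coverage.
    toy = {
        "source": {"A": 30},
        "A": {"B": 10, "C": 15},
        "B": {"C": 12},
        "C": {"sink": 40},
    }
    print(max_flow(toy, "source", "sink"))  # prints 25
```

In the toy network, the search first routes 15 units along the A-C path and then 10 units along A-B-C. The total of 25 is limited by the two edges leaving A, which together form the minimum cut.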
It is designed to be used in RNA-seq analysis pipelines, offering improved transcript assembly and faster performance compared to existing tools. The study demonstrates that StringTie provides a more accurate and efficient method for transcriptome reconstruction from RNA-seq data.