Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms

Nat Biotechnol. 2010 May; 28(5): 511–515. doi:10.1038/nbt.1621 | Cole Trapnell, Brian A. Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J. van Baren, Steven L. Salzberg, Barbara J. Wold, and Lior Pachter
The study introduces Cufflinks, an algorithm for transcript assembly and abundance estimation from RNA-Seq data, implemented in an open-source software program of the same name. Cufflinks was applied to RNA-Seq data from a mouse myoblast cell line sampled across a differentiation time series. It detected 13,692 known transcripts and 3,724 previously unannotated transcripts; 62% of the new transcripts are supported by independent expression data or by homologous genes in other species. Across the time series, the analysis revealed complete switches in the dominant transcription start site (TSS) or splice isoform in 330 genes and more subtle shifts in 1,304 genes, highlighting the regulatory flexibility and complexity of muscle development. To estimate transcript abundances, Cufflinks uses a statistical model that incorporates the distribution of fragment lengths, which helps assign each fragment to the isoform it most likely came from. The software is applicable to a broad range of RNA-Seq studies and can be used to annotate the genomes of newly sequenced organisms.
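To illustrate why a fragment-length distribution helps assign fragments to isoforms, the following is a minimal sketch assuming a textbook expectation-maximization (EM) mixture model; it is not Cufflinks' own code, and its estimator is simplified relative to the paper's. Everything here is hypothetical: the discretized normal fragment-length distribution (mean 200 bp, sd 40 bp), the input format (each fragment listed with the length it would imply on every compatible transcript), the effective-length formula sum_l P(l) * (L - l + 1), and the two-isoform example gene.

```python
import math


def fragment_length_pmf(mean=200.0, sd=40.0, max_len=1000):
    """Hypothetical fragment-length distribution P(l): a discretized normal on 1..max_len."""
    lengths = range(1, max_len + 1)
    weights = [math.exp(-0.5 * ((l - mean) / sd) ** 2) for l in lengths]
    total = sum(weights)
    return {l: w / total for l, w in zip(lengths, weights)}


def effective_length(tx_len, pmf):
    """Expected number of start positions on a transcript: sum_l P(l) * (L - l + 1)."""
    return sum(p * (tx_len - l + 1) for l, p in pmf.items() if l <= tx_len)


def estimate_abundances(fragments, tx_lengths, pmf, n_iter=200):
    """Textbook EM for a mixture of transcripts (a sketch, not Cufflinks' own code).

    fragments:  list of {transcript_id: implied_fragment_length} dicts, one per
                fragment, covering every transcript the fragment is compatible with.
    tx_lengths: {transcript_id: transcript length in bp}.
    Returns (alpha, rho): fraction of fragments per transcript, and relative
    abundance after normalizing alpha by effective length.
    """
    txs = list(tx_lengths)
    eff = {t: effective_length(tx_lengths[t], pmf) for t in txs}
    # P(fragment | transcript): how plausible the implied length is, spread over
    # the positions where a fragment could start on that transcript.
    lik = [{t: pmf.get(l, 0.0) / eff[t] for t, l in f.items()} for f in fragments]
    alpha = {t: 1.0 / len(txs) for t in txs}  # start from a uniform mixture
    for _ in range(n_iter):
        counts = {t: 0.0 for t in txs}
        for fl in lik:
            denom = sum(alpha[t] * p for t, p in fl.items())
            if denom == 0.0:
                continue  # implied length impossible on every compatible transcript
            for t, p in fl.items():
                counts[t] += alpha[t] * p / denom  # E-step: fractional assignment
        total = sum(counts.values())
        if total == 0.0:
            break
        alpha = {t: c / total for t, c in counts.items()}  # M-step: re-estimate mixture
    per_base = {t: alpha[t] / eff[t] for t in txs}
    z = sum(per_base.values())
    return alpha, {t: v / z for t, v in per_base.items()}


# Hypothetical gene: isoform B skips a 300-bp exon that isoform A retains.
pmf = fragment_length_pmf()
tx_lengths = {"A": 1500, "B": 1200}
fragments = (
    [{"A": 200, "B": 200}] * 50    # shared exons, same implied length on both
    + [{"A": 500, "B": 200}] * 20  # mates flank the skipped exon; 200 bp fits B
    + [{"A": 210}] * 10            # lies inside the exon unique to A
)
alpha, rho = estimate_abundances(fragments, tx_lengths, pmf)
print({t: round(a, 3) for t, a in alpha.items()})
print({t: round(r, 3) for t, r in rho.items()})
```

In this toy example the 20 exon-spanning fragments would imply a 500-bp fragment on A, far in the tail of the assumed distribution, so EM assigns them almost entirely to B; the 50 ambiguous fragments are then split according to the current abundance estimates, which is how length information propagates to fragments that are otherwise uninformative.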