Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

July 2024 | Unknown Author
A consortium evaluated long-read RNA-seq methods for transcript identification and quantification, generating over 427 million long-read sequences from human, mouse, and manatee samples. Longer, more accurate reads did more to improve transcript accuracy than increased read depth did, whereas greater read depth improved quantification accuracy. Reference-based tools performed best in well-annotated genomes, and the consortium advised incorporating orthogonal data and replicates when the goal is to detect rare or novel transcripts. Long-read methods showed potential for capturing full-length and novel transcripts, but quantification remained challenging because of throughput and error limitations. The study validated many lowly expressed transcripts, suggesting that long-read data merit further exploration as a source for reference transcriptomes.
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) tested tools across three challenges: full-length transcript reconstruction in well-annotated genomes, transcript quantification, and de novo transcript reconstruction in poorly annotated genomes. Transcript detection and quantification varied substantially across tools, with some tools excelling in specific categories. For example, Bambu, FLAIR, and FLAMES performed well in detecting full splice matches, while TALON, IsoTools, and LyRic detected more incomplete splice matches. Novel transcript detection varied widely, with some tools showing high sensitivity but low precision. In quantification, long-read tools generally performed worse than short-read tools, particularly for lowly expressed transcripts. However, some long-read tools, such as IsoQuant, FLAIR, and Bambu, performed comparably to short-read tools in certain scenarios. Transcript detection was also influenced by the quality of the reference annotation and the type of sequencing data used.
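The splice-match categories and the sensitivity/precision trade-off mentioned above can be made concrete with a small sketch. It is not the LRGASP evaluation code. It assumes the common SQANTI-style definitions (a full splice match reproduces every splice junction of a reference transcript; an incomplete splice match reproduces a consecutive subset of them) and collapses all other multi-exon models into a single "novel" label. The coordinates, transcript names, and the helper functions `classify` and `precision_sensitivity` are hypothetical and exist only for illustration.

```python
from typing import Dict, Set, Tuple

Junction = Tuple[int, int]      # (intron start, intron end), toy coordinates
Chain = Tuple[Junction, ...]    # ordered splice junctions of one transcript


def is_contiguous_subchain(query: Chain, ref: Chain) -> bool:
    """True if query is a strictly shorter, consecutive run of ref's junctions."""
    n, m = len(query), len(ref)
    if n == 0 or n >= m:
        return False
    return any(ref[i:i + n] == query for i in range(m - n + 1))


def classify(query: Chain, reference: Dict[str, Chain]) -> str:
    """Simplified structural category of a predicted transcript model."""
    if not query:
        return "mono-exon"      # no junctions to compare
    if any(query == ref for ref in reference.values()):
        return "FSM"            # full splice match
    if any(is_contiguous_subchain(query, ref) for ref in reference.values()):
        return "ISM"            # incomplete splice match
    return "novel"              # simplification: no further subdivision


def precision_sensitivity(predicted: Set[Chain], truth: Set[Chain]) -> Tuple[float, float]:
    """Precision = TP / predicted, sensitivity = TP / truth, on exact chains."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity


# Hypothetical reference annotation with two transcripts of one gene.
reference = {
    "tx1": ((100, 200), (300, 400), (500, 600)),
    "tx2": ((100, 200), (500, 600)),
}

# Hypothetical predictions from a long-read tool.
predictions = {
    "a": ((100, 200), (300, 400), (500, 600)),  # all junctions of tx1
    "b": ((300, 400), (500, 600)),              # last two junctions of tx1
    "c": ((100, 250), (500, 600)),              # junction absent from reference
}

labels = {name: classify(chain, reference) for name, chain in predictions.items()}
precision, sensitivity = precision_sensitivity(
    set(predictions.values()), set(reference.values())
)
print(labels)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

In this toy case a tool that reports many extra models raises the chance of recovering true transcripts but lowers precision, which is the pattern the summary describes for some tools on novel transcripts. A real evaluation would also need to handle transcript ends, strand, and mono-exonic models, which this sketch ignores.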
For de novo transcript detection, long-read methods were tested on a manatee sample with limited genomic information. Four tools were evaluated, with rnaSPAdes predicting the most transcripts and Bambu predicting the fewest. Most detected transcripts in the mouse sample were novel, highlighting the impact of annotation on predictions. Transcript detection without a reference annotation proved challenging, with performance varying across tools and data types.
The study concluded that long-read RNA-seq methods offer significant potential for transcriptome analysis, but that their performance depends on factors such as read length, accuracy, and the availability of reference annotations. The consortium recommended cDNA-PacBio and R2C2-ONT datasets for transcript identification, and cDNA-ONT and CapTrap-ONT datasets for quantification. For de novo transcript detection, tools like Bambu, IsoQuant, and FLAIR were recommended, along with the use of orthogonal data and replicates. Overall, the study emphasized the importance of benchmarking and validation in improving the accuracy and reliability of long-read RNA-seq methods.