Identification of novel transcripts in annotated genomes using RNA-Seq


June 21, 2011 | Adam Roberts, Harold Pimentel, Cole Trapnell, Lior Pachter
This paper introduces Reference Annotation Based Transcript (RABT) assembly, a method for assembling novel transcripts in annotated genomes from RNA-Seq data. RABT uses the existing annotation to improve assembly, especially for transcripts that are poorly covered by sequencing reads. It generates 'faux-reads' from the reference transcripts so that features missing from the sequencing data because of low coverage are still represented. These faux-reads are pooled with the real sequencing reads, and the combined set is assembled into transcripts. The resulting transfrags are then compared with the reference annotation, and those already present in it are filtered out, leaving candidate novel transcripts.

The method was tested on human and Drosophila data. On the human data, RABT produced longer and more accurate transcripts than the original Cufflinks assembler and identified more novel transcripts, although some of these were found to be false positives. On the Drosophila data it likewise improved the assemblies, and the novel transcripts it reported had conservation probabilities similar to those of known transcripts.

RABT is a 'pure' assembler: it uses no information about the structure of coding genes or other external data, which allows it to assemble non-coding RNA transcripts as well. Its parameters, however, may need to be tuned for the organism and the experiment. The paper also stresses that accurate gene expression estimates depend on accurate genome annotations, and argues that RABT assembly is essential until annotations improve. Overall, the study shows that RABT can effectively identify novel transcripts in annotated genomes, that it is applicable to a range of organisms, and that it can be used to improve annotations incrementally.
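The faux-read and filtering steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RABT implementation in Cufflinks: the `Transcript` type, the helper names, the `read_len` and `step` parameters, and the "already annotated" test (identical intron chain for multi-exon transfrags, containment in a reference exon for single-exon ones) are all hypothetical, and the assembly step itself is omitted. Transcripts are modeled as sorted, half-open exon intervals on a single chromosome and strand.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Half-open genomic interval [start, end). Assumption: all transcripts in this
# sketch lie on the same chromosome and strand.
Block = Tuple[int, int]


@dataclass(frozen=True)
class Transcript:
    name: str
    exons: Tuple[Block, ...]  # sorted, non-overlapping exon intervals


def transcript_length(t: Transcript) -> int:
    return sum(end - start for start, end in t.exons)


def to_genomic_blocks(t: Transcript, tx_start: int, tx_end: int) -> List[Block]:
    """Map transcript coordinates [tx_start, tx_end) onto genomic blocks,
    splitting the interval wherever it crosses an intron."""
    blocks: List[Block] = []
    offset = 0  # transcript coordinate of the current exon's first base
    for start, end in t.exons:
        lo = max(tx_start, offset)
        hi = min(tx_end, offset + (end - start))
        if lo < hi:
            blocks.append((start + lo - offset, start + hi - offset))
        offset += end - start
    return blocks


def faux_reads(t: Transcript, read_len: int, step: int) -> List[List[Block]]:
    """Tile a reference transcript with faux-reads of read_len bases placed every
    `step` bases (hypothetical parameters), always covering the 3' end."""
    n = transcript_length(t)
    if n <= read_len:
        return [to_genomic_blocks(t, 0, n)]
    starts = list(range(0, n - read_len + 1, step))
    if starts[-1] != n - read_len:
        starts.append(n - read_len)
    return [to_genomic_blocks(t, s, s + read_len) for s in starts]


def intron_chain(exons: Tuple[Block, ...]) -> Tuple[Block, ...]:
    """Introns are the gaps between consecutive exons."""
    return tuple((a[1], b[0]) for a, b in zip(exons, exons[1:]))


def is_known(tf: Transcript, reference: List[Transcript]) -> bool:
    """Hypothetical 'already annotated' test: a multi-exon transfrag matches a
    reference transcript with an identical intron chain; a single-exon
    transfrag matches if it lies inside some reference exon."""
    if len(tf.exons) > 1:
        chain = intron_chain(tf.exons)
        return any(intron_chain(r.exons) == chain for r in reference)
    s, e = tf.exons[0]
    return any(rs <= s and e <= re for r in reference for rs, re in r.exons)


def novel_transfrags(transfrags: List[Transcript],
                     reference: List[Transcript]) -> List[Transcript]:
    """Keep only transfrags that the annotation does not already contain."""
    return [tf for tf in transfrags if not is_known(tf, reference)]


if __name__ == "__main__":
    ref = Transcript("ref1", ((100, 150), (200, 260)))
    print(faux_reads(ref, read_len=50, step=30))
    # [[(100, 150)], [(130, 150), (200, 230)], [(210, 260)]]
    candidates = [
        Transcript("tf1", ((120, 150), (200, 240))),  # same intron as ref1
        Transcript("tf2", ((100, 150), (220, 260))),  # different acceptor site
    ]
    print([tf.name for tf in novel_transfrags(candidates, [ref])])
    # ['tf2']
```

In this sketch, tiling in transcript coordinates rather than genomic coordinates lets a faux-read span an annotated intron, so as long as `step` is smaller than `read_len`, every reference splice junction is represented in the pooled read set even where no real read covers it.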
The paper also discusses the limitations of the method, including the potential for false positives and the need for further research to optimize the parameters used in the assembly process.