The Landscape of Long Noncoding RNAs in the Human Transcriptome

2015 March | Matthew K. Iyer, Yashar S. Niknafs, Rohit Malik, Udit Singhal, Anirban Sahu, Yasuyuki Hosono, Terrence R. Barrette, John R. Prensner, Joseph R. Evans, Shuang Zhao, Anton Poliakov, Xuhong Cao, Saravana M. Dhanasekaran, Yi-Mi Wu, Dan R. Robinson, David G. Beer, Felix Y. Feng, Hariharan K. Iyer, and Arul M. Chinnaiyan
The study presents a comprehensive analysis of the human transcriptome, focusing on long non-coding RNAs (lncRNAs). Using RNA-Seq data from 7,256 libraries across 25 studies, the researchers assembled a consensus human transcriptome of 91,013 expressed genes, the MiTranscriptome assembly. Over 68% of these genes were classified as lncRNAs, of which 79% were previously unannotated. In total, the study identified 58,648 lncRNA genes, including 7,942 lineage- or cancer-associated lncRNAs. The lncRNA landscape characterized here may provide insights into normal biology and cancer pathogenesis, and may be valuable for future biomarker development.
To prioritize disease-associated and lineage-specific transcription, the researchers developed a non-parametric method for differential expression testing called Sample Set Enrichment Analysis (SSEA). Using this method, they identified 267,726 MiTranscriptome transcripts with significant associations, together accounting for over two million significant associations.
The study also characterized the coding potential of long RNA transcripts, classifying them into five categories: protein-coding, read-through, pseudogene, lncRNA, and transcript of unknown coding potential (TUCP). Over 60% of MiTranscriptome genes were classified as either lncRNAs or TUCPs.
Conservation analysis identified 3,309 lncRNAs with markedly higher base-wise conservation than random intergenic regions, and 597 intergenic lncRNAs harboring ultraconserved elements. A further 3,900 lncRNAs overlapped disease-associated SNPs. In examining the relationship between the MiTranscriptome assembly and disease-associated regions of the genome, the study found that transcripts in the assembly overlapped 2,881 formerly intergenic SNPs located within 'gene deserts'. The 545 lncRNA genes that harbor ultraconserved elements but do not meet the stringent lineage and cancer association criteria were designated HICLINCs.
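The summary does not say how SSEA computes its statistic. As an illustration only of what a non-parametric, rank-based enrichment test over samples can look like, the sketch below ranks samples by one transcript's expression, computes an unweighted Kolmogorov-Smirnov-style running sum for a chosen sample set, and estimates significance by permuting sample labels. The function names, the toy expression values, and the choice of an unweighted statistic are all assumptions made for this example; this is not the authors' implementation.

```python
import random


def enrichment_score(expression, in_set):
    """Signed KS-style running-sum statistic for one transcript.

    expression: expression values, one per sample (hypothetical data).
    in_set: booleans, True where the sample belongs to the sample set.
    """
    n_hit = sum(in_set)
    n_miss = len(in_set) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("sample set must be a non-empty proper subset")
    # Rank samples from highest to lowest expression.
    order = sorted(range(len(expression)),
                   key=lambda i: expression[i], reverse=True)
    running, best = 0.0, 0.0
    for i in order:
        # Step up on set members, step down on all other samples.
        running += 1.0 / n_hit if in_set[i] else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(expression, in_set, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling set labels."""
    rng = random.Random(seed)
    observed = enrichment_score(expression, in_set)
    labels = list(in_set)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point round-off.
        if abs(enrichment_score(expression, labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy example: a transcript expressed highly in the first four samples only.
toy_expression = [9.1, 8.7, 8.2, 7.9, 1.2, 0.8, 0.5, 0.3, 0.9, 1.1]
toy_in_set = [True] * 4 + [False] * 6
toy_score, toy_p = permutation_p_value(toy_expression, toy_in_set)
```

Because only the ranks of the samples enter the statistic, a sketch like this makes no assumption about how expression values are distributed, which is the sense in which such a test is non-parametric.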
The study also highlighted individual cancer-associated lncRNAs, including BRCAT49 and MEAT6, and identified numerous lncRNAs associated with particular cancer types and tissue types, suggesting that lncRNAs may play a role in cancer development and progression. The authors further suggest that lncRNAs may have translational potential for use in non-invasive clinical tests, particularly for cancers that lack reliable biomarkers. The study concludes that the MiTranscriptome assembly, the lncRNAs identified in it, and the computational tools developed alongside it will provide a foundation for lncRNA genomics, biomarker development, and the delineation of cancer disease mechanisms.
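The SNP findings summarized above rest on checking whether a SNP position falls inside any assembled transcript, which at its core is an interval lookup. The sketch below shows one minimal way to count such overlaps. It assumes BED-style 0-based, half-open coordinates; the function names, chromosome names, and positions are hypothetical and are not taken from the study, whose actual procedure is not described in this summary.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_overlapping_snps(transcripts, snps):
    """Count SNPs that fall inside at least one transcript interval.

    transcripts: iterable of (chrom, start, end), 0-based half-open.
    snps: iterable of (chrom, position), 0-based.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in transcripts:
        by_chrom[chrom].append((start, end))
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    hits = 0
    for chrom, pos in snps:
        if chrom not in merged:
            continue
        # Find the last merged interval starting at or before the SNP.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < merged[chrom][i][1]:
            hits += 1
    return hits


# Hypothetical coordinates, for illustration only.
toy_transcripts = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
toy_snps = [("chr1", 120), ("chr1", 299), ("chr1", 300),
            ("chr2", 10), ("chr3", 5)]
toy_hits = count_overlapping_snps(toy_transcripts, toy_snps)
```

Merging the intervals first means each SNP is counted once even when several transcripts cover it, and the binary search keeps each lookup logarithmic in the number of merged intervals on that chromosome.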