VOL 420 | 5 DECEMBER 2002 | The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I & II Team
The FANTOM Consortium and the RIKEN Genome Exploration Research Group Phase I & II Team manually annotated 60,770 full-length mouse cDNAs and clustered them into 33,409 transcriptional units (TUs), which account for 90.1% of a newly established mouse transcriptome database. Of these TUs, 4,258 are new protein-coding transcripts and 11,665 are new non-coding RNA transcripts without protein-coding potential, for 15,923 novel TUs in total, indicating that non-coding RNA is a major component of the transcriptome. Evidence of alternative splicing was found in 41% of TUs, and 79% of splice variations in protein-coding transcripts alter the protein product. The study also identified 2,431 sense-antisense pairs. Together, these results provide a comprehensive survey of the mammalian transcriptome.

The data set consists of high-quality full-length cDNA sequences with an average length of 1.97 kb. Functional annotation found some functional information for 13,736 TUs, and 6,929 TUs carry gene names taken from known mouse DNA/protein sequences or inferred from EST/mRNA clusters. Among the non-coding RNAs, 4,148 have splice evidence, and 4,280 transcripts are the strongest non-coding RNA candidates.

Analysis of the proteome identified 18,768 representative proteins with Gene Ontology (GO) annotations, 1,712 human disease-associated genes, and 1,022 disease-associated protein sequences. The study also identified new protein-coding transcripts, including 33 kinesin superfamily proteins (KIFs), 4 E1 ubiquitin-activating enzymes, 13 E2 ubiquitin-conjugating enzymes, 98 E3 ubiquitin ligases, and 6 de-ubiquitinating enzymes, as well as 410 candidate GPCRs and 726 distinct EC numbers for metabolic enzymes.
The tricarboxylic acid (TCA) cycle was identified in the representative transcript and protein set (RTPS). Analysis of the impact of alternative splicing on the proteome identified 8,500 variant exons with length variation, along with examples of proteins whose function is altered by alternative splicing. The study also discussed the importance of non-coding RNAs in the mammalian transcriptome, and the challenges ...
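The headline counts can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `novel_tu_summary` is hypothetical, and the two percentages it computes are derived here from the quoted counts rather than taken from the study itself.

```python
# Counts quoted in the summary above.
TOTAL_TUS = 33_409          # all transcriptional units
NEW_CODING_TUS = 4_258      # new protein-coding TUs
NEW_NONCODING_TUS = 11_665  # new non-coding TUs


def novel_tu_summary(total: int, coding: int, noncoding: int) -> dict:
    """Hypothetical helper: derive totals and shares from the quoted counts."""
    novel = coding + noncoding
    return {
        "novel_tus": novel,
        # Fraction of the novel TUs that are non-coding (derived, not quoted).
        "noncoding_share_of_novel": noncoding / novel,
        # Fraction of all TUs that are novel (derived, not quoted).
        "novel_share_of_all": novel / total,
    }


summary = novel_tu_summary(TOTAL_TUS, NEW_CODING_TUS, NEW_NONCODING_TUS)
print(f"novel TUs: {summary['novel_tus']}")
print(f"non-coding share of novel TUs: {summary['noncoding_share_of_novel']:.1%}")
print(f"novel share of all TUs: {summary['novel_share_of_all']:.1%}")
```

The sum reproduces the 15,923 novel TUs reported in the text. Under these counts, roughly three quarters of the novel TUs are non-coding, which is consistent with the claim that non-coding RNA is a major component of the transcriptome.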