2016 | Bronwen L. Aken, Sarah Ayling, Daniel Barrell, Laura Clarke, Valery Curwen, Susan Fairley, Julio Fernandez Banet, Konstantinos Billis, Carlos Garcia Girón, Thibaut Hourlier, Kevin Howe, Andreas Kähäri, Felix Kokocinski, Fergal J. Martin, Daniel N. Murphy, Rishi Nag, Magali Ruffier, Michael Schuster, Y. Amy Tang, Jan-Hinnerk Vogel, Simon White, Amonida Zadissa, Paul Flicek, and Stephen M. J. Searle
The Ensembl gene annotation system is a comprehensive, high-quality resource for annotating vertebrate genomes. It involves several key steps: genome preparation, protein-coding model building, filtering, and gene set finalization. The system uses alignments of biological sequences, including cDNAs, proteins, and RNA-seq reads, to construct candidate transcript models. These models are then filtered to produce the final gene set, which is available on the Ensembl website. The process is designed to identify full-length protein-coding genes with high accuracy, and it incorporates manual curation for complex regions. The system has been extended to handle fragmented genome assemblies and species with limited same-species data, and it integrates data from new sequencing technologies. Ensembl's annotations are widely used in research areas including disease studies, evolution, metabolism, and gene expression. The article provides a detailed overview of the annotation process, including the specific pipelines and methods used for different species and genome types.
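
The build, filter, and finalize stages described above can be pictured with a toy sketch. This is only an illustration under stated assumptions, not Ensembl's actual code: the names `TranscriptModel`, `build_models`, `filter_models`, `finalize_gene_set`, and `annotate`, the evidence ranking, and the coverage threshold are all hypothetical. Genome preparation is omitted, and keeping a single model per locus is a simplification made for brevity.

```python
from dataclasses import dataclass

# Hypothetical evidence ranking for this sketch only (lower is preferred);
# it is an assumption, not a rule taken from the Ensembl system.
EVIDENCE_RANK = {"cdna": 0, "protein": 1, "rnaseq": 2}


@dataclass(frozen=True)
class TranscriptModel:
    locus: str       # hypothetical locus identifier
    evidence: str    # one of the EVIDENCE_RANK keys
    coverage: float  # fraction of the evidence sequence aligned, 0.0 to 1.0


def build_models(alignments):
    """Model building: turn (locus, evidence, coverage) tuples into candidates."""
    return [TranscriptModel(locus, evidence, coverage)
            for locus, evidence, coverage in alignments]


def filter_models(models, min_coverage=0.9):
    """Filtering: drop candidates whose alignment coverage is below the threshold."""
    return [m for m in models if m.coverage >= min_coverage]


def finalize_gene_set(models):
    """Finalization: keep the best-ranked model per locus (a simplification)."""
    best = {}
    for m in models:
        # Prefer better-ranked evidence, then higher coverage.
        key = (EVIDENCE_RANK[m.evidence], -m.coverage)
        if m.locus not in best or key < best[m.locus][0]:
            best[m.locus] = (key, m)
    return {locus: model for locus, (_, model) in best.items()}


def annotate(alignments, min_coverage=0.9):
    """Run the three toy stages in order."""
    candidates = build_models(alignments)
    return finalize_gene_set(filter_models(candidates, min_coverage))


if __name__ == "__main__":
    toy_alignments = [
        ("locusA", "protein", 0.95),
        ("locusA", "cdna", 0.92),
        ("locusB", "rnaseq", 0.50),  # removed by the coverage filter
    ]
    for locus, model in annotate(toy_alignments).items():
        print(locus, model.evidence, model.coverage)
```

Separating the stages into small functions mirrors the step-wise description in the summary; in a real annotation system each stage would involve sequence alignment and far richer transcript structures than the three fields used here.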