MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

2011 | Carson Holt and Mark Yandell
MAKER2 is a genome annotation and data management tool designed for second-generation genome projects. It is a multi-threaded, parallelized application that can process datasets of virtually any size. It can produce accurate annotations for novel genomes with limited or no training data, use mRNA-seq data to improve annotation quality, and update legacy annotations. It can also evaluate the quality of genome annotations and identify problematic annotations for manual review.

MAKER2 is the first annotation engine specifically designed for second-generation genome projects. It builds upon MAKER, an easy-to-use genome annotation pipeline, improving on the original's de novo annotation capabilities and integrating support for multiple ab initio prediction tools. Major additions include the Annotation Edit Distance (AED) metric for improved quality control and downstream database management, support for mRNA-seq, and gene model pass-through capability. MAKER2 supports distributed parallelization on computer clusters via MPI, allowing it to scale to datasets of any size. It runs on UNIX-like operating systems such as Linux and Darwin in Mac OS X.

MAKER2 was tested on first-generation genomes, including D. melanogaster, C. elegans, and A. thaliana. Ab initio gene predictions were produced using SNAP, Augustus, and GeneMark-ES, and evidence-based gene annotations were produced using default settings. Performance was evaluated using AED, a quality control measure developed by the Sequence Ontology project. MAKER2 was also tested on second-generation genomes, including Schmidtea mediterranea and Linepithema humile, where CEGMA was used to produce gene models for training SNAP.
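
The summary names AED but does not define it. As a rough illustration, AED is commonly described as one minus the average of nucleotide-level sensitivity and specificity between an annotation and the evidence (or reference) it is compared against, so 0 means perfect agreement and 1 means no overlap. The sketch below assumes that definition; the function names and the exon coordinates are hypothetical and are not taken from MAKER2 itself.

```python
def to_bases(intervals):
    """Expand 1-based inclusive (start, end) exon intervals into a set of base positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def annotation_edit_distance(annotation, evidence):
    """Nucleotide-level AED between two lists of exon intervals.

    Assumed definition (not stated in the summary above):
        SN  = |annotation AND evidence| / |evidence|
        SP  = |annotation AND evidence| / |annotation|
        AED = 1 - (SN + SP) / 2
    """
    a = to_bases(annotation)
    e = to_bases(evidence)
    if not a or not e:
        # With nothing to compare against, treat the pair as maximally discordant.
        return 1.0
    shared = len(a & e)
    sensitivity = shared / len(e)
    specificity = shared / len(a)
    return 1.0 - (sensitivity + specificity) / 2.0


# Hypothetical gene model: two 100-base exons; the evidence supports
# the first exon fully and only the first half of the second.
model = [(100, 199), (300, 399)]
support = [(100, 199), (300, 349)]
aed = annotation_edit_distance(model, support)
```

Because the score is symmetric in how it penalizes missing and unsupported bases, the same function can compare a prediction against a reference annotation or an annotation against aligned evidence.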

MAKER2 was also tested on the maize genome, where it re-annotated a 22 megabase region of chromosome 4 of the Zea mays (maize) inbred line B73. It was used to add experimental evidence and quality control statistics to existing genome databases, and to add cross-species homology data to six published ant genomes.

MAKER2 was also used to evaluate the performance of ab initio gene prediction algorithms using AED, and its own performance was compared to that of ab initio gene predictors such as SNAP, GeneMark, and Augustus. MAKER2 performed well in these comparisons, even with limited training data.

MAKER2 provides a simple method for re-annotating existing genomes and legacy annotations. Its external annotation pass-through mechanism accepts pre-existing genome annotations and aligned experimental evidence. It can produce new gene models for regions where the evidence suggests a gene not found in the legacy set, and it can also update or revise legacy annotations.
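
To make the quality-control use of AED concrete, the following sketch flags transcripts whose AED exceeds a cutoff so they can be queued for manual review. It assumes the annotations are in GFF3 with the score stored as an `_AED` attribute on mRNA features; the 0.5 cutoff, the function names, and the example records are all hypothetical choices for illustration, not values from the paper.

```python
def parse_attributes(column9):
    """Parse the GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def flag_for_review(gff3_lines, max_aed=0.5):
    """Return (transcript ID, AED) pairs for mRNA features with AED above max_aed."""
    flagged = []
    for line in gff3_lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 records have nine tab-separated columns; column 3 is the feature type.
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        if "_AED" not in attrs:
            continue
        aed = float(attrs["_AED"])
        if aed > max_aed:
            flagged.append((attrs.get("ID", "?"), aed))
    return flagged


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr4\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=g1;_AED=0.05",
    "chr4\tmaker\tmRNA\t2000\t2900\t.\t-\t.\tID=tx2;Parent=g2;_AED=0.82",
    "chr4\tmaker\texon\t2000\t2900\t.\t-\t.\tID=tx2:exon1;Parent=tx2",
]
needs_review = flag_for_review(example)
```

A stricter or looser cutoff only changes `max_aed`; the appropriate value depends on how much evidence is available for the genome in question.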