11 November 2015 | Katharina J. Hoff, Simone Lange, Alexandre Lomsadze, Mark Borodovsky, Mario Stanke
The paper introduces BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the strengths of GeneMark-ET and AUGUSTUS. GeneMark-ET performs iterative training using RNA-Seq data to generate initial gene structures; AUGUSTUS is then trained on these predicted genes and integrates RNA-Seq read information into the final gene predictions. The authors report that BRAKER1 is more accurate than MAKER2 when RNA-Seq is the sole source for training and prediction. The pipeline is implemented in Perl and requires an RNA-Seq alignment file in BAM format and a genome file in FASTA format. It is evaluated on four model organisms (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Schizosaccharomyces pombe) and compared to MAKER2 and CodingQuarry. BRAKER1 shows an average accuracy improvement of 15% over MAKER2 at the gene level. The study also highlights the importance of repeat masking and the role of RNA-Seq information in improving prediction accuracy. BRAKER1 is available for download and can be run in fully automated mode, making it a convenient tool for genome annotation.
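
The two required inputs (a genome in FASTA format and RNA-Seq alignments in BAM format) can be checked before a run with a short script. The following Python sketch is illustrative only and is not part of BRAKER1: the helper names `looks_like_fasta` and `build_braker_command` are hypothetical, and the `braker.pl` options shown (`--genome`, `--bam`, `--species`) are assumptions that should be verified against the documentation of the installed version.

```python
from pathlib import Path


def looks_like_fasta(path):
    """Return True if the first non-empty line starts with '>' (a FASTA header)."""
    with open(path, "r") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped:
                return stripped.startswith(">")
    return False  # an empty file is not a usable genome


def build_braker_command(genome, bam, species=None):
    """Assemble an argument list for a BRAKER1 run (hypothetical helper).

    The option names are assumptions; check them against your installation.
    """
    if not looks_like_fasta(genome):
        raise ValueError(f"{genome} does not look like a FASTA file")
    if Path(bam).suffix != ".bam":
        raise ValueError(f"{bam} does not have a .bam extension")
    cmd = ["braker.pl", f"--genome={genome}", f"--bam={bam}"]
    if species is not None:
        cmd.append(f"--species={species}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.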