2016 | Katharina J. Hoff, Simone Lange, Alexandre Lomsadze, Mark Borodovsky, Mario Stanke
BRAKER1 is an unsupervised, RNA-Seq-based genome annotation pipeline that combines the strengths of GeneMark-ET and AUGUSTUS for accurate gene prediction. It requires two inputs: a genome assembly file and a BAM file with spliced RNA-Seq alignments. GeneMark-ET performs iterative, unsupervised training using the RNA-Seq data and generates initial gene structures. AUGUSTUS is then trained on these predicted genes and incorporates RNA-Seq read information as hints into the final gene predictions. The pipeline is fully automated and runs as a one-step process; it requires neither pre-trained parameters nor expert training.
BRAKER1 is available for download and was compared with MAKER2 and CodingQuarry on four model organisms. When RNA-Seq is the sole source of evidence for training and prediction, BRAKER1 is more accurate than both MAKER2 and CodingQuarry, especially on S. pombe. This accuracy stems from the use of GeneMark-ET for training and the incorporation of RNA-Seq hints into AUGUSTUS predictions. The pipeline can be run on a single CPU in about 17.5 hours. BRAKER1 can be run with repeat masking, but masking does not significantly affect prediction accuracy.
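
The one-step usage described above, with a genome assembly and a BAM file as the only required inputs, can be sketched as a small Python helper that assembles the command line. This is a minimal sketch, not the authors' code: the `braker.pl` script name and the `--genome`, `--bam`, and `--species` options are assumed from the BRAKER documentation, while the helper `braker1_command` and all file names are hypothetical. The sketch only builds and prints the command; it does not run the pipeline.

```python
import shlex


def braker1_command(genome, bam, species=None):
    """Build the argument list for a one-step BRAKER1 run.

    genome  -- path to the genome assembly (FASTA), hypothetical file name
    bam     -- path to the spliced RNA-Seq alignments (BAM), hypothetical file name
    species -- optional name under which trained parameters are stored
               (assumed option; omit it to use the tool's default)
    """
    cmd = ["braker.pl", f"--genome={genome}", f"--bam={bam}"]
    if species is not None:
        cmd.append(f"--species={species}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch runs
    # without BRAKER1, GeneMark-ET, or AUGUSTUS installed.
    print(shlex.join(braker1_command("genome.fa", "rnaseq.bam", "my_species")))
```

To actually launch the run, the returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the pipeline and its dependencies are installed.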