[slides and audio] DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

DFAST is a flexible prokaryotic genome annotation pipeline that supports genome submission to public databases. It was originally developed as an online annotation server and has processed over 7000 jobs since its launch in 2016. The new background annotation engine, DFAST-core, annotates a bacterial genome within 10 minutes and provides detailed information such as pseudogenes, translation exceptions, and orthologous gene assignments. The modular framework of DFAST makes it easy to customize annotation workflows and to add future extensions.

DFAST is implemented in Python 3 and runs on macOS and Linux systems. It is freely available under the GPLv3 license at https://github.com/nigyta/dfast_core/, and an online version is hosted at https://dfast.nig.ac.jp/. It supports both DDBJ and NCBI submissions and can generate INSDC submission files as well as GFF3, GenBank, and FASTA files.

The workflow consists of a structural and a functional annotation phase. Structural annotation predicts biological features such as CDSs, RNAs, and CRISPRs, while functional annotation infers protein functions. Functional annotation involves orthologous assignment, homology search against reference databases, pseudogene detection, and profile HMM searches. For homology searches, DFAST uses GHOSTX, which is faster than BLASTP.

DFAST outperforms other tools in pseudogene detection when genomes of close relatives are available. It runs faster than Prokka while using a larger reference database. It performs well on well-characterized organisms but may leave more genes uncharacterized for less-studied species; supplying additional reference data can improve results in such cases.

DFAST is supported by JSPS KAKENHI grant 16H06279. The authors declare no conflict of interest. References to supporting studies are provided.
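The modular, multi-step functional annotation described above can be pictured with a small Python sketch. This is not DFAST's actual code or API: the `Protein` class, the `lookup_step` helper, the `annotate` function, the step names, and the toy sequences are all hypothetical, and the sketch assumes that each step only fills in proteins left unannotated by earlier steps, which the summary does not state explicitly. Real steps would call search tools such as GHOSTX or a profile HMM scanner instead of a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Protein:
    """Hypothetical container for one predicted CDS translation."""
    locus_tag: str
    sequence: str
    product: str = "hypothetical protein"
    source: Optional[str] = None  # name of the step that assigned the product


# A step takes a protein and returns a product name, or None if it finds no hit.
Step = Callable[[Protein], Optional[str]]


def lookup_step(table: Dict[str, str]) -> Step:
    """Build a toy step that 'searches' by exact sequence lookup (assumption:
    stands in for a real homology or HMM search)."""
    def step(protein: Protein) -> Optional[str]:
        return table.get(protein.sequence)
    return step


def annotate(proteins: List[Protein], steps: List[Tuple[str, Step]]) -> List[Protein]:
    """Run the steps in order; the first step that returns a hit wins."""
    for protein in proteins:
        for name, step in steps:
            product = step(protein)
            if product is not None:
                protein.product = product
                protein.source = name
                break  # later steps are skipped for this protein
    return proteins


# Toy workflow: the order of the list defines the priority of the evidence.
steps = [
    ("orthologous_assignment", lookup_step({"MKT": "chromosomal replication initiator protein DnaA"})),
    ("homology_search", lookup_step({"MKT": "replication initiator (homology hit)",
                                     "MAA": "ABC transporter permease"})),
    ("hmm_search", lookup_step({"MGG": "transcriptional regulator"})),
]

proteins = annotate(
    [Protein("LOCUS_00010", "MKT"), Protein("LOCUS_00020", "MAA"),
     Protein("LOCUS_00030", "MGG"), Protein("LOCUS_00040", "MXX")],
    steps,
)

for p in proteins:
    print(p.locus_tag, p.product, p.source, sep="\t")
```

Because the workflow is just an ordered list of steps, customizing it in this sketch means adding, removing, or reordering list entries. A protein that no step recognizes keeps the default product "hypothetical protein", which mirrors the remark that less-studied species may end up with more uncharacterized genes.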

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

2018 | Yasuhiro Tanizawa, Takatomo Fujisawa and Yasukazu Nakamura