GeSeq – versatile and accurate annotation of organelle genomes

2017 | Michael Tillich¹, Pascal Lehwark², Tommaso Pellizzer¹, Elena S. Ulbricht-Jones¹, Axel Fischer¹, Ralph Bock¹ and Stephan Greiner¹.
GeSeq is a web-based tool for the rapid and accurate annotation of organellar genomes, particularly chloroplast genomes. It combines batch processing with customizable reference sequence selection: references can be chosen from NCBI RefSeq or uploaded by the user, and an integrated database of manually curated chloroplast reference sequences is included. Although developed for plants, GeSeq can also annotate mitochondrial genomes from non-green species such as mammals, and it handles NGS-derived contigs.

The annotation pipeline is based on a BLAT-driven best-match approach, complemented by profile HMM searches (HMMER) for protein and rRNA genes and by de novo tRNA prediction with tRNAscan-SE and ARAGORN. BLAT is used for fast and accurate annotation of exon-intron borders. For each annotation job, GeSeq generates two databases, one protein-coding (CDS) and one non-protein-coding (NA). BLAT hits are filtered so that only the best matches are retained, which avoids multiple annotations of the same gene or feature. GeSeq can also annotate the inverted repeats (IRs) of chloroplast genomes.

The main output is a GenBank file that requires minimal curation, includes exon-intron positions and CDS translations, and is visualized by OGDRAW. Additional outputs, such as multi-FASTA files and codon-based alignments for specific genes, support downstream analyses like comparative genomics and phylogenetics.

Users can adjust parameters to obtain high-quality annotations, and manual curation remains an option for challenging genes. GeSeq is reported to be the fastest tool for chloroplast genome annotation, with a runtime of about 6 seconds per vascular plant chloroplast genome. It is available as part of the CHLOROBOX toolbox and supports submission to the NCBI/EMBL/DDBJ databases.
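
The idea of filtering BLAT hits so that the same gene is not annotated more than once can be illustrated with a minimal Python sketch. This is not GeSeq's actual implementation: the `Hit` record, its `score` field, and the greedy "keep the highest-scoring hit, drop overlapping hits for the same gene" rule are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """Hypothetical, simplified record of one homology hit on the target genome."""
    gene: str     # name of the reference gene that produced the hit
    start: int    # start on the target genome (0-based, inclusive)
    end: int      # end on the target genome (exclusive)
    strand: str   # "+" or "-"
    score: float  # assumed alignment score; higher means a better match


def same_feature(a: Hit, b: Hit) -> bool:
    """True if two hits describe the same gene at overlapping positions."""
    return (
        a.gene == b.gene
        and a.strand == b.strand
        and a.start < b.end
        and b.start < a.end
    )


def filter_best_hits(hits: List[Hit]) -> List[Hit]:
    """Greedy best-match filter (an illustrative assumption, not GeSeq's rule).

    Hits are visited from highest to lowest score. A hit is kept unless a
    better hit for the same gene already covers an overlapping region.
    Non-overlapping copies of a gene (e.g. in both inverted repeats) survive.
    """
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(same_feature(hit, k) for k in kept):
            kept.append(hit)
    # Return the surviving hits in genome order.
    return sorted(kept, key=lambda h: h.start)


if __name__ == "__main__":
    example = [
        Hit("rbcL", 100, 1528, "+", 980.0),
        Hit("rbcL", 110, 1520, "+", 870.0),   # overlaps the better rbcL hit
        Hit("psbA", 2000, 3062, "-", 990.0),
        Hit("rbcL", 5000, 6428, "+", 400.0),  # separate locus, kept
    ]
    for h in filter_best_hits(example):
        print(h.gene, h.start, h.end, h.strand, h.score)
```

In this sketch, overlap is only tested between hits for the same gene, so neighbouring or overlapping hits for different genes are never suppressed. A real pipeline would need additional criteria (for example identity thresholds or exon structure), which are not modelled here.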