Rapid and sensitive detection of genome contamination at scale with FCS-GX

2024 | Alexander Astashyn†‡, Eric S. Tvedte†‡, Deacon Sweeney†, Victor Sapojnikov†, Nathan Bouk†, Victor Joukov†, Eyal Mozes†, Pooja K. Strope†, Pape M. Sylla†, Lukas Wagner†, Shelby L. Bidwell†‡, Larissa C. Brown†, Karen Clark†, Emily W. Davis†, Brian Smith-White†, Wratko Hlavina†, Kim D. Pruitt†, Valerie A. Schneider† and Terence D. Murphy†‡
FCS-GX is a new tool for detecting and removing contaminant sequences in assembled genomes. It is part of the NCBI Foreign Contamination Screen (FCS) tool suite and is optimized for both speed and accuracy: it screens most genomes in 0.1–10 minutes, and tests on artificially fragmented genomes showed high sensitivity and specificity across diverse contaminant species. The tool was used to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, or 0.16% of total bases, with half of that contamination coming from just 161 assemblies. Contamination in the NCBI RefSeq collection was reduced to 0.01% of bases.

FCS-GX uses a large reference database of 709 Gbp, built from assemblies and common contaminants, and assigns taxonomic labels to sequences based on alignment score information. It shows higher sensitivity and specificity than other methods, and it can handle chimeric sequences and lateral gene transfer events without systematic confounding effects. The tool is recommended for use early in the assembly process to improve data quality and avoid artifacts that affect downstream analyses. It is available at <https://github.com/ncbi/fcs/> or <https://doi.org/10.5281/zenodo.10651084>.
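
As a rough sanity check on the scale figures quoted above, the short Python sketch below works out what they imply. The function and variable names are illustrative only and are not part of FCS-GX. Because the inputs are rounded values taken from the summary, the results are approximations.

```python
# Back-of-the-envelope arithmetic on the rounded figures quoted in the summary.
# All names here are hypothetical helpers, not part of the FCS-GX tool.

CONTAM_GBP = 36.8            # contamination identified, in Gbp
CONTAM_FRACTION = 0.0016     # 0.16% of total bases
N_ASSEMBLIES = 1_600_000     # GenBank assemblies screened
N_TOP_ASSEMBLIES = 161       # assemblies accounting for half the contamination


def implied_total_gbp(contam_gbp: float, contam_fraction: float) -> float:
    """Total bases screened implied by a contaminant amount and its share."""
    return contam_gbp / contam_fraction


def mean_contam_mbp(contam_gbp: float, share: float, n_assemblies: int) -> float:
    """Mean contamination per assembly (Mbp) for a group holding `share` of it."""
    return contam_gbp * share * 1000 / n_assemblies


total_gbp = implied_total_gbp(CONTAM_GBP, CONTAM_FRACTION)
top_mean_mbp = mean_contam_mbp(CONTAM_GBP, 0.5, N_TOP_ASSEMBLIES)
top_assembly_share = N_TOP_ASSEMBLIES / N_ASSEMBLIES

print(f"Implied total bases screened: {total_gbp:,.0f} Gbp")
print(f"Mean contamination in top {N_TOP_ASSEMBLIES} assemblies: {top_mean_mbp:.1f} Mbp")
print(f"Those assemblies as a share of all screened: {top_assembly_share:.4%}")
```

With these inputs, the implied total is roughly 23,000 Gbp (about 23 Tbp), and the 161 most-contaminated assemblies, about 0.01% of those screened, carry on the order of 114 Mbp of contaminant sequence each on average.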