ViralFlow v1.0—a computational workflow for streamlining viral genomic surveillance

2024 | Alexandre Freitas da Silva, Antonio Marinho da Silva Neto, Cleber Furtado Aksenen, Pedro Miguel Carneiro Jeronimo, Filipe Zimmer Dezordi, Suzana Porto Almeida, Hudson Marques Paula Costa, Richard Steiner Salvato, Tulio de Lima Campos, Gabriel da Luz Wallau
ViralFlow v1.0 is a computational workflow for viral genomic surveillance. It is a general-purpose, reference-based genome assembler for any virus with an available reference genome, and it includes new virus-agnostic modules for studying nucleotide and amino acid mutations. ViralFlow v1.0 runs on computational infrastructures ranging from laptops to high-performance computing environments and produces standardized, well-formatted outputs suitable for public health reporting and scientific analysis. It is available at https://viralflow.github.io/index-en.html.

For quality control, raw reads are filtered with a minimum Phred score of 20, and reads shorter than 75 bases after trimming are discarded. fastp v0.23.4 performs adapter removal and deduplication, and samtools v1.11 removes primer regions. Reads are mapped to the reference genome with BWA v0.7.17, and consensus genomes are built with samtools and iVar. Coverage plots are generated with BAMdash, and mutations are visualized with snipit.

ViralFlow v1.0 also includes new modules for mutation annotation and prediction, visualization, and genomic mapping. It manages software packages with Micromamba for speed and robustness and uses standardized containers for fast configuration and reproducibility. The workflow is implemented in Nextflow, which provides better process management, efficient parallelism, and continuous checkpointing of processes; this improves reproducibility and makes new features easy to add.

ViralFlow v1.0 was tested on simulated and real datasets for SARS-CoV-2, monkeypox virus, Dengue virus, and Zika virus. It performed consistently across datasets and showed higher lineage concordance than its predecessor. The workflow is scalable and uses memory and CPU efficiently, so it can be deployed in low-resource settings.
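To make the quality-control thresholds concrete (minimum Phred score 20, minimum read length 75), here is a minimal sketch. It assumes Phred+33-encoded FASTQ quality strings and trims low-quality bases from the 3' end only. This is a simplification for illustration, not fastp's actual trimming algorithm, and the helper names `phred_scores` and `trim_and_filter` are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_PHRED = 20   # minimum base quality stated in the summary
MIN_LENGTH = 75  # minimum read length stated in the summary


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_and_filter(seq: str, quality: str,
                    min_phred: int = MIN_PHRED,
                    min_length: int = MIN_LENGTH) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from the 3' end, then apply the length filter.

    Hypothetical helper: a simplification, not fastp's trimming algorithm.
    Returns the trimmed (sequence, quality) pair, or None if the read is discarded.
    """
    scores = phred_scores(quality)
    end = len(scores)
    # Walk back from the 3' end while bases fall below the quality threshold.
    while end > 0 and scores[end - 1] < min_phred:
        end -= 1
    if end < min_length:
        return None  # too short after trimming: drop the read
    return seq[:end], quality[:end]


# 80-base read whose last 4 bases are Q2 ('#'); the rest are Q40 ('I').
print(trim_and_filter("A" * 80, "I" * 76 + "#" * 4))
```

In this example the four Q2 bases are removed and the remaining 76 bases pass the 75-base filter. A read with ten trailing low-quality bases would fall to 70 bases and be discarded.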
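Primer removal in amplicon data amounts to cutting away the parts of each aligned read that fall inside primer binding sites, so primer sequence does not leak into the consensus. The sketch below works only on reference coordinates: each read is a 0-based, half-open interval, and primers come from a BED-like list. It deliberately ignores CIGAR strings, strand, and soft-clipping details that samtools handles, and `clip_primers` plus the primer coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def clip_primers(read: Interval, primers: List[Interval]) -> Optional[Interval]:
    """Shrink a read's aligned interval so neither end lies inside a primer.

    Hypothetical helper working on reference coordinates only; it ignores
    CIGAR strings, strand, and soft-clipping details that real tools handle.
    Returns None if nothing of the read remains.
    """
    start, end = read
    changed = True
    while changed and start < end:
        changed = False
        for p_start, p_end in primers:
            # First aligned base falls inside this primer: move start past it.
            if p_start <= start < p_end:
                start = p_end
                changed = True
            # Last aligned base (end - 1) falls inside this primer: cut before it.
            if start < end and p_start < end <= p_end:
                end = p_start
                changed = True
    return (start, end) if start < end else None


primers = [(0, 20), (280, 300)]  # toy primer coordinates, not a real scheme
print(clip_primers((10, 290), primers))  # (20, 280)
print(clip_primers((25, 250), primers))  # (25, 250)
```

The loop repeats until neither end moves, which also handles overlapping primers from tiled amplicon schemes.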
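Consensus building (done here with samtools and iVar) boils down to choosing a base at each reference position from the pileup of mapped reads and masking positions that lack support. The sketch below uses a simple majority rule with a minimum depth and a minimum base frequency. The default thresholds are hypothetical, not ViralFlow's settings, and unlike this sketch iVar can emit IUPAC ambiguity codes at mixed sites.

```python
from collections import Counter
from typing import List


def call_consensus(pileup: List[Counter], min_depth: int = 10,
                   min_freq: float = 0.5) -> str:
    """Build a consensus from per-position base counts.

    Positions with depth below min_depth, or whose most frequent base is
    below min_freq, are masked with 'N'. Threshold defaults are hypothetical.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # insufficient coverage
            continue
        base, count = counts.most_common(1)[0]
        consensus.append(base if count / depth >= min_freq else "N")
    return "".join(consensus)


pileup = [
    Counter(A=12),           # well covered, unanimous
    Counter(C=3),            # depth 3 < 10: masked
    Counter(G=5, T=4, A=3),  # top base at 5/12 < 0.5: masked
    Counter(T=15, C=2),      # 15/17 >= 0.5
]
print(call_consensus(pileup))  # ANNT
```

Masking with N rather than guessing keeps low-confidence positions from being reported as mutations downstream.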
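Coverage plots such as those produced by BAMdash are built from per-base sequencing depth. A compact way to compute depth from aligned read intervals is a difference array: add 1 where a read starts, subtract 1 where it ends, then take a running sum. The sketch assumes 0-based, half-open read intervals, and the helpers `depth_profile` and `breadth_of_coverage` (with its default threshold) are hypothetical, not BAMdash internals.

```python
from typing import List, Tuple


def depth_profile(genome_length: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth from aligned intervals (0-based, half-open) via a difference array."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1  # coverage begins at start
        diff[end] -= 1    # and ends just before end
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def breadth_of_coverage(depth: List[int], min_depth: int = 10) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= min_depth) / len(depth)


depth = depth_profile(10, [(0, 5), (3, 8), (3, 10)])
print(depth)                                    # [1, 1, 1, 3, 3, 2, 2, 2, 1, 1]
print(breadth_of_coverage(depth, min_depth=2))  # 0.5
```

The difference array runs in time linear in the number of reads plus the genome length, instead of touching every covered base once per read.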
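The virus-agnostic mutation modules report both nucleotide and amino acid changes. One way to see how the two relate is to compare a consensus against its reference and translate codons inside a coding region with the standard genetic code. The sketch assumes an ungapped consensus of the same length as the reference (no indels), an uppercase A/C/G/T reference, and 1-based inclusive CDS coordinates. The function names and the toy sequence are hypothetical and do not reflect ViralFlow's module interface.

```python
from typing import List

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code, '*' = stop


def nucleotide_mutations(reference: str, consensus: str) -> List[str]:
    """Substitutions as REF<pos>ALT (1-based); ambiguous consensus bases are skipped."""
    return [f"{r}{i}{c}"
            for i, (r, c) in enumerate(zip(reference, consensus), start=1)
            if r != c and c in "ACGT"]


def amino_acid_mutations(reference: str, consensus: str,
                         cds_start: int, cds_end: int) -> List[str]:
    """Amino acid changes (e.g. D2G) within a CDS given 1-based inclusive coordinates."""
    changes = []
    for offset in range(cds_start - 1, cds_end - 2, 3):
        ref_codon = reference[offset:offset + 3]
        alt_codon = consensus[offset:offset + 3]
        # Skip unchanged codons and codons containing N or other ambiguity codes.
        if ref_codon == alt_codon or any(b not in "ACGT" for b in alt_codon):
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa != alt_aa:
            codon_number = (offset - (cds_start - 1)) // 3 + 1
            changes.append(f"{ref_aa}{codon_number}{alt_aa}")
    return changes


reference = "ATGGATTAA"  # toy CDS: Met-Asp-stop
consensus = "ATGGGTTAA"  # A>G at position 5 turns GAT (Asp) into GGT (Gly)
print(nucleotide_mutations(reference, consensus))        # ['A5G']
print(amino_acid_mutations(reference, consensus, 1, 9))  # ['D2G']
```

Synonymous substitutions appear in the nucleotide list but not in the amino acid list, which is why both views are useful for surveillance reports.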
ViralFlow v1.0 also offers improved modularity, transparency, and usability, making it suitable for both advanced users and developers. It is available on GitHub and Figshare; the data used in the manuscript were either simulated or generated from original samples. Overall, the workflow is a versatile tool for viral genomic surveillance, with features for analyzing viral genomes, mutations, and lineage assignments. Planned future developments include heuristics for reference genome selection and support for long-read analysis.
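The lineage concordance used to compare ViralFlow v1.0 with its predecessor can be read as the share of samples whose assigned lineage matches a reference assignment. The sketch below computes that fraction over samples present in both sets. The exact metric used in the manuscript is not described in this summary, so this definition, the function name, and the sample IDs are hypothetical; the Pango lineage labels serve only as toy values.

```python
from typing import Dict


def lineage_concordance(assigned: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of shared samples whose assigned lineage matches the expected one."""
    shared = assigned.keys() & expected.keys()
    if not shared:
        return 0.0
    matches = sum(1 for sample in shared if assigned[sample] == expected[sample])
    return matches / len(shared)


assigned = {"s1": "BA.1", "s2": "BA.2", "s3": "B.1.1.7"}
expected = {"s1": "BA.1", "s2": "BA.2", "s3": "B.1.617.2", "s4": "BA.5"}
print(lineage_concordance(assigned, expected))  # 2 of 3 shared samples agree
```

Restricting the comparison to shared samples keeps samples that failed assembly from counting as disagreements; a stricter variant could count them as mismatches instead.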