2018 | James Hadfield, Colin Megill, Sidney M. Bell, John Huddleston, Barney Potter, Charlton Callender, Pavel Sagulenko, Trevor Bedford, Richard A. Neher
Nextstrain is a real-time tracking system for pathogen evolution, combining a database of viral genomes, a bioinformatics pipeline for phylodynamic analysis, and an interactive visualization platform. It provides a real-time view into the evolution and spread of viral pathogens of high public health importance. The visualization integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles current understanding into a single accessible location, open to health professionals, epidemiologists, virologists, and the public.
Nextstrain consists of data curation, analysis, and visualization components. Python scripts maintain a database of available sequences and related metadata, sourced from public repositories such as NCBI, GISAID, and ViPR, as well as GitHub repositories and other sources of genomic data. A suite of tools performs phylodynamic analysis, including subsampling, alignment, phylogenetic inference, temporal dating of ancestral nodes, and discrete-trait geographic reconstruction. The pipeline leverages the maximum likelihood analyses implemented in TreeTime, allowing a full analysis of the entire Ebola epidemic in under 2 hours on a modern laptop.
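The subsampling step can be illustrated with a minimal sketch. The text does not say how Nextstrain subsamples, so the grouping keys (`region`, `month`), the `per_group` cap, and the record layout below are all assumptions chosen for illustration, not the pipeline's actual behavior.

```python
import random
from collections import defaultdict


def subsample(records, keys=("region", "month"), per_group=2, seed=0):
    """Keep at most `per_group` records from each bucket defined by `keys`.

    Hypothetical helper: the grouping keys and the cap are illustrative
    assumptions, not taken from the Nextstrain pipeline.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)

    kept = []
    for group_key in sorted(groups):
        members = groups[group_key]
        if len(members) > per_group:
            # Thin out over-represented buckets by random choice.
            members = rng.sample(members, per_group)
        kept.extend(members)
    return kept


# Made-up records: one densely sampled bucket and two sparse ones.
records = [
    {"strain": "s1", "region": "region_a", "month": "2014-06"},
    {"strain": "s2", "region": "region_a", "month": "2014-06"},
    {"strain": "s3", "region": "region_a", "month": "2014-06"},
    {"strain": "s4", "region": "region_a", "month": "2014-07"},
    {"strain": "s5", "region": "region_b", "month": "2014-06"},
]
kept = subsample(records, per_group=2)
```

Capping each bucket limits the influence of heavily sampled places and times on the tree while leaving sparse buckets untouched.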
Nextstrain is designed to be adaptable to different pathogens. Visualization is available through nextstrain.org. This approach is similar to Nextflu but extended and generalized to different viral pathogens. There is a growing need for surveillance of non-influenza viruses, and Nextstrain can be extended to most outbreaks with readily accessible genomic data. However, recombination or a low mutation rate may confound the phylogenetic signal.
Nextstrain tracks and reconstructs mutations across the tree and displays this information as a bar chart of entropy at each position in the genome. This allows interrogation of genetic changes that may be adaptive or may underlie a change in disease dynamics. For many pathogens, the emergence and spread of gain-of-function variants is a grave concern. For example, China has experienced seasonal epidemics of influenza A/H7N9. Although no human-to-human transmission events are known, the high mortality rate makes mutations that could facilitate such transmission a matter of extreme concern.
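As a rough illustration of per-position entropy, the sketch below computes Shannon entropy for each column of a toy alignment. The choice of natural logarithm, the exclusion of gap (`-`) and ambiguous (`N`) characters, and the function names are assumptions; the text does not specify how Nextstrain computes the values it plots.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in nats) of the character states in one column.

    Gaps ("-") and ambiguous bases ("N") are skipped; this is an
    assumption made for the sketch.
    """
    counts = Counter(c for c in column if c not in ("-", "N"))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy at every position of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment: positions 1 and 3 are invariant, positions 2 and 4 vary.
alignment = ["ACGT", "ACGA", "ATGA", "ACGA"]
entropies = alignment_entropy(alignment)
```

Invariant positions score zero, and positions with several states at similar frequencies score highest, which is why tall bars in such a chart point to sites of ongoing genetic change.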
Nextstrain presents a single, continuously updated overview of both endemic viral disease and emergent viral outbreaks, based upon the same underlying bioinformatics architecture. This architecture is well positioned to respond to future outbreaks, be they viral or bacterial. Analysis of such outbreaks relies on public sharing of data, and Nextstrain can update automatically as new sequences appear in a range of public databases and repositories. Scientists are justifiably hesitant to cede control of their data, and we try to address these concerns by preventing access to the raw genome sequences and by clearly indicating the source of each sequence. Derived data, such as phylogenetic trees, metadata, and screenshots, are available, and one can append private metadata via CSV files. We believe this strikes a compromise between keeping certain data private and allowing the dissemination of results important to the wider scientific community, thereby encouraging collaboration between scientists.
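Appending private metadata from a CSV file might look roughly like the following. This is a sketch only: the `strain` key column, the field names, and the in-memory dictionary layout are hypothetical and do not describe the file format Nextstrain actually accepts.

```python
import csv
import io


def append_private_metadata(public_metadata, csv_text, key="strain"):
    """Return a copy of `public_metadata` with extra CSV columns merged in.

    Rows whose key does not match a known strain are ignored, and the
    public records themselves are left unmodified.
    """
    merged = {name: dict(fields) for name, fields in public_metadata.items()}
    for row in csv.DictReader(io.StringIO(csv_text)):
        strain = row.pop(key)
        if strain in merged:
            merged[strain].update(row)
    return merged


# Made-up public metadata and a private CSV kept by the user.
public_metadata = {
    "s1": {"country": "country_x", "date": "2014-06-01"},
    "s2": {"country": "country_y", "date": "2014-07-15"},
}
private_csv = "strain,clinic,outcome\ns1,clinic_17,recovered\ns9,clinic_02,unknown\n"

merged = append_private_metadata(public_metadata, private_csv)
```

Merging on the user's side keeps the private columns out of the shared dataset while still letting them be viewed alongside the public metadata.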
Genomic epidemiology