VFDB 2016: hierarchical and refined dataset for big data analysis—10 years on

2016 | Lihong Chen, Dandan Zheng, Bo Liu, Jian Yang and Qi Jin
The Virulence Factor Database (VFDB) is a comprehensive resource for bacterial virulence factors (VFs). Since its inception in 2004, VFDB has provided up-to-date knowledge of VFs from various medically significant bacterial pathogens. Over the past decade, the database has expanded to include more pathogens and improved its infrastructure to support big data analysis. However, data redundancy and incomplete annotations have hindered its effectiveness for large-scale data mining. To address these issues, VFDB has been updated with two hierarchical, non-redundant datasets: a core dataset containing only experimentally verified VFs and a full dataset including all known and predicted VFs. The core dataset has been refined with controlled vocabularies to improve gene annotation. These improvements enhance data quality and usability for bioinformatic analysis. VFDB has also expanded to include six additional bacterial genera: Acinetobacter, Aeromonas, Anaplasma, Burkholderia, Coxiella, and Rickettsia. A new JavaScript-rich web interface has been developed to improve user experience and accessibility. The interface includes collapsible menus, expandable trees, and sortable tables, which make it more intuitive and closer in feel to a desktop application. Despite these improvements, the database still faces challenges in handling the vast amount of sequencing data generated by next-generation sequencing technologies. To address this, its strategy has been adjusted to focus on selected representative genomes rather than all complete genomes. This change allows for more efficient analysis of bacterial VFs. VFDB continues to evolve to meet the demands of big data analysis in microbiology. The database aims to provide comprehensive and up-to-date knowledge of bacterial VFs to support research into bacterial pathogenesis and the development of new therapeutic strategies.
The ongoing improvements in data curation and infrastructure are essential for advancing the field of bioinformatics and biomedical research.