Understanding BIGSdb: Scalable analysis of bacterial genome variation at the population level

BIGSdb is a scalable, open-source, web-accessible database system designed for the storage, retrieval, and analysis of bacterial genome variation at the population level. It links phenotype data to sequence data, ranging from a single sequence read to whole-genome data, for an unlimited number of bacterial isolates. Built on the widely used mlstdbNet software, BIGSdb adds the ability to define and identify loci and genetic variants within stored nucleotide sequences. These loci can be organized into 'schemes' for isolate characterization or for evolutionary and functional analyses. Isolates and loci can be indexed by multiple names, allowing different studies to be cross-referenced. The system includes LIMS functionality for linking and organizing laboratory samples, and data can be linked to external databases. Fine-grained authentication allows multiple users to participate in community annotation by setting up or contributing to different schemes. Applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The source code and documentation are available at http://pubmlst.org/software/database/bigsdb/.

BIGSdb is written in Perl for UNIX/Linux systems and uses PostgreSQL and the Apache web server. It uses BioPerl and EMBOSS for sequence handling, and client-side JavaScript for user interaction. Built-in user authentication is implemented in Perl and JavaScript and relies on MD5 hashing. Sequence homology matching uses BLAST with configurable parameters. Global configuration settings are stored in a text file, while individual databases are configured with XML files.

Sequence definition databases allow new allele sequences to be defined and made available online. Users can query a sequence against the known alleles of one locus or of all loci; the results report matches, nucleotide differences, and sequence positions. Users can also customize the query interface by selecting the fields and loci of interest.
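The allele query described above can be illustrated with a minimal Python sketch. It is not BIGSdb's implementation: BIGSdb is written in Perl and uses BLAST for homology matching, whereas this sketch swaps in a simple position-by-position comparison that only works for sequences of equal length. The function names, the result dictionary, and the example sequences are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (1-based position, base in known allele, base in query)
Difference = Tuple[int, str, str]


def nucleotide_differences(query: str, allele: str) -> List[Difference]:
    """List mismatches between two equal-length sequences.

    Assumption: sequences are already aligned and the same length.
    BIGSdb itself uses BLAST, which also handles gaps and partial matches.
    """
    if len(query) != len(allele):
        raise ValueError("sequences must be the same length")
    return [
        (pos + 1, a, q)
        for pos, (a, q) in enumerate(zip(allele.upper(), query.upper()))
        if a != q
    ]


def query_locus(query: str, alleles: Dict[str, str]) -> Dict[str, object]:
    """Return an exact match if one exists, else the closest same-length allele."""
    query = query.upper()
    for allele_id, seq in alleles.items():
        if seq.upper() == query:
            return {"exact": True, "allele_id": allele_id, "differences": []}

    # Only same-length alleles can be compared by this naive method.
    candidates = {aid: s for aid, s in alleles.items() if len(s) == len(query)}
    if not candidates:
        no_match: Optional[str] = None
        return {"exact": False, "allele_id": no_match, "differences": []}

    best_id = min(
        candidates,
        key=lambda aid: len(nucleotide_differences(query, candidates[aid])),
    )
    return {
        "exact": False,
        "allele_id": best_id,
        "differences": nucleotide_differences(query, candidates[best_id]),
    }


if __name__ == "__main__":
    # Hypothetical allele definitions for a single locus.
    known = {"1": "ATGCATGC", "2": "ATGCTTGC"}
    print(query_locus("ATGCATGC", known))
    print(query_locus("ATGCATGA", known))
```

For a query with no exact match, the result names the closest allele and lists each differing position together with the allele base and the query base, which mirrors the kind of output the text describes.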
Data export includes isolate data, allele identifiers, and concatenated sequences in FASTA format. The software employs a plug-in architecture, so additional features and analysis packages can be added without modifying the core code.

Authentication and access control distinguish three types of user: 'users', who can view data; 'curators', who can add and modify data; and 'admins', who have full control. Isolate databases can be configured to be public or to restrict access to specific users or groups.

BIGSdb facilitates the construction of definition databases and can handle loci defined by nucleotide sequences or by translated peptide sequences. It allows detailed phenotypic information to be integrated with isolate records, enabling enzyme sequence diversity to be examined alongside phenotype. The system can be used as a Laboratory Information Management System (LIMS), with an optional sample table for sample tracking.

Demonstration 1 shows the use of BIGSdb with the Neisseria MLST databases, which were converted to BIGSdb from the previously used mlstdbNet software. The system was tested with over 17,000 isolates and 8,000 sequence types (STs), demonstrating scalability and performance.
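The export of concatenated sequences in FASTA format mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions rather than BIGSdb's own export code, which is written in Perl: the data structures, the function names, the decision to skip isolates with incomplete profiles, and the short example sequences are all hypothetical.

```python
from typing import Dict, List


def wrap(sequence: str, width: int) -> str:
    """Break a sequence into lines of at most `width` characters."""
    return "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )


def concatenated_fasta(
    isolates: Dict[str, Dict[str, str]],          # isolate -> locus -> allele id
    allele_sequences: Dict[str, Dict[str, str]],  # locus -> allele id -> sequence
    scheme_loci: List[str],                       # loci in concatenation order
    line_width: int = 60,
) -> str:
    """Build one FASTA record per isolate from its allele designations.

    Assumption: isolates lacking a designation (or a defined sequence)
    for any locus in the scheme are skipped rather than padded.
    """
    records = []
    for name, designations in isolates.items():
        try:
            parts = [
                allele_sequences[locus][designations[locus]]
                for locus in scheme_loci
            ]
        except KeyError:
            continue  # incomplete profile, see assumption above
        records.append(f">{name}\n{wrap(''.join(parts), line_width)}\n")
    return "".join(records)


if __name__ == "__main__":
    # Hypothetical designations and truncated, made-up sequences.
    isolates = {
        "iso1": {"abcZ": "1", "adk": "2"},
        "iso2": {"abcZ": "2"},  # no adk designation, so it is skipped
    }
    sequences = {
        "abcZ": {"1": "ATG", "2": "ATA"},
        "adk": {"2": "CCC"},
    }
    print(concatenated_fasta(isolates, sequences, ["abcZ", "adk"]), end="")
```

Keeping the locus order in an explicit list makes the concatenation reproducible across isolates, which matters when the output is passed to alignment or phylogenetic tools.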

BIGSdb: Scalable analysis of bacterial genome variation at the population level

2010 | Keith A Jolley, Martin CJ Maiden