Understanding OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

OrthoMCL-DB is a database of ortholog groups covering 55 species: 16 bacterial and 4 archaeal genomes, with the remaining 35 being eukaryotic and representing most of the complete eukaryotic genomes available when the database was built. Proteins are clustered by sequence similarity with the OrthoMCL algorithm, which runs an all-against-all BLAST search, normalizes for inter-species differences, and then applies Markov clustering. In total, 511,797 proteins (81.6% of the dataset) fall into 70,388 ortholog groups.

OrthoMCL is a graph-clustering algorithm that identifies homologous proteins and distinguishes orthologs from paralogs without phylogenetic analysis. It uses BLAST results to identify reciprocal best hits between species and in-paralogs within species, then applies Markov clustering to the resulting similarity graph. The procedure is automated and applies to datasets spanning multiple species, so no manual curation is required. It has been validated against other ortholog identification methods and is reported to yield groups with higher functional consistency.

The web interface supports queries by protein or group accession number, by keyword, and by BLAST similarity. Ortholog groups with a specific phyletic pattern can be retrieved through a graphical interface or a text-based Phyletic Pattern Expression grammar. Each ortholog group is presented with its phyletic profile, member proteins, multiple sequence alignment, statistical summary, and domain architecture visualization. The analysis tools include Pfam domain architecture, BioLayout graphs, and multiple sequence alignments. The site also provides a species tree based on shared ortholog groups, which reflects the current understanding of organismal evolution. The underlying dataset comprises protein sequences, clustering results, and summary statistics.
It is available for download in FASTA and SQL formats. The database is expected to be updated regularly as new genome data become available. OrthoMCL-DB provides a centralized resource for ortholog prediction among multiple species and is a valuable tool for evolutionary and functional genomics research.
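
The clustering steps summarized above can be illustrated with a small, self-contained Python sketch. It is not the OrthoMCL implementation: it only finds cross-species reciprocal best hits in a toy score table and then runs a bare-bones Markov clustering loop (expansion followed by inflation) on the resulting graph. The in-paralog step and the inter-species normalization are left out, and the protein identifiers, the scores, the `species|protein` naming scheme, and the function names are all hypothetical choices made for the example.

```python
# Toy BLAST-like results: (query, subject) -> similarity score.
# IDs follow an assumed "species|protein" scheme; all values are invented.
HITS = {
    ("hsa|A1", "mmu|A1"): 200.0, ("mmu|A1", "hsa|A1"): 198.0,
    ("hsa|A1", "mmu|B1"): 40.0,  ("mmu|B1", "hsa|A1"): 35.0,
    ("hsa|B1", "mmu|B1"): 150.0, ("mmu|B1", "hsa|B1"): 149.0,
    ("hsa|B1", "mmu|A1"): 30.0,  ("mmu|A1", "hsa|B1"): 42.0,
}


def species(pid):
    """Extract the species prefix from a 'species|protein' identifier."""
    return pid.split("|")[0]


def reciprocal_best_hits(hits):
    """Return {frozenset({a, b}): mean score} for cross-species reciprocal best hits."""
    best = {}  # (query, subject species) -> (score, best subject)
    for (query, subject), score in hits.items():
        if species(query) == species(subject):
            continue  # within-species hits (in-paralog candidates) are ignored here
        key = (query, species(subject))
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)
    edges = {}
    for (query, _), (score, subject) in best.items():
        back = best.get((subject, species(query)))
        if back is not None and back[1] == query:
            edges[frozenset((query, subject))] = (score + back[0]) / 2
    return edges


def markov_cluster(edges, inflation=2.0, iterations=30):
    """Simplified Markov clustering on a weighted, undirected graph."""
    nodes = sorted({node for pair in edges for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for pair, weight in edges.items():
        a, b = (index[p] for p in pair)
        m[a][b] = m[b][a] = weight
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0  # self-loop, weighted like the strongest edge

    def normalize(mat):
        # Make every column sum to 1 (column-stochastic matrix).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: square the matrix (flow along two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize the columns.
        m = normalize([[value ** inflation for value in row] for row in m])

    # Rows that still carry flow are attractors; their nonzero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    graph = reciprocal_best_hits(HITS)
    for cluster in sorted(sorted(c) for c in markov_cluster(graph)):
        print(cluster)
```

On this toy input the two reciprocal pairs end up in separate groups, because the weaker cross hits between the A and B proteins are never reciprocal best hits. Real data would need the omitted steps, and a production run would normally rely on an established Markov clustering implementation rather than the dense pure-Python loop used here.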

OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

2006 | Feng Chen, Aaron J. Mackey, Christian J. Stoeckert Jr and David S. Roos