clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters

2012 | Guangchuang Yu, Li-Gen Wang, Yanyan Han, and Qing-Yu He
This paper introduces clusterProfiler, an R package for comparing biological themes among gene clusters. It automates biological-term classification and enrichment analysis of gene clusters, combining analysis and visualization modules into a reusable workflow. The package supports three species (humans, mice, and yeast) and can be extended to other species and ontologies. It is released under the Artistic-2.0 License within Bioconductor and is freely available online.

clusterProfiler offers groupGO for gene classification, and enrichGO and enrichKEGG for enrichment analysis. Its compareCluster function automatically calculates the enriched functional categories of each gene cluster, and the package provides methods to visualize the results. The comparison function is designed to work with any biological or biomedical ontology, including GO, KEGG, and Disease Ontology (DO), and it can also compare gene-disease associations.

The package is implemented in R and depends on Bioconductor annotation data. It uses GO.db and KEGG.db to obtain maps of the entire GO and KEGG corpus, and it relies on genome-wide annotation data to map Entrez gene identifiers or ORF identifiers for humans, mice, and yeast.

In a case study, clusterProfiler was applied to breast tumor expression data to identify differentially expressed genes and gene clusters. The clusters were then compared by their enriched biological processes, highlighting clusters related to cellular component organization, developmental process, and cell cycle.

clusterProfiler is a user-friendly tool for biologists analyzing high-throughput data from transcriptomics or proteomics. It can be extended to support new organisms and integrated into data analysis pipelines. Planned improvements include using semantic similarity among KEGG pathways and GO terms, ranking gene similarities within clusters, and developing a statistical model based on directed acyclic graphs for comparing functional profiles.

The work was supported by various Chinese funding sources.
The authors declare no conflicting financial interests.
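
The enrichment and cluster-comparison steps summarized above can be illustrated with a small, self-contained sketch. This is not clusterProfiler's API or code. It is a Python illustration that assumes a one-sided hypergeometric over-representation test, a common model for this kind of enrichment analysis. The function names, gene identifiers, and term annotations below are hypothetical, and multiple-testing correction is left out for brevity.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes when
    drawing n genes from a background of N, of which K are annotated."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def enrich(cluster, annotation, background):
    """Test one gene cluster against every term; return (term, count, p) sorted by p."""
    cluster = set(cluster) & background
    results = []
    for term, genes in annotation.items():
        term_genes = genes & background
        overlap = cluster & term_genes
        if not overlap:
            continue  # terms with no genes in the cluster are skipped
        p = hypergeom_pvalue(len(overlap), len(cluster), len(term_genes), len(background))
        results.append((term, len(overlap), p))
    return sorted(results, key=lambda r: r[2])


def compare_clusters(clusters, annotation, background, cutoff=0.05):
    """Run the same enrichment test on each cluster and keep terms with p <= cutoff."""
    return {
        name: [r for r in enrich(genes, annotation, background) if r[2] <= cutoff]
        for name, genes in clusters.items()
    }


# Hypothetical toy data: 20 background genes, two annotated terms, two clusters.
background = {f"g{i}" for i in range(1, 21)}
annotation = {
    "cell cycle": {"g1", "g2", "g3", "g4"},
    "developmental process": {"g11", "g12", "g13", "g14"},
}
clusters = {
    "A": {"g1", "g2", "g3", "g10"},
    "B": {"g11", "g12", "g13", "g5"},
}

for name, terms in compare_clusters(clusters, annotation, background).items():
    for term, count, p in terms:
        print(f"cluster {name}: {term} (genes={count}, p={p:.4f})")
```

Running the same test over every cluster and keeping one result list per cluster is what allows the enriched themes to be compared side by side. A real analysis would also adjust the p-values for multiple testing.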