Hierarchical Information Clustering by Means of Topologically Embedded Graphs

March 9, 2012 | Won-Min Song, T. Di Matteo, Tomaso Aste
This paper introduces a graph-theoretic method for unsupervised, deterministic clustering and hierarchy extraction from complex datasets, requiring no prior information. The method constructs topologically embedded networks that retain only the significant links, then analyzes their structure to identify both intra-cluster and inter-cluster hierarchies.

The method, called DBHT (Directed Bubble Hierarchical Tree), uses topologically embedded graphs, specifically Planar Maximally Filtered Graphs (PMFGs), to identify clusters and hierarchies. It builds a bubble tree from the separating 3-cliques of the graph, a structure that naturally forms a hierarchy. The bubble tree is then used to determine cluster memberships and hierarchies at three levels: intra-bubble, intra-cluster, and inter-cluster. The method is computationally efficient, with complexity less than O(|V|³), where |V| is the number of vertices, and it is robust to noise and to varying data structures.

DBHT is tested on synthetic data with known clustering structures and on real gene expression data. It outperforms other clustering methods, including k-means++, spectral clustering, self-organizing maps (SOM), and Q-cut, in both accuracy and robustness. On a benchmark gene expression dataset it also shows superior performance in clustering and hierarchy detection, identifying gene clusters with clear biological significance, such as those involved in proliferation, apoptosis, and survival.

Applied to gene expression data from lymphoma samples, DBHT reveals biologically significant gene groups related to diagnosis, prognosis, and treatment, and the clusters it identifies are associated with different clinical outcomes. For example, it detects gene clusters related to cell cycle regulation, tumor suppression, and survival rates, providing insights into the biological mechanisms underlying lymphoma subtypes.
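To make the PMFG step concrete, here is a minimal sketch of the commonly described greedy construction: candidate links are visited from strongest to weakest, and a link is kept only if the graph remains planar. It is an illustration under stated assumptions, not the authors' implementation. The function name `build_pmfg` and the toy similarity matrix are hypothetical, and the third-party `networkx` library is assumed to be available for the planarity test.

```python
import itertools

import networkx as nx  # assumption: third-party library, used only for planarity testing


def build_pmfg(similarity):
    """Greedy PMFG sketch (hypothetical helper, not the paper's code).

    `similarity` is a symmetric square matrix (list of lists) of pairwise weights.
    """
    n = len(similarity)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    # Candidate links, strongest first.
    candidates = sorted(
        ((similarity[i][j], i, j) for i, j in itertools.combinations(range(n), 2)),
        reverse=True,
    )
    max_edges = 3 * (n - 2)  # edge count of a maximal planar graph on n >= 3 vertices
    for weight, i, j in candidates:
        graph.add_edge(i, j, weight=weight)
        is_planar, _ = nx.check_planarity(graph)
        if not is_planar:
            graph.remove_edge(i, j)  # this link would break planarity, so discard it
        elif graph.number_of_edges() == max_edges:
            break  # the graph is already maximal planar
    return graph


# Toy similarity matrix (made up for illustration): closer indices are more similar.
toy_similarity = [[1.0 - 0.1 * abs(i - j) for j in range(5)] for i in range(5)]
pmfg = build_pmfg(toy_similarity)
```

With five vertices the complete graph is not planar, so the sketch keeps the nine strongest links and drops the weakest one. Re-testing planarity after every insertion is simple but slow; it is meant to show the idea rather than to reflect the complexity quoted above.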
The DBHT technique provides a novel approach to clustering and hierarchy detection in complex datasets, with potential applications in bioinformatics and other fields.
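The role of separating 3-cliques in the bubble tree can be illustrated with a small sketch. A 3-clique (triangle) is separating when removing its three vertices disconnects the rest of the graph; the pieces on either side of such a triangle are what the summary calls bubbles. The code below is a brute-force illustration using only the Python standard library. The function names and the example graph are hypothetical and are not taken from the paper.

```python
from collections import deque
from itertools import combinations


def components_without(adjacency, removed):
    """Connected components of the graph after deleting the `removed` vertices."""
    remaining = set(adjacency) - set(removed)
    components = []
    while remaining:
        start = remaining.pop()
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def separating_3_cliques(adjacency):
    """Map each separating triangle to the components its removal leaves behind."""
    result = {}
    for a, b, c in combinations(sorted(adjacency), 3):
        is_triangle = b in adjacency[a] and c in adjacency[a] and c in adjacency[b]
        if is_triangle:
            parts = components_without(adjacency, (a, b, c))
            if len(parts) > 1:
                result[(a, b, c)] = parts
    return result


# Hypothetical planar graph: the complete graph on 5 vertices with the link 0-4 deleted.
example = {
    0: {1, 2, 3},
    1: {0, 2, 3, 4},
    2: {0, 1, 3, 4},
    3: {0, 1, 2, 4},
    4: {1, 2, 3},
}
cliques = separating_3_cliques(example)
```

In this example the only separating triangle is (1, 2, 3): removing it leaves vertices 0 and 4 in different components, so the graph splits into two tetrahedral bubbles, {0, 1, 2, 3} and {1, 2, 3, 4}, that share the triangle. In a bubble tree these two bubbles would be joined by a single link corresponding to that shared triangle.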