[slides] Data clustering: 50 years beyond K-means

Data clustering is a fundamental method for organizing data into meaningful groups based on similarity. Unlike supervised learning, clustering does not rely on predefined category labels. It is widely used in many fields, including biology, psychology, and computer science. The concept of clustering dates back to 1954, and K-means, one of the best-known clustering algorithms, was developed independently by several researchers in the 1950s and 1960s. Clustering has since become a crucial area in machine learning, data mining, and pattern recognition, with a vast body of literature. However, choosing the right clustering algorithm, distance metric, and number of clusters remains challenging. The key issue in clustering is defining a similarity measure that captures the structure of the data. Despite the large number of available clustering algorithms, users often have difficulty selecting the most appropriate method. This talk provides an overview of clustering, discusses major challenges, summarizes well-known methods, and highlights emerging research directions such as semi-supervised clustering, ensemble clustering, learning distance metrics, and simultaneous feature selection and clustering.
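
As a rough illustration of the K-means procedure named in the abstract, here is a minimal Python sketch. It is not taken from the talk. It assumes squared Euclidean distance as the dissimilarity measure and a number of clusters k chosen by the caller, which are exactly the choices the abstract describes as difficult. The names `squared_distance`, `assign`, and `kmeans`, and the `init`, `max_iter`, and `seed` parameters, are hypothetical and introduced only for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Alternate assignment and centroid-update steps until the centroids stop moving."""
    # Start from caller-supplied centroids, or from k randomly sampled data points.
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)

    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # The new centroid is the coordinate-wise mean of the cluster members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid in this sketch.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids

    # Recompute labels so they always match the returned centroids.
    return assign(points, centroids), centroids


# Two small, well-separated groups of 2-D points (made-up data for illustration).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2, init=[points[0], points[3]])
```

With the explicit initial centroids above, the first three points receive label 0 and the last three receive label 1. Changing `k`, the initialization, or the distance function can change the grouping, which is one concrete form of the selection problem the abstract raises.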

Data Clustering: 50 Years Beyond K-means

2008 | Anil K. Jain