Unsupervised Learning by Probabilistic Latent Semantic Analysis


2001 | THOMAS HOFMANN
This paper introduces Probabilistic Latent Semantic Analysis (PLSA), a statistical method for factor analysis of binary and count data, closely related to Latent Semantic Analysis (LSA). Whereas LSA applies Singular Value Decomposition (SVD) to co-occurrence tables, PLSA defines a generative latent class model and performs a probabilistic mixture decomposition, giving the method a more principled grounding in statistical inference. The paper reports perplexity results on several text and linguistic collections, describes an application to automated document indexing, and shows experimentally that PLSA substantially outperforms standard LSA in both accuracy and robustness.

PLSA is built on a statistical model called the aspect model, which associates an unobserved class variable with each word occurrence. This lets the model identify different contexts of word usage without relying on dictionaries or thesauri, making it effective at handling polysemy (words with multiple meanings) and at revealing topical similarity by grouping words that share common contexts.

Comparing the two methods, LSA rests on linear algebra and SVD, while PLSA fits a generative probabilistic model, which yields more accurate and more interpretable results. PLSA is trained with the EM algorithm under temperature control (tempered EM), which helps avoid overfitting and improves generalization.

The experiments show that PLSA outperforms LSA in both perplexity and information retrieval performance, achieving significant improvements even in cases where LSA fails. The results also show that combining several PLSA models leads to further gains.
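Concretely, the aspect model introduces a latent class variable z and factorizes the probability of observing word w in document d as a mixture over topics; model quality is then measured by perplexity on held-out text (the notation below follows the standard presentation of the aspect model):

```latex
P(d, w) \;=\; \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z),
\qquad
\mathcal{P} \;=\; \exp\!\left( - \frac{\sum_{d,w} n(d,w) \log P(w \mid d)}{\sum_{d,w} n(d,w)} \right),
```

where n(d, w) denotes the count of word w in document d; lower perplexity indicates a better predictive model.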
The paper concludes that PLSA offers a more principled and robust approach to unsupervised learning, with a solid statistical foundation and the ability to handle complex data structures. Tempered EM and the probabilistic formulation together provide a powerful framework for text analysis and information retrieval.
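The tempered EM procedure discussed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference implementation: the E-step posteriors are raised to an inverse temperature beta (beta = 1 recovers standard EM, beta < 1 dampens the posteriors for regularization), and the M-step re-estimates all parameters from expected counts. Function and variable names are our own.

```python
import numpy as np

def plsa_tempered_em(counts, n_topics, beta=0.9, n_iter=50, seed=0):
    """Tempered EM for the PLSA aspect model.

    counts: (n_docs, n_words) term-document count matrix n(d, w).
    beta:   inverse temperature; beta = 1 is standard EM, beta < 1
            dampens the E-step posteriors to curb overfitting.
    Returns P(z), P(d|z), P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random normalized initialization of the model parameters.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_given_z = rng.random((n_topics, n_docs))
    p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
    p_w_given_z = rng.random((n_topics, n_words))
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to [P(z) P(d|z) P(w|z)]^beta.
        joint = (p_z[:, None, None]
                 * p_d_given_z[:, :, None]
                 * p_w_given_z[:, None, :]) ** beta          # shape (z, d, w)
        posterior = joint / joint.sum(axis=0, keepdims=True)

        # M-step: re-estimate parameters from expected counts.
        weighted = counts[None, :, :] * posterior            # n(d,w) P(z|d,w)
        p_w_given_z = weighted.sum(axis=1)
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
        p_d_given_z = weighted.sum(axis=2)
        p_d_given_z /= p_d_given_z.sum(axis=1, keepdims=True)
        p_z = weighted.sum(axis=(1, 2))
        p_z /= p_z.sum()

    return p_z, p_d_given_z, p_w_given_z
```

On a toy count matrix with two clearly separated word groups, a two-topic model recovers the block structure; all returned distributions are properly normalized, so the reconstructed P(d, w) sums to one.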