Journal of the American Statistical Association, December 2006, Vol. 101, No. 476, Theory and Methods | Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei
The paper introduces the hierarchical Dirichlet process (HDP), a nonparametric Bayesian model for grouped data in which each group's observations are drawn from a mixture model. The goal is to share mixture components across groups while leaving the number of components unbounded, to be inferred from the data. The HDP is defined as a distribution over a set of random probability measures, one associated with each group. A global measure \( G_0 \) is drawn from a Dirichlet process with concentration parameter \( \gamma \) and base measure \( H \); conditionally on \( G_0 \), the group-specific measures \( G_j \) are independent draws from a Dirichlet process with concentration parameter \( \alpha_0 \) and base measure \( G_0 \). Because \( G_0 \) is almost surely discrete, its atoms are necessarily reused by every \( G_j \), and it is this shared support that allows mixture components to be shared across groups.
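The two-level construction above can be sketched numerically with a truncated stick-breaking representation. This is an illustrative sketch only, not the paper's inference algorithm: the truncation level `K`, the choice of \( H \) as a standard normal, the concentration values, and the small numerical floor on the Dirichlet parameters are all assumptions of this example.

```python
import numpy as np

def stick_breaking(concentration, num_atoms, rng):
    """Truncated stick-breaking weights: v_k ~ Beta(1, concentration),
    w_k = v_k * prod_{l<k} (1 - v_l)."""
    v = rng.beta(1.0, concentration, size=num_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
K = 50                      # truncation level (assumption of this sketch)
gamma, alpha0 = 1.0, 1.0    # illustrative concentration parameters

# Global measure G0 ~ DP(gamma, H): atoms drawn from H (here H = N(0, 1)),
# weights beta from stick-breaking.
atoms = rng.normal(0.0, 1.0, size=K)
beta = stick_breaking(gamma, K, rng)

# Group measures G_j ~ DP(alpha0, G0): under truncation, the group weights
# pi_j follow a Dirichlet distribution with parameter alpha0 * beta over the
# SAME atoms, which is how components are shared across groups.
num_groups = 3
pi = rng.dirichlet(np.maximum(alpha0 * beta, 1e-12), size=num_groups)
# (the 1e-12 floor only guards against zero parameters in the tail)

# Draw one observation per group: pick a shared atom, add unit-variance noise.
component = [rng.choice(K, p=row) for row in pi]
x = [rng.normal(atoms[k], 1.0) for k in component]
```

Every group's weight vector `pi[j]` lives over the same `atoms`, so a component that is prominent in one group can reappear in another, which is the sharing behavior the HDP is designed for.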
The paper discusses three perspectives on the Dirichlet process: the stick-breaking construction, the Chinese restaurant process, and the limit of finite mixture models. It also presents Markov chain Monte Carlo (MCMC) algorithms for posterior inference in HDP mixtures, including a Gibbs sampler based on the Chinese restaurant franchise, an augmented representation involving both the Chinese restaurant franchise and the posterior for \( G_0 \), and a streamlined version of the second sampling scheme. The HDP is applied to problems in information retrieval and text modeling, demonstrating its effectiveness in sharing clusters across multiple related groups.
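The Chinese restaurant process mentioned above can be made concrete with a short simulation. This is a minimal sketch of the single-restaurant CRP (the building block of the franchise), not the paper's franchise sampler; the function name, seed, and parameter values are this example's own choices. Customer \( n \) (0-indexed) sits at an occupied table with probability proportional to its occupancy, or starts a new table with probability proportional to \( \alpha \).

```python
import random

def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Simulate table assignments under CRP(alpha).

    Customer n joins existing table t with probability count_t / (n + alpha)
    and opens a new table with probability alpha / (n + alpha), so the
    expected number of tables grows like alpha * log(n)."""
    rng = random.Random(seed)
    counts = []       # customers seated at each table
    assignments = []  # table index chosen by each customer
    for n in range(num_customers):
        r = rng.uniform(0.0, n + alpha)
        cumulative = 0.0
        for t, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                counts[t] += 1
                assignments.append(t)
                break
        else:
            # r fell in the final alpha-sized slice: open a new table
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts

assignments, counts = chinese_restaurant_process(1000, alpha=1.0)
```

The "rich get richer" dynamic visible here is what makes the induced partition exchangeable with a small, slowly growing number of clusters; the hierarchical version in the paper stacks a second restaurant level on top so that dishes (mixture components) are shared across restaurants (groups).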