A Kernel Method for the Two-Sample Problem

04/08 | Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola
The paper proposes a framework for comparing distributions by means of statistical tests that determine whether two samples are drawn from different distributions. The test statistic, the Maximum Mean Discrepancy (MMD), is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). Three tests are presented: two based on large deviation bounds for the test statistic and one based on its asymptotic distribution. The MMD is shown to be a metric on probability distributions when the function space is a universal RKHS. The paper also reviews classical metrics such as the Kolmogorov-Smirnov and Earth-Mover's distances, discusses the computational cost of the MMD, and describes efficient linear-time approximations. The MMD is applied to problems including attribute matching for databases and comparing distributions over graphs, with strong performance reported in both.
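To make the statistic concrete, the following is a minimal sketch of how a squared-MMD estimate might be computed from two samples. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma=1.0`, and the function names `gaussian_kernel`, `mmd2_unbiased`, and `mmd2_linear` are all choices made here for the example. The first estimator evaluates the kernel on all pairs (quadratic cost); the second averages over disjoint consecutive pairs, which is one common form of a linear-time approximation.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of squared MMD using all pairs; quadratic in sample size."""
    m, n = len(xs), len(ys)
    # Within-sample sums skip the diagonal terms (i == j).
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # Cross-sample sum uses every (x, y) pair.
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def mmd2_linear(xs, ys, kernel=gaussian_kernel):
    """Linear-time estimate of squared MMD from disjoint consecutive pairs."""
    pairs = min(len(xs), len(ys)) // 2
    total = 0.0
    for i in range(pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    same_a = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
    same_b = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
    shifted = [[rng.gauss(2.0, 1.0)] for _ in range(100)]
    print("same distribution:   ", mmd2_unbiased(same_a, same_b))
    print("shifted distribution:", mmd2_unbiased(same_a, shifted))
    print("linear-time, shifted:", mmd2_linear(same_a, shifted))
```

With samples from the same distribution the estimate should fall near zero (the unbiased form can be slightly negative), while a shift in the mean should produce a clearly positive value. Turning the estimate into a test additionally requires a rejection threshold, such as one derived from the large deviation bounds or the asymptotic distribution mentioned above; that step is not sketched here.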