February 2, 2008 | Alexander Kraskov, Harald Stögbauer, and Peter Grassberger
The paper presents two improved estimators for the mutual information $M(X, Y)$ based on $k$-nearest-neighbor distances. Compared with conventional binned estimators, they are more data-efficient, adaptive, and have minimal bias. Both estimators are derived from nearest-neighbor entropy estimates, and their systematic errors scale as $k/N$ for $N$ points; numerical results for Gaussian distributions lead the authors to conjecture that the estimators are in fact exactly unbiased for independent variables. The authors compare their algorithms with existing methods and demonstrate their usefulness for assessing independence in independent component analysis (ICA), for improving ICA, and for estimating the reliability of blind source separation. They also provide estimators for redundancies among more than two random variables. The paper includes detailed derivations, implementation details, and comparisons with previous algorithms, as well as applications to gene expression data and ICA.
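The first of the two $k$-nearest-neighbor estimators described above combines digamma terms over neighbor counts in the joint and marginal spaces. A minimal brute-force Python sketch follows; the function name `ksg_mi` and the $O(N^2)$ pairwise-distance computation are illustrative choices for small samples, not the paper's implementation:

```python
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """Sketch of the first kNN mutual-information estimator:
    I ~ psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # Pairwise max-norm (Chebyshev) distances in each marginal space
    # and in the joint space z = (x, y).
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    dz = np.maximum(dx, dy)
    # Exclude self-distances, then take the distance to the
    # k-th nearest neighbor of each point in the joint space.
    np.fill_diagonal(dz, np.inf)
    eps = np.sort(dz, axis=1)[:, k - 1]
    # n_x(i), n_y(i): number of other points strictly closer than
    # eps_i in each marginal space.
    np.fill_diagonal(dx, np.inf)
    np.fill_diagonal(dy, np.inf)
    nx = np.sum(dx < eps[:, None], axis=1)
    ny = np.sum(dy < eps[:, None], axis=1)
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For correlated Gaussians the estimate can be checked against the exact value $-\tfrac{1}{2}\log(1 - r^2)$, while for independent samples it should fluctuate around zero, consistent with the near-unbiasedness reported in the paper.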