The paper introduces the Mutual Information Neural Estimator (MINE), a method for estimating mutual information between high-dimensional continuous random variables using gradient descent over neural networks. MINE is linearly scalable in both dimensionality and sample size, and is trainable via back-propagation. It is strongly consistent and can be used to minimize or maximize mutual information. The method is applied to improve adversarially trained generative models and to implement the Information Bottleneck method in a continuous setting, showing improved performance.
Mutual information measures the dependence between random variables and equals the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginals. MINE leverages dual representations of the KL divergence, the Donsker-Varadhan representation and the f-divergence representation, both of which express the divergence as a supremum over a family of functions. Parametrizing that family with a neural network makes the estimation tractable via gradient ascent, and the Donsker-Varadhan representation yields the tighter of the two bounds.
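For reference, the Donsker-Varadhan representation and the resulting MINE lower bound can be written as follows, with $T_\theta$ denoting the statistics network restricted to a parametric family $\Theta$:

```latex
% Donsker-Varadhan representation of the KL divergence
D_{\mathrm{KL}}(\mathbb{P} \,\|\, \mathbb{Q})
  = \sup_{T : \Omega \to \mathbb{R}}
    \; \mathbb{E}_{\mathbb{P}}[T] - \log \mathbb{E}_{\mathbb{Q}}\!\left[e^{T}\right]

% Restricting T to a neural network family {T_\theta} gives the MINE lower bound
I(X; Z) \;\ge\; \widehat{I}_\Theta(X; Z)
  = \sup_{\theta \in \Theta}
    \; \mathbb{E}_{\mathbb{P}_{XZ}}\!\left[T_\theta\right]
    - \log \mathbb{E}_{\mathbb{P}_X \otimes \mathbb{P}_Z}\!\left[e^{T_\theta}\right]
```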
The MINE algorithm is implemented with a statistics network that is trained to maximize this lower bound on the mutual information. Each iteration draws minibatches from the joint distribution and from the product of the marginals (in practice, the marginal samples can be obtained by shuffling one variable within the batch), evaluates the lower bound, and updates the network parameters by gradient ascent. Because the minibatch estimate of the gradient of the log-expectation term is biased, the algorithm corrects it by replacing the minibatch estimate in the denominator with an exponential moving average.
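A minimal sketch of this training step in PyTorch, assuming a simple MLP statistics network and an in-batch shuffle of z to draw from the product of marginals; the names StatisticsNetwork and mine_step, the architecture, and the EMA rate are illustrative choices, not prescribed by the paper:

```python
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T_theta(x, z): maps a pair of samples to a scalar score."""
    def __init__(self, x_dim, z_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_step(T, optimizer, x, z, ema, ema_rate=0.01):
    """One gradient-ascent step on the Donsker-Varadhan lower bound.

    Marginal samples come from shuffling z within the batch, and the gradient
    of the log-partition term is debiased by an exponential moving average of
    E[exp(T)], one common way to implement the paper's correction.
    """
    z_shuffled = z[torch.randperm(z.size(0))]             # samples from P_X x P_Z
    t_joint = T(x, z).mean()                              # E_{P_XZ}[T]
    exp_t_marginal = torch.exp(T(x, z_shuffled)).mean()   # E_{P_X x P_Z}[exp(T)]

    # running estimate of E[exp(T)] used only in the gradient correction
    ema = (1 - ema_rate) * ema + ema_rate * exp_t_marginal.detach()

    mi_lower_bound = t_joint - torch.log(exp_t_marginal)  # value of the bound
    loss = -(t_joint - exp_t_marginal / ema)              # debiased surrogate loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mi_lower_bound.item(), ema
```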
Theoretical analysis shows that MINE is strongly consistent and comes with sample-complexity guarantees: with a sufficiently expressive statistics network the bound can approximate the mutual information to arbitrary accuracy, and the empirical estimate converges to the true value as the number of samples increases. The method is evaluated on estimating mutual information between multivariate Gaussians, where the ground truth is available in closed form, and on capturing non-linear dependencies. MINE outperforms non-parametric estimators and is effective at improving mode coverage in generative adversarial networks (GANs) and inference in adversarial inference models.
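As a toy illustration of the Gaussian benchmark, componentwise-correlated Gaussians have closed-form mutual information, so the estimate can be checked against the analytic value. This sketch reuses the hypothetical StatisticsNetwork and mine_step from the previous block; the dimensionality, correlation, and hyperparameters are illustrative:

```python
import torch

# For x, z standard Gaussian with componentwise correlation rho,
# I(X; Z) = -d/2 * log(1 - rho^2).
d, rho, n_steps, batch = 20, 0.9, 5000, 256
true_mi = -0.5 * d * torch.log(torch.tensor(1 - rho ** 2))

T = StatisticsNetwork(x_dim=d, z_dim=d)
optimizer = torch.optim.Adam(T.parameters(), lr=1e-4)
ema = torch.tensor(1.0)

for step in range(n_steps):
    x = torch.randn(batch, d)
    z = rho * x + (1 - rho ** 2) ** 0.5 * torch.randn(batch, d)  # Corr(x_i, z_i) = rho
    mi_estimate, ema = mine_step(T, optimizer, x, z, ema)

print(f"MINE estimate: {mi_estimate:.3f}  vs  analytic MI: {true_mi.item():.3f}")
```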
Applications of MINE include improving GANs by maximizing mutual information to reduce mode collapse, enhancing inference in bidirectional adversarial models, and applying the Information Bottleneck method in a continuous setting. The results demonstrate that MINE provides significant improvements in flexibility and performance across these tasks.
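To make the GAN application concrete, the sketch below shows one way a MINE-style term could enter the generator objective: the generator maximizes a Donsker-Varadhan bound on the mutual information between its samples and part of its input code, alongside the usual adversarial loss. The tiny generator and discriminator, the dimensions, and the weight beta are placeholders for illustration, not the paper's architecture:

```python
import torch

batch, noise_dim, code_dim, x_dim = 64, 54, 10, 784
G = torch.nn.Sequential(torch.nn.Linear(noise_dim + code_dim, x_dim))   # toy generator
D = torch.nn.Sequential(torch.nn.Linear(x_dim, 1), torch.nn.Sigmoid())  # toy discriminator
T = StatisticsNetwork(x_dim=x_dim, z_dim=code_dim)  # from the earlier sketch
beta = 0.5                                          # weight of the MI regularizer

noise = torch.randn(batch, noise_dim)
code = torch.randn(batch, code_dim)                 # the code whose information should survive
fake = G(torch.cat([noise, code], dim=1))

# Donsker-Varadhan bound on I(G([noise, code]); code)
code_shuffled = code[torch.randperm(batch)]
mi_bound = T(fake, code).mean() - torch.log(torch.exp(T(fake, code_shuffled)).mean())

adv_loss = -torch.log(D(fake) + 1e-8).mean()        # standard generator adversarial loss
gen_loss = adv_loss - beta * mi_bound               # maximizing MI discourages mode collapse
gen_loss.backward()
```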