THE CONCRETE DISTRIBUTION: A CONTINUOUS RELAXATION OF DISCRETE RANDOM VARIABLES

5 Mar 2017 | Chris J. Maddison, Andriy Mnih, & Yee Whye Teh
The paper introduces the Concrete distribution, a continuous relaxation of discrete random variables that enables gradient-based optimization of stochastic computation graphs with discrete nodes. Rather than sampling a discrete state directly, the relaxation draws a continuous probability vector on the simplex, so gradients of the objective can flow through the sample to the distribution's parameters using ordinary automatic differentiation. A sample is obtained by adding independent Gumbel noise to each log-probability (logit), dividing the result by a temperature parameter, and applying a softmax.

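The sampling procedure described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the distribution is parameterized by logits (log-probabilities), draws Gumbel(0, 1) noise with the standard inverse-CDF construction G = -log(-log U), and the helper names `sample_gumbel`, `relaxed_sample`, and `concrete_sample` are hypothetical. Separating the noise draw from the deterministic transform shows why gradients can flow: once the noise is fixed, the sample is a smooth function of the logits.

```python
import math
import random


def sample_gumbel(rng: random.Random) -> float:
    """Draw a standard Gumbel(0, 1) variate by inverse CDF: G = -log(-log U)."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def relaxed_sample(logits, gumbels, temperature):
    """Deterministic map from (logits, noise) to a point on the simplex.

    Computes softmax((logits + gumbels) / temperature). For fixed noise this
    is a smooth function of the logits, which is what lets gradients flow.
    """
    scaled = [(logit + g) / temperature for logit, g in zip(logits, gumbels)]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def concrete_sample(logits, temperature, rng):
    """Draw one relaxed (continuous) sample for the given logits and temperature."""
    gumbels = [sample_gumbel(rng) for _ in logits]
    return relaxed_sample(logits, gumbels, temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [math.log(p) for p in (0.2, 0.5, 0.3)]
    for temperature in (1.0, 0.5, 0.1):
        x = concrete_sample(logits, temperature, rng)
        print(temperature, [round(v, 3) for v in x])
```

In an autodiff framework the same two steps apply: the noise is sampled without gradients, and the tempered softmax is differentiated with respect to the logits.
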
The Concrete distribution has a closed-form density, and in the zero-temperature limit its samples become one-hot vectors distributed according to the original discrete distribution. The paper also discusses how to choose the temperature, noting that lower temperatures yield a closer approximation of the discrete distribution. It demonstrates Concrete relaxations on density estimation and structured prediction tasks, showing that they can match or outperform state-of-the-art score-function estimators such as VIMCO and NVIL. Because the relaxation uses only standard differentiable operations, it is compatible with automatic differentiation libraries, allowing efficient training of models with discrete stochastic nodes. Overall, the Concrete distribution provides a flexible and effective method for optimizing stochastic computation graphs with discrete nodes.

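The closed-form density and the zero-temperature limit can both be checked numerically. The sketch below transcribes the Concrete density for n categories with positive (possibly unnormalized) weights alpha and temperature lam, p(x) = (n-1)! * lam^(n-1) * prod_k [alpha_k * x_k^(-lam-1) / sum_i alpha_i * x_i^(-lam)], evaluated in log space. It then holds one hypothetical noise draw fixed and lowers the temperature, so the relaxed sample approaches the one-hot vector at argmax(logits + gumbels); with i.i.d. Gumbel noise that argmax is an exact sample from the discrete distribution (the Gumbel-max trick). The helper names and the fixed noise values are assumptions made for illustration.

```python
import math


def concrete_log_density(x, alpha, temperature):
    """Log of the closed-form Concrete density at a point x in the open simplex.

    p(x) = (n-1)! * lam^(n-1) * prod_k [alpha_k * x_k^(-lam-1) / sum_i alpha_i * x_i^(-lam)]
    `alpha` holds positive class weights; scaling them all by a constant
    leaves the density unchanged.
    """
    n = len(x)
    lam = temperature
    # log(alpha_i * x_i^(-lam)) for each i, then a stable log-sum-exp
    log_terms = [math.log(a) - lam * math.log(xi) for a, xi in zip(alpha, x)]
    m = max(log_terms)
    log_norm = m + math.log(sum(math.exp(t - m) for t in log_terms))
    log_p = math.lgamma(n) + (n - 1) * math.log(lam)  # lgamma(n) = log((n-1)!)
    for a, xi in zip(alpha, x):
        log_p += math.log(a) - (lam + 1.0) * math.log(xi) - log_norm
    return log_p


def relaxed_sample(logits, gumbels, temperature):
    """softmax((logits + gumbels) / temperature) for a fixed noise vector."""
    scaled = [(logit + g) / temperature for logit, g in zip(logits, gumbels)]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [math.log(0.2), math.log(0.5), math.log(0.3)]
gumbels = [0.0, 0.5, 0.0]  # hypothetical fixed noise draw, chosen for illustration
for lam in (1.0, 0.5, 0.1, 0.01):
    # The largest coordinate grows toward 1 as the temperature decreases.
    x = relaxed_sample(logits, gumbels, lam)
    print(f"temperature={lam}: {[round(v, 4) for v in x]}")
```

As a sanity check on the density, two categories with equal weights at temperature 1 give a uniform density on [0, 1] for the first coordinate, so the log density is 0 there.
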