5 Mar 2017 | Chris J. Maddison, Andriy Mnih, & Yee Whye Teh
The paper introduces the Concrete distribution, a continuous relaxation of discrete random variables. It is designed so that automatic differentiation (AD) can be applied to stochastic computation graphs (SCGs) that contain discrete stochastic nodes. Relaxing each discrete state into a continuous probability vector on the simplex lets gradients flow through the relaxed nodes, so parameters can be optimized by gradient descent. The distribution has a closed-form density, and it can be sampled by applying a softmax to logits perturbed with additive noise drawn from a fixed distribution. The authors evaluate Concrete relaxations on density estimation and structured prediction tasks with neural networks. Compared with state-of-the-art methods, the relaxations are competitive: they outperform on some tasks and underperform on others. The main practical challenge is choosing a training temperature that keeps the density from having modes in the interior of the simplex. The paper also discusses related work and gives implementation details for Concrete random variables.
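The sampling procedure described in the summary (a softmax over logits with fixed additive noise) can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes the noise is Gumbel(0, 1), as in the paper, and the names `sample_gumbel`, `sample_concrete`, `log_alpha`, and `temperature` are hypothetical choices made for this example.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) variate as -log(-log(U)), with U ~ Uniform(0, 1)."""
    # random() can return exactly 0.0; clamp so the inner log stays finite.
    u = max(rng.random(), 1e-300)
    return -math.log(-math.log(u))


def sample_concrete(log_alpha, temperature, rng):
    """Relaxed sample on the simplex: softmax((log_alpha + noise) / temperature).

    log_alpha   : unnormalized log-probabilities, one per discrete state
    temperature : positive scalar; smaller values push samples toward vertices
    rng         : a random.Random instance (the noise does not depend on log_alpha)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [(la + sample_gumbel(rng)) / temperature for la in log_alpha]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    log_alpha = [math.log(a) for a in (2.0, 1.0, 0.5)]
    # Lower temperatures give samples closer to one-hot vectors.
    for temperature in (0.1, 1.0, 5.0):
        x = sample_concrete(log_alpha, temperature, rng)
        print(temperature, [round(v, 3) for v in x])
```

Because the noise is drawn independently of `log_alpha`, the sample is a deterministic, differentiable function of the parameters once the noise is fixed, which is what allows gradients to pass through the node. The index of the largest component follows the underlying discrete distribution (proportional to `exp(log_alpha)`), so the relaxation stays tied to the discrete variable it replaces.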