The paper introduces the Gumbel-Softmax distribution, a continuous relaxation of categorical variables that allows for efficient gradient estimation through backpropagation. The Gumbel-Softmax distribution can be smoothly annealed into a categorical distribution, making it suitable for training stochastic neural networks with discrete latent variables. The authors demonstrate that their Gumbel-Softmax estimator outperforms existing gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables. Additionally, the Gumbel-Softmax estimator enables significant speedups in semi-supervised classification by allowing direct backpropagation through the categorical latent variables without the need for marginalization. The paper also discusses the Straight-Through (ST) Gumbel Estimator, which discretizes samples during evaluation but uses the continuous approximation during training, and provides experimental results to support the effectiveness of the proposed methods.
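To make the sampling procedure concrete, below is a minimal NumPy sketch of drawing a relaxed categorical sample via the Gumbel-Softmax trick. The function name, shapes, and numerical details are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sample_gumbel_softmax(logits, temperature, rng=None):
    """Draw one sample from a Gumbel-Softmax (relaxed categorical) distribution.

    logits: unnormalized log-probabilities of a k-way categorical variable, shape (k,).
    temperature: softmax temperature tau; as tau -> 0, samples approach one-hot vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via the inverse-CDF trick: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))
    # Softmax over (logits + noise) / temperature yields a point on the simplex
    y = (logits + gumbel_noise) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

# Example: relaxed samples from a 4-way categorical distribution
logits = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
print(sample_gumbel_softmax(logits, temperature=0.5))  # nearly one-hot
print(sample_gumbel_softmax(logits, temperature=5.0))  # close to uniform
```

The Straight-Through variant mentioned above would, on the forward pass, replace the relaxed sample with its argmax one-hot vector while still using the continuous relaxation's gradient on the backward pass; that part is framework-specific (e.g., a stop-gradient trick) and is not shown here.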