This paper proposes Stein Variational Gradient Descent (SVGD), a general-purpose variational inference algorithm that serves as a natural counterpart of gradient descent for Bayesian inference. SVGD maintains a set of particles and iteratively transports them toward the target distribution via a form of functional gradient descent that minimizes the Kullback-Leibler (KL) divergence, driving the particles to fit the true posterior. The method rests on a new theoretical result connecting the derivative of the KL divergence under smooth transforms with Stein's identity and a kernelized Stein discrepancy.
The resulting update mimics ordinary gradient descent: it applies whenever gradients of the log posterior are available and is simple enough to implement even for non-experts in variational inference. With a single particle it reduces to gradient descent for maximum a posteriori (MAP) estimation, and with more particles it automatically becomes a full Bayesian approach. See the sketch of the particle update after this summary.
Empirically, SVGD is efficient and scalable. On both toy examples and real-world models and datasets, including Bayesian logistic regression and Bayesian neural networks, it performs well and is competitive with existing state-of-the-art variational inference methods. The paper concludes that SVGD is a simple and effective general-purpose variational inference algorithm applicable to a wide range of Bayesian inference problems.
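To make the update concrete: for particles {x_i}, each SVGD step sets x_i <- x_i + eps * phi(x_i), where phi(x) = (1/n) sum_j [ k(x_j, x) grad log p(x_j) + grad_{x_j} k(x_j, x) ] for a positive-definite kernel k. Below is a minimal NumPy sketch of this update under an RBF kernel with the median-heuristic bandwidth; the function names (rbf_kernel, svgd_update) and the user-supplied score function are illustrative placeholders, not the authors' reference implementation.

```python
import numpy as np

def rbf_kernel(X, h=None):
    """RBF kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / h) and, per particle i,
    the summed kernel gradient sum_j grad_{x_j} k(x_j, x_i)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # (n, n)
    if h is None:
        # Median heuristic for the bandwidth (a +1 inside the log is a common safeguard).
        h = max(np.median(sq_dists) / np.log(X.shape[0] + 1.0), 1e-8)
    K = np.exp(-sq_dists / h)
    # grad_K[i] = (2/h) * sum_j K[i, j] * (x_i - x_j)
    grad_K = (2.0 / h) * (K.sum(axis=1, keepdims=True) * X - K @ X)
    return K, grad_K

def svgd_update(X, score, step=0.1):
    """One SVGD step: x_i <- x_i + step * phi(x_i), with
    phi(x_i) = (1/n) sum_j [ K[i, j] * score(x_j) + grad_{x_j} k(x_j, x_i) ]."""
    K, grad_K = rbf_kernel(X)
    phi = (K @ score(X) + grad_K) / X.shape[0]
    return X + step * phi

# Toy usage: move mis-initialized particles toward a standard 2-D Gaussian,
# whose score function is grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
for _ in range(500):
    X = svgd_update(X, score=lambda x: -x, step=0.1)
```

With a single particle the kernel-gradient term vanishes for this kernel (grad_x k(x, x) = 0), so the update collapses to a plain gradient step on the log posterior, which is exactly the MAP limit noted above.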