14 Dec 2019 | Ricky T. Q. Chen*, Yulia Rubanova*, Jesse Bettencourt*, David Duvenaud
The paper introduces a new family of deep neural network models called Neural Ordinary Differential Equations (Neural ODEs). Instead of specifying a discrete sequence of hidden layers, a Neural ODE parameterizes the derivative of the hidden state with a neural network, and the output is computed by a black-box differential equation solver, which gives constant memory cost and lets the solver adapt its evaluation strategy to the requested accuracy. The authors demonstrate these properties in continuous-depth residual networks and continuous-time latent-variable models. They also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. The main technical challenge is performing reverse-mode differentiation through the ODE solver, which they address with the adjoint sensitivity method: it scales linearly with problem size, has low memory cost, and explicitly controls numerical error. Experiments show the effectiveness of Neural ODEs in applications such as supervised learning, density estimation, and time-series modeling.
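To make the forward and backward passes concrete, here is a minimal NumPy/SciPy sketch, not the authors' code: the MLP weights, the loss, and the solver settings are illustrative assumptions. It integrates a small network's dynamics with an adaptive Runge-Kutta solver, then solves the adjoint ODE backwards in time to obtain the gradient of a loss with respect to the initial state. For brevity it stores the dense forward solution and uses a finite-difference Jacobian, whereas the paper recovers the state by integrating the ODE backwards jointly with the adjoint (giving constant memory) and uses exact vector-Jacobian products from automatic differentiation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy dynamics: a small tanh MLP f(h, t; theta) parameterizing dh/dt.
# Shapes and weights are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(16, 2))
W2 = rng.normal(scale=0.5, size=(2, 16))

def f(t, h):
    """dh/dt = f(h(t), t; theta): the continuous-depth 'layer'."""
    return W2 @ np.tanh(W1 @ h)

# Forward pass: instead of stacking discrete residual blocks
# h_{k+1} = h_k + f(h_k), integrate h from t=0 to t=1 with an
# adaptive black-box solver under explicit error tolerances.
h0 = np.array([1.0, 0.0])
fwd = solve_ivp(f, (0.0, 1.0), h0, method="RK45",
                rtol=1e-6, atol=1e-8, dense_output=True)
h1 = fwd.y[:, -1]  # h(1), the model output

# Adjoint pass: for a scalar loss L(h(1)), the adjoint a(t) = dL/dh(t)
# obeys da/dt = -(df/dh)^T a, integrated backwards from t=1 to t=0.
# Here L = sum(h(1)), so a(1) is a vector of ones. The Jacobian is
# approximated by central finite differences for simplicity.
def jac(t, h, eps=1e-6):
    J = np.zeros((h.size, h.size))
    for i in range(h.size):
        dh = np.zeros_like(h)
        dh[i] = eps
        J[:, i] = (f(t, h + dh) - f(t, h - dh)) / (2 * eps)
    return J

def adjoint_dynamics(t, a):
    h = fwd.sol(t)  # state along the stored forward trajectory
    return -jac(t, h).T @ a

bwd = solve_ivp(adjoint_dynamics, (1.0, 0.0), np.ones(2),
                method="RK45", rtol=1e-6, atol=1e-8)
dL_dh0 = bwd.y[:, -1]  # gradient of the loss w.r.t. h(0)
print("h(1) =", h1, " dL/dh(0) =", dL_dh0)
```

The same backward ODE, augmented with an extra integral over the parameters, yields dL/dtheta, which is what makes training through the solver possible without backpropagating through its internal operations.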