Optimizing Neural Networks with Kronecker-factored Approximate Curvature

8 Jun 2020 | James Martens* and Roger Grosse†
The paper introduces Kronecker-factored Approximate Curvature (K-FAC), an efficient method for approximating natural gradient descent in neural networks. K-FAC is based on an approximation to the Fisher information matrix that, unlike diagonal or low-rank approximations, still captures substantial curvature structure: large blocks of the Fisher are represented as Kronecker products of two much smaller matrices. The approximation is derived in two stages: first, the Fisher is partitioned into blocks corresponding to entire layers, and then each block is approximated as a Kronecker product. The inverse of this approximate Fisher is further approximated as either block-diagonal or block-tridiagonal, making it computationally efficient to invert. The method is shown to be much faster in practice than stochastic gradient descent with momentum, especially in highly stochastic optimization regimes. The paper also discusses computational costs, provides high-level pseudocode for K-FAC, and covers related methods and network transformations.