Deep learning via Hessian-free optimization

2010 | James Martens
This paper presents a practical second-order optimization method, based on the "Hessian-free" approach, for training deep auto-encoders. The method avoids the need for pre-training and achieves results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks; it is also scalable and not limited to auto-encoders or any specific model class. The paper proposes "pathological curvature" as a possible explanation for the difficulty of deep learning and shows how second-order optimization, and Hessian-free optimization in particular, deals with it effectively.

Training deep networks with gradient descent often progresses slowly and appears to get stuck in poor local minima; second-order methods, which model the local curvature of the objective, are far more effective in such scenarios. Hessian-free optimization avoids the computational cost of explicitly forming the Hessian: it needs only Hessian-vector products, which can be computed by finite differences of the gradient, and it uses the linear conjugate gradient (CG) method to minimize a local quadratic approximation of the objective. This makes the approach suitable for large models and datasets.
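As a rough illustration of those two ingredients (a minimal sketch under simplifying assumptions, not the paper's implementation), the Python snippet below computes a Hessian-vector product from two gradient evaluations, Hv ≈ (∇f(θ + εv) − ∇f(θ))/ε, and runs a plain linear CG loop that solves Bp = −∇f(θ), i.e. minimizes the quadratic model q(p) = f(θ) + ∇f(θ)ᵀp + ½pᵀBp. The function names and the toy quadratic objective are hypothetical.

```python
import numpy as np

def hessian_vector_product(grad_f, theta, v, eps=1e-6):
    """Finite-difference Hessian-vector product:
    H v ~= (grad f(theta + eps * v) - grad f(theta)) / eps."""
    return (grad_f(theta + eps * v) - grad_f(theta)) / eps

def conjugate_gradient(matvec, b, max_iters=100, tol=1e-8):
    """Linear CG: approximately solve B x = b using only matrix-vector
    products with B, so B itself is never formed or stored."""
    x = np.zeros_like(b)
    r = b - matvec(x)              # residual b - B x
    d = r.copy()                   # search direction
    rs = r @ r
    for _ in range(max_iters):
        Bd = matvec(d)
        alpha = rs / (d @ Bd)      # optimal step length along d
        x += alpha * d
        r -= alpha * Bd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d  # next conjugate direction
        rs = rs_new
    return x

# Toy check on f(theta) = 0.5 * theta^T A theta - b^T theta, where H = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda th: A @ th - b
theta = np.zeros(2)
newton_step = conjugate_gradient(
    lambda v: hessian_vector_product(grad_f, theta, v), -grad_f(theta))
print(newton_step)  # ~= A^{-1} b, the Newton step from theta = 0
```

On a quadratic objective the finite-difference product is exact up to rounding, so CG recovers the Newton step; for a neural network, grad_f would be the backpropagated gradient of the training loss over a mini-batch.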
The paper then presents the full Hessian-free algorithm, in which CG is used to approximately minimize a quadratic approximation of the objective at each iteration. Several modifications make the method practical for machine learning: damping of the curvature matrix, efficient computation of matrix-vector products, mini-batch handling of large datasets, and a relative-progress termination condition for CG. In experiments the algorithm trains deep auto-encoders effectively, outperforming the pre-training + fine-tuning approach. The paper also examines the role of curvature in optimization, showing that Hessian-free optimization handles pathological curvature, and the under-fitting it causes, far better than first-order methods, and that pre-training helps mainly by speeding up optimization and improving generalization. The results indicate that deep learning can be achieved effectively and efficiently without pre-training, opening the door to exploring a wide range of deep or otherwise difficult-to-optimize architectures. The paper concludes with a discussion of the implications of these results and of the many interesting questions that remain for future research.
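To connect the modifications above, here is a minimal sketch of one damped outer-loop update, reusing the two helpers from the previous snippet. It assumes Tikhonov-style damping, B = H + λI, and adapts λ with a Levenberg-Marquardt-style rule based on the reduction ratio ρ, the actual improvement divided by the quadratic model's predicted improvement; the constants and the acceptance rule here are illustrative, not the paper's exact settings.

```python
import numpy as np

def hessian_free_step(f, grad_f, theta, lam, cg_iters=50):
    """One damped Hessian-free update: solve (H + lam*I) p = -g with CG,
    then adapt lam from the reduction ratio rho (Levenberg-Marquardt style)."""
    g = grad_f(theta)
    damped = lambda v: hessian_vector_product(grad_f, theta, v) + lam * v
    p = conjugate_gradient(damped, -g, max_iters=cg_iters)

    predicted = -(g @ p + 0.5 * (p @ damped(p)))   # model's predicted decrease
    actual = f(theta) - f(theta + p)               # observed decrease
    rho = actual / predicted if predicted > 0 else -np.inf

    if rho > 0.75:        # model trustworthy: relax the damping
        lam *= 2.0 / 3.0
    elif rho < 0.25:      # model poor: damp more heavily
        lam *= 1.5
    if actual > 0:        # accept only improving steps
        theta = theta + p
    return theta, lam
```

Damping matters because the quadratic model is unreliable far from the current point: a large λ shrinks the update toward a short gradient-descent-like step, while λ → 0 recovers the undamped Newton-like step.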
[slides and audio] Deep learning via Hessian-free optimization