Deep learning via Hessian-free optimization

2010 | James Martens
This paper presents a practical second-order optimization method, based on the "Hessian-free" approach, for training deep auto-encoders. The method avoids the need for pre-training and achieves results superior to those reported by Hinton & Salakhutdinov (2006) on the same tasks; it is also scalable and not limited to auto-encoders or any specific model class. The paper proposes "pathological curvature" as a possible explanation for the difficulty of deep learning and shows how second-order optimization, and Hessian-free optimization in particular, deals with it effectively.

Training deep networks with gradient descent often progresses slowly and appears to get stuck in poor local minima; second-order methods, which model the local curvature of the objective, are far more effective in such scenarios. Hessian-free optimization avoids the computational cost of explicitly forming the Hessian: it needs only Hessian-vector products, which can be computed by finite differences of the gradient, and it uses the linear conjugate gradient (CG) method to minimize a local quadratic approximation of the objective. This makes the approach suitable for large models and datasets.
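As a rough illustration of those two ingredients (a minimal sketch under simplifying assumptions, not the paper's implementation), the Python snippet below computes a Hessian-vector product from two gradient evaluations, Hv ≈ (∇f(θ + εv) − ∇f(θ))/ε, and runs a plain linear CG loop that solves Bp = −∇f(θ), i.e. minimizes the quadratic model q(p) = f(θ) + ∇f(θ)ᵀp + ½pᵀBp. The function names and the toy quadratic objective are hypothetical.

```python
import numpy as np

def hessian_vector_product(grad_f, theta, v, eps=1e-6):
    """Finite-difference Hessian-vector product:
    H v ~= (grad f(theta + eps * v) - grad f(theta)) / eps."""
    return (grad_f(theta + eps * v) - grad_f(theta)) / eps

def conjugate_gradient(matvec, b, max_iters=100, tol=1e-8):
    """Linear CG: approximately solve B x = b using only matrix-vector
    products with B, so B itself is never formed or stored."""
    x = np.zeros_like(b)
    r = b - matvec(x)              # residual b - B x
    d = r.copy()                   # search direction
    rs = r @ r
    for _ in range(max_iters):
        Bd = matvec(d)
        alpha = rs / (d @ Bd)      # optimal step length along d
        x += alpha * d
        r -= alpha * Bd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d  # next conjugate direction
        rs = rs_new
    return x

# Toy check on f(theta) = 0.5 * theta^T A theta - b^T theta, where H = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_f = lambda th: A @ th - b
theta = np.zeros(2)
newton_step = conjugate_gradient(
    lambda v: hessian_vector_product(grad_f, theta, v), -grad_f(theta))
print(newton_step)  # ~= A^{-1} b, the Newton step from theta = 0
```

On a quadratic objective the finite-difference product is exact up to rounding, so CG recovers the Newton step; for a neural network, grad_f would be the backpropagated gradient of the training loss over a mini-batch.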
The paper then presents the full Hessian-free algorithm, in which CG is used to approximately minimize a quadratic approximation of the objective at each iteration. Several modifications make the method practical for machine learning: damping of the curvature matrix, efficient computation of matrix-vector products, mini-batch handling of large datasets, and a relative-progress termination condition for CG. In experiments the algorithm trains deep auto-encoders effectively, outperforming the pre-training + fine-tuning approach. The paper also examines the role of curvature in optimization, showing that Hessian-free optimization handles pathological curvature, and the under-fitting it causes, far better than first-order methods, and that pre-training helps mainly by speeding up optimization and improving generalization. The results indicate that deep learning can be achieved effectively and efficiently without pre-training, opening the door to exploring a wide range of deep or otherwise difficult-to-optimize architectures. The paper concludes with a discussion of the implications of these results and of the many interesting questions that remain for future research.
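To connect the modifications above, here is a minimal sketch of one damped outer-loop update, reusing the two helpers from the previous snippet. It assumes Tikhonov-style damping, B = H + λI, and adapts λ with a Levenberg-Marquardt-style rule based on the reduction ratio ρ, the actual improvement divided by the quadratic model's predicted improvement; the constants and the acceptance rule here are illustrative, not the paper's exact settings.

```python
import numpy as np

def hessian_free_step(f, grad_f, theta, lam, cg_iters=50):
    """One damped Hessian-free update: solve (H + lam*I) p = -g with CG,
    then adapt lam from the reduction ratio rho (Levenberg-Marquardt style)."""
    g = grad_f(theta)
    damped = lambda v: hessian_vector_product(grad_f, theta, v) + lam * v
    p = conjugate_gradient(damped, -g, max_iters=cg_iters)

    predicted = -(g @ p + 0.5 * (p @ damped(p)))   # model's predicted decrease
    actual = f(theta) - f(theta + p)               # observed decrease
    rho = actual / predicted if predicted > 0 else -np.inf

    if rho > 0.75:        # model trustworthy: relax the damping
        lam *= 2.0 / 3.0
    elif rho < 0.25:      # model poor: damp more heavily
        lam *= 1.5
    if actual > 0:        # accept only improving steps
        theta = theta + p
    return theta, lam
```

Damping matters because the quadratic model is unreliable far from the current point: a large λ shrinks the update toward a short gradient-descent-like step, while λ → 0 recovers the undamped Newton-like step.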
[slides and audio] Deep learning via Hessian-free optimization