A Scaled Conjugate Gradient Algorithm for Fast Supervised Learning


November 13, 1990 | Martin F. Møller
This paper introduces a supervised learning algorithm, the Scaled Conjugate Gradient (SCG) algorithm, based on the conjugate gradient methods of numerical analysis. SCG exploits second-order information about the error surface yet requires only O(N) memory, where N is the number of weights in the network. Benchmarked against standard backpropagation (BP) and conjugate gradient backpropagation (CGB), SCG shows a speed-up of at least an order of magnitude relative to BP. It avoids the time-consuming line search used by CGB and BFGS by instead scaling the step size with a Levenberg-Marquardt approach. The paper also argues that incorporating problem-dependent structural information into network architectures helps handle complex problems more effectively. Experiments on several problems, including the parity problem and the logistic map, demonstrate SCG's superior convergence speed and efficiency. The conclusion highlights SCG's effectiveness at handling the ravine phenomena common in high-dimensional weight spaces, and its ability to exploit structural information in neural networks.
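The key idea in the summary above is that SCG replaces the line search of classical conjugate gradient with a Levenberg-Marquardt scale parameter λ that is raised or lowered depending on how well a local quadratic model predicts the actual error reduction. As a rough illustration, the sketch below implements that update loop in Python following the paper's notation (σ, λ, the 0.25/0.75 trust thresholds, the restart every N steps); the interface (`f` for the error E(w), `grad` for E'(w)) and the stopping tolerance are assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np

def scg(f, grad, w, max_iters=500, tol=1e-6):
    """Sketch of Moller's Scaled Conjugate Gradient (SCG) minimizer.

    f: error E(w); grad: gradient E'(w); w: initial weight vector.
    Constants follow the paper where stated; other details are
    illustrative assumptions.
    """
    sigma0 = 1e-4             # perturbation for the Hessian-vector estimate
    lam, lam_bar = 1e-6, 0.0  # Levenberg-Marquardt scale and its backup
    r = -grad(w)              # steepest-descent direction
    p = r.copy()
    success = True
    n = w.size
    fw = f(w)
    for k in range(1, max_iters + 1):
        p_norm2 = p @ p
        if success:
            # Finite-difference approximation of s = H p (O(N) memory,
            # no explicit Hessian is ever formed)
            sigma = sigma0 / np.sqrt(p_norm2)
            s = (grad(w + sigma * p) - grad(w)) / sigma
            delta = p @ s
        # Scale delta with the Levenberg-Marquardt parameter
        delta += (lam - lam_bar) * p_norm2
        if delta <= 0:        # force the scaled Hessian positive definite
            lam_bar = 2.0 * (lam - delta / p_norm2)
            delta = -delta + lam * p_norm2
            lam = lam_bar
        mu = p @ r
        alpha = mu / delta    # closed-form step size: no line search
        fw_new = f(w + alpha * p)
        Delta = 2.0 * delta * (fw - fw_new) / mu**2  # comparison parameter
        if Delta >= 0:        # successful step: accept the new point
            w = w + alpha * p
            fw = fw_new
            r_new = -grad(w)
            lam_bar, success = 0.0, True
            if k % n == 0:    # restart with steepest descent every n steps
                p = r_new.copy()
            else:             # beta from the paper's direction update
                beta = (r_new @ r_new - r_new @ r) / mu
                p = r_new + beta * p
            r = r_new
            if Delta >= 0.75:
                lam *= 0.25   # model fits well: trust it more
        else:
            lam_bar, success = lam, False  # reject step, retry with larger lam
        if Delta < 0.25:
            lam += delta * (1.0 - Delta) / p_norm2  # model fits poorly
        if np.linalg.norm(r) < tol:
            break
    return w, fw
```

On a well-conditioned quadratic this reduces to ordinary conjugate gradient (λ stays near zero); λ only grows when the local quadratic model predicts the error reduction poorly, which is what lets SCG skip the per-iteration line search that dominates the cost of CGB and BFGS.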