LSTM: A Search Space Odyssey

4 Oct 2017 | Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
This paper presents a large-scale analysis of eight variants of the Long Short-Term Memory (LSTM) architecture on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of each variant were optimized separately using random search, and their importance was assessed with the fANOVA framework. The study comprises 5400 experimental runs, making it the largest of its kind on LSTM networks. The results show that none of the variants significantly outperforms the standard LSTM architecture, and that the forget gate and the output activation function are its most critical components. The study also finds that the hyperparameters are largely independent of one another, which yields guidelines for adjusting them efficiently. The paper concludes by discussing the implications of these findings for LSTM architecture selection and hyperparameter tuning.
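For context, the "standard LSTM architecture" referred to above is the vanilla LSTM block with input, forget, and output gates, peephole connections, and an output activation function applied to the cell state. The sketch below gives one common way of writing its forward equations; the notation (weight matrices W, recurrent matrices R, peephole vectors p, biases b) is illustrative rather than quoted verbatim from the paper.

```latex
% Minimal sketch of the vanilla LSTM forward equations with peephole
% connections (the baseline architecture discussed above). Notation is
% our own: \sigma is the logistic sigmoid, g and h are the input and
% output activation functions (typically tanh), and \odot denotes
% element-wise multiplication.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align*}
  \mathbf{z}^{t} &= g\!\left(W_{z}\mathbf{x}^{t} + R_{z}\mathbf{y}^{t-1} + \mathbf{b}_{z}\right)
      && \text{block input} \\
  \mathbf{i}^{t} &= \sigma\!\left(W_{i}\mathbf{x}^{t} + R_{i}\mathbf{y}^{t-1} + \mathbf{p}_{i}\odot\mathbf{c}^{t-1} + \mathbf{b}_{i}\right)
      && \text{input gate} \\
  \mathbf{f}^{t} &= \sigma\!\left(W_{f}\mathbf{x}^{t} + R_{f}\mathbf{y}^{t-1} + \mathbf{p}_{f}\odot\mathbf{c}^{t-1} + \mathbf{b}_{f}\right)
      && \text{forget gate} \\
  \mathbf{c}^{t} &= \mathbf{z}^{t}\odot\mathbf{i}^{t} + \mathbf{c}^{t-1}\odot\mathbf{f}^{t}
      && \text{cell state} \\
  \mathbf{o}^{t} &= \sigma\!\left(W_{o}\mathbf{x}^{t} + R_{o}\mathbf{y}^{t-1} + \mathbf{p}_{o}\odot\mathbf{c}^{t} + \mathbf{b}_{o}\right)
      && \text{output gate} \\
  \mathbf{y}^{t} &= h\!\left(\mathbf{c}^{t}\right)\odot\mathbf{o}^{t}
      && \text{block output}
\end{align*}
\end{document}
```

The two components singled out as most critical in the study appear here as the forget gate f (which controls how much of the previous cell state is retained) and the output activation function h (which squashes the cell state before it is gated into the block output).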