Best Practices for a Handwritten Text Recognition System

17 Apr 2024 | George Retsinas, Giorgos Sfikas, Basilis Gatos, Christophoros Nikou
This paper presents best practices for building effective handwritten text recognition (HTR) systems, focusing on a convolutional-recurrent (CNN+LSTM) architecture. The authors propose three key modifications: 1) preserving the aspect ratio of input images by resizing them and then padding, 2) using max-pooling to convert the CNN output into a sequence of feature vectors, and 3) adding an auxiliary shortcut branch, trained with its own CTC loss, to aid training. These modifications are evaluated on the IAM and RIMES datasets, where the system achieves state-of-the-art results with a simple network architecture. The paper argues that, despite the simplicity of the proposed system, these practices are what drive its gains in performance and generalization. The code for the system is available on GitHub.
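A minimal NumPy sketch of the first two practices is shown below. It assumes grayscale line images stored as 2D float arrays and a CNN output shaped (channels, height, width), with pooling taken over the height axis so that each column becomes one time step. The function names `resize_keep_aspect_and_pad` and `column_max_pool`, the nearest-neighbour resampling, the centred placement and the default pad value are all illustrative assumptions, not the authors' code; the CTC shortcut branch is not covered here.

```python
import numpy as np


def resize_keep_aspect_and_pad(img, target_h, target_w, pad_value=1.0):
    """Hypothetical helper: scale a 2D image by ONE factor so it fits inside
    (target_h, target_w) without distortion, then pad the leftover area.

    Assumptions: nearest-neighbour resampling (keeps the sketch dependency-free),
    centred placement, and pad_value=1.0 meaning a white background.
    """
    h, w = img.shape
    scale = min(target_h / h, target_w / w)  # same factor for both axes
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))

    # Nearest-neighbour source indices for every output row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows[:, None], cols[None, :]]

    # Fill the canvas with the pad value, then paste the resized image in the middle.
    out = np.full((target_h, target_w), pad_value, dtype=img.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


def column_max_pool(feature_map):
    """Hypothetical helper: collapse a (C, H, W) CNN output into a sequence.

    Max over the height axis leaves one C-dimensional vector per column,
    returned as a (W, C) array: W time steps for the recurrent layers.
    """
    return feature_map.max(axis=1).T


if __name__ == "__main__":
    line = np.zeros((40, 200), dtype=np.float32)  # toy "all ink" line image
    padded = resize_keep_aspect_and_pad(line, 128, 1024)
    print(padded.shape)  # (128, 1024)

    fmap = np.random.rand(256, 8, 128)  # toy CNN output: C=256, H=8, W=128
    print(column_max_pool(fmap).shape)  # (128, 256)
```

Because the image is scaled by a single factor and then padded, character shapes are not stretched or squashed to fill a fixed canvas. Because pooling collapses only the height axis, the output sequence length equals the feature-map width, which is what the recurrent layers and a CTC head treat as time steps.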