Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

9 Dec 2014 | Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
This paper presents a framework for recognizing natural scene text using synthetic data and deep neural networks. The framework requires no human-labeled data and recognizes each word from the whole image at once, departing from character-based recognition systems. The deep neural network models are trained solely on synthetic text data, which is realistic enough to replace real data and provides an effectively unlimited supply of training examples. The paper considers three models, each taking a different approach to recognizing words: dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. The models achieve state-of-the-art performance on standard datasets, improving upon previous methods without incurring any data-acquisition cost. The synthetic data generation process renders text with various fonts, distortions, and blending techniques to create realistic images. The paper also introduces a new synthetic word dataset, significantly larger than any previously released.
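
To make the three encodings named above more concrete, the sketch below shows one plausible way to turn a word label into a training target for each of them. It is an illustration only, not the authors' implementation: the alphabet, the maximum word length `MAX_LEN`, the N-gram limit `max_n`, the toy `lexicon` and `vocab`, and all function names are assumptions introduced here.

```python
# Hypothetical target encodings for a word label; every constant is an assumption.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed character set
MAX_LEN = 23                                       # assumed maximum word length
NULL = len(ALPHABET)                               # extra class: "no character here"


def dictionary_target(word, lexicon):
    """Dictionary encoding: the whole word is one class among the lexicon entries."""
    return lexicon.index(word)


def char_sequence_target(word):
    """Character sequence encoding: one class per position, padded with NULL."""
    if len(word) > MAX_LEN:
        raise ValueError("word longer than MAX_LEN")
    ids = [ALPHABET.index(c) for c in word.lower()]
    return ids + [NULL] * (MAX_LEN - len(ids))


def ngrams(word, max_n=4):
    """Set of all substrings of the word with length 1..max_n."""
    return {word[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(word) - n + 1)}


def bag_of_ngrams_target(word, vocab, max_n=4):
    """Bag-of-N-grams encoding: binary vector marking which vocab N-grams occur."""
    present = ngrams(word.lower(), max_n)
    return [1 if g in present else 0 for g in vocab]


lexicon = ["door", "exit", "spires"]               # toy lexicon
vocab = ["s", "p", "sp", "ir", "res", "x", "zz"]   # toy N-gram vocabulary

print(dictionary_target("spires", lexicon))        # 2
print(char_sequence_target("spires")[:8])          # [18, 15, 8, 17, 4, 18, 36, 36]
print(bag_of_ngrams_target("spires", vocab))       # [1, 1, 1, 1, 1, 0, 0]
```

The dictionary target restricts predictions to a fixed lexicon, whereas the character sequence and bag-of-N-grams targets can describe words outside it. That difference is one reason a comparison of the three is informative.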