Reading Text in the Wild with Convolutional Neural Networks

4 Dec 2014 | Max Jaderberg · Karen Simonyan · Andrea Vedaldi · Andrew Zisserman
This paper presents an end-to-end system for text spotting and text-based image retrieval. The system uses a region proposal mechanism for detection and deep convolutional neural networks (CNNs) for recognition. The pipeline has four stages: word bounding box proposal generation, proposal filtering and adjustment, text recognition, and final merging. The detection stage combines two complementary, weak but fast proposal methods, an object-agnostic region proposal method and a sliding window detector, to ensure high recall; a fast filtering stage then improves precision. For recognition, a large CNN takes the whole word image from each proposal region as input and is trained on synthetic data, so no human-labeled data is required.

The key contributions are the novel whole-word CNN recognition method, the novel proposal-based word detection strategy, and the application of the system to large-scale visual search of text in video, which enables high-precision retrieval of images and videos containing user-given text queries. The system achieves state-of-the-art performance on standard benchmarks, with significant improvements over previous methods, and is demonstrated in a real-world setting: searching through thousands of hours of news footage via text queries.
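The four pipeline stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `spot_text`, `proposers`, `keep_proposal`, and `recognize` are hypothetical placeholders for the proposal generators, the filtering stage, and the whole-word CNN, and the box adjustment step is omitted. The summary does not say how the final merging is done, so greedy non-maximum suppression over same-word detections is used here purely as an assumed stand-in.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    word: str
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(dets: Sequence[Detection],
                     iou_threshold: float = 0.5) -> List[Detection]:
    """Assumed merging rule: greedy non-maximum suppression.

    Detections are visited from highest to lowest score; one is dropped
    if an already kept detection has the same word and overlaps it by
    more than iou_threshold.
    """
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d.score, reverse=True):
        duplicate = any(
            det.word == k.word and iou(det.box, k.box) > iou_threshold
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept


def spot_text(image: Any,
              proposers: Sequence[Callable[[Any], List[Box]]],
              keep_proposal: Callable[[Any, Box], bool],
              recognize: Callable[[Any, Box], Tuple[str, float]],
              iou_threshold: float = 0.5) -> List[Detection]:
    # Stage 1: pool proposals from complementary generators (high recall).
    proposals = [box for propose in proposers for box in propose(image)]
    # Stage 2: fast filtering to discard false positives (higher precision).
    filtered = [box for box in proposals if keep_proposal(image, box)]
    # Stage 3: whole-word recognition on each surviving region.
    detections = [Detection(box, *recognize(image, box)) for box in filtered]
    # Stage 4: merge overlapping detections of the same word.
    return merge_detections(detections, iou_threshold)
```

Passing the stages in as callables keeps the sketch independent of any particular detector or network: each stage can be replaced by a stub for testing or by a real model without touching the control flow.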