[slides] Reading Text in the Wild with Convolutional Neural Networks

This paper presents an end-to-end system for text spotting and text-based image retrieval that uses a region proposal mechanism for detection and deep convolutional neural networks (CNNs) for recognition. The detection stage combines two complementary proposal generators, Edge Boxes and a sliding window detector, to produce word bounding box proposals with high recall. A filtering stage then improves precision, and a CNN refines the surviving boxes by bounding box regression. The recognition stage departs from traditional character-classifier systems: very large CNNs, trained solely on synthetic data, perform word recognition across a 90k-word dictionary. The final stage merges and ranks detections based on proximity and recognition results. The system demonstrates state-of-the-art performance on various benchmarks and is applied to real-world tasks such as instant search of news footage by text query. The paper also reviews related work in text detection and recognition and describes each pipeline component in detail: proposal generation, filtering and refinement, text recognition, and merging and ranking.
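
To make the final merging and ranking stage more concrete, here is a minimal Python sketch. It is not the authors' implementation. The summary says only that detections are merged and ranked by proximity and recognition results, so this sketch assumes greedy non-maximum suppression over intersection-over-union (IoU) as a stand-in for "proximity" and a single recognition score per detection for ranking. The names `Detection`, `iou`, and `merge_and_rank`, and the 0.5 threshold, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one recognized word proposal."""
    box: Box
    word: str      # dictionary word predicted by the recognizer
    score: float   # recognition confidence (higher is better)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_and_rank(detections: List[Detection],
                   iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS: visit detections from highest to lowest score and keep
    one only if it does not overlap an already kept box too strongly.
    The returned list is ranked by descending score."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if all(iou(det.box, k.box) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    candidates = [
        Detection((0, 0, 10, 10), "cafe", 0.9),
        Detection((1, 1, 11, 11), "cafe", 0.6),    # overlaps the first box
        Detection((50, 50, 60, 60), "open", 0.7),
    ]
    for det in merge_and_rank(candidates):
        print(det.word, det.score, det.box)
```

In this toy input, the two overlapping "cafe" boxes collapse to the higher-scoring one, and the distant "open" box survives. A fuller system might instead aggregate the scores of overlapping detections that share a word label; that is a design choice this sketch leaves out.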


4 Dec 2014 | Max Jaderberg · Karen Simonyan · Andrea Vedaldi · Andrew Zisserman