TextBoxes: A Fast Text Detector with a Single Deep Neural Network

2017 | Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
TextBoxes is an end-to-end trainable, fast scene-text detector that achieves high accuracy and efficiency in a single network forward pass, with no post-processing other than standard non-maximum suppression (NMS). It outperforms competing methods in text-localization accuracy and is significantly faster, taking only 0.09 seconds per image in its fast implementation. When combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition.

Architecturally, TextBoxes is a fully convolutional network that directly outputs word bounding-box coordinates at multiple network layers by jointly predicting text presence and coordinate offsets relative to default boxes. The final output aggregates the boxes from all layers and applies standard NMS. To handle the large variation in the aspect ratios of words, TextBoxes uses novel, inception-style output layers that combine irregular (elongated) convolutional kernels with matching default boxes. The detector delivers both high accuracy and high efficiency with a single forward pass on single-scale input, and even higher accuracy with multiple passes on multi-scale inputs.

TextBoxes is inspired by SSD, a recent development in object detection. SSD detects general objects well but fails on words with extreme aspect ratios; TextBoxes addresses this with its text-box layers, which significantly improve performance. For word spotting and end-to-end recognition, TextBoxes is paired with the CRNN text recognizer, which directly outputs character sequences from input images and is itself end-to-end trainable. The confidence scores of CRNN are used to regularize the detection outputs of TextBoxes, further boosting word-spotting accuracy.

TextBoxes is evaluated on the ICDAR 2011 and ICDAR 2013 datasets, achieving high performance in text localization.
It outperforms competing methods in F-measure and ranks first in testing speed. When combined with the recognition model, TextBoxes achieves state-of-the-art performance on end-to-end recognition benchmarks. TextBoxes performs well in most situations but still fails on some difficult cases, such as overexposure and large character spacing.

The paper concludes that TextBoxes is a stable, efficient, end-to-end fully convolutional network for text detection, with comprehensive evaluations and comparisons on benchmark datasets validating its advantages in text detection, word spotting, and end-to-end recognition. Future work includes extending TextBoxes to multi-oriented text and unifying the detection and recognition networks into a single framework.
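As a concrete illustration of the detection pipeline described above, the sketch below decodes predicted coordinate offsets against a default box and then applies standard greedy non-maximum suppression. This is a minimal sketch in plain Python: the center/size offset parameterization follows the common SSD convention, and the IoU threshold is an illustrative assumption, not a value taken from the paper.

```python
import math

def decode(default_box, offsets):
    """Apply predicted (dx, dy, dw, dh) offsets to a default box.

    default_box: (cx, cy, w, h) in normalized image coordinates.
    Returns the decoded box as (xmin, ymin, xmax, ymax).
    SSD-style convention (an assumption here): centers shift by a
    fraction of the box size; width/height scale exponentially.
    """
    cx0, cy0, w0, h0 = default_box
    dx, dy, dw, dh = offsets
    cx = cx0 + dx * w0
    cy = cy0 + dy * h0
    w = w0 * math.exp(dw)
    h = h0 * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Standard greedy NMS: keep the highest-scoring box, discard any
    remaining box overlapping it by more than iou_thresh, repeat.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

In the full detector, boxes decoded from all output layers are pooled together and `nms` is run once over the pooled set, which matches the "aggregate, then suppress" description above.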