EAST: An Efficient and Accurate Scene Text Detector

10 Jul 2017 | Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang
This paper proposes a simple yet powerful scene text detection pipeline that achieves fast and accurate text detection in natural scenes. With a single neural network, the pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps such as candidate aggregation and word partitioning. This simplicity lets the design effort concentrate on the loss functions and the neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text, and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, it achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.

The proposed algorithm consists of two stages: a Fully Convolutional Network (FCN) and an NMS merging stage. The FCN directly produces text regions, excluding redundant and time-consuming intermediate steps. The produced text predictions, which can be either rotated rectangles or quadrangles, are sent to Non-Maximum Suppression (NMS) to yield final results. The algorithm achieves an F-score of 0.7820 on ICDAR 2015 (0.8072 when tested in multi-scale), 0.7608 on MSRA-TD500, and 0.3945 on COCO-Text, outperforming previous state-of-the-art algorithms in accuracy while taking much less time.

The key component of the proposed algorithm is a neural network model trained to directly predict the existence of text instances and their geometries from full images. The model is a fully convolutional neural network adapted for text detection that outputs dense per-pixel predictions of words or text lines. This eliminates intermediate steps such as candidate proposal, text region formation, and word partition. The post-processing steps include only thresholding and NMS on the predicted geometric shapes.
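
The two post-processing steps named above, thresholding and NMS, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes axis-aligned boxes for brevity, whereas the predictions described here are rotated rectangles or quadrangles and would need a polygon intersection in place of the simple overlap below. The function names `iou` and `threshold_and_nms` and the threshold values `score_thresh` and `iou_thresh` are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumption: axis-aligned boxes (x1, y1, x2, y2); the detector described in
# the text predicts rotated rectangles or quadrangles instead.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def threshold_and_nms(
    boxes: List[Box],
    scores: List[float],
    score_thresh: float = 0.8,  # hypothetical value, not from the paper
    iou_thresh: float = 0.2,    # hypothetical value, not from the paper
) -> List[Tuple[float, Box]]:
    """Drop low-score predictions, then greedily suppress overlapping ones."""
    # Step 1: thresholding on the per-prediction text score.
    candidates = [(s, b) for s, b in zip(scores, boxes) if s >= score_thresh]
    # Step 2: greedy NMS, visiting predictions from highest to lowest score.
    candidates.sort(key=lambda sb: sb[0], reverse=True)
    kept: List[Tuple[float, Box]] = []
    for score, box in candidates:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, kept_box) <= iou_thresh for _, kept_box in kept):
            kept.append((score, box))
    return kept


if __name__ == "__main__":
    boxes = [(10, 10, 110, 40), (12, 11, 112, 41),
             (200, 50, 300, 80), (205, 52, 305, 82)]
    scores = [0.95, 0.90, 0.85, 0.40]
    for score, box in threshold_and_nms(boxes, scores):
        print(score, box)
```

In this toy input, the second box overlaps the first almost entirely and is suppressed, and the fourth box falls below the score threshold, so two detections remain.
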
The detector is named EAST, since it is an Efficient and Accurate Scene Text detection pipeline. It is designed to be simple and efficient: a single, lightweight neural network that, according to the reported experiments, surpasses previous methods in both accuracy and speed on standard benchmarks. The algorithm is also flexible: it can produce either word-level or line-level predictions, whose geometric shapes can be rotated boxes or quadrangles, depending on the application.