10 Jul 2017 | Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang
The paper "EAST: An Efficient and Accurate Scene Text Detector" by Xinyu Zhou et al. introduces a novel scene text detection method that aims to achieve both high accuracy and high efficiency. The proposed EAST (Efficient and Accurate Scene Text) pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps such as candidate aggregation and word partitioning. This simplicity lets the authors concentrate their design effort on the loss functions and the neural network architecture.
The EAST pipeline consists of two stages: a Fully Convolutional Network (FCN) and a Non-Maximum Suppression (NMS) merging stage. The FCN directly outputs dense, per-pixel predictions of words or text lines; these are thresholded and then merged by NMS to yield the final detections. The algorithm was evaluated on standard benchmarks including ICDAR 2015, COCO-Text, and MSRA-TD500, where it outperformed the state-of-the-art methods of the time in both accuracy and speed.
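To make the hand-off between the two stages concrete, here is a minimal sketch of turning dense FCN output into candidates for NMS: it thresholds a per-pixel score map and decodes each surviving pixel's quadrangle geometry (eight vertex offsets) into a quadrilateral. The function name `collect_candidates`, the 0.8 threshold, the "vertex = pixel centre + offset" sign convention, and the output stride of 4 are all illustrative assumptions, not details stated in the text above.

```python
from typing import List, Sequence, Tuple

Quad = List[Tuple[float, float]]  # four (x, y) vertices


def collect_candidates(
    score_map: Sequence[Sequence[float]],
    quad_map: Sequence[Sequence[Sequence[float]]],
    score_thresh: float = 0.8,  # hypothetical default
    stride: int = 4,            # assumed input pixels per output pixel
) -> List[Tuple[Quad, float]]:
    """Turn dense per-pixel predictions into (quadrilateral, score) pairs.

    score_map[y][x] -- text confidence of output pixel (x, y)
    quad_map[y][x]  -- 8 offsets (dx1, dy1, ..., dx4, dy4) from that pixel
                       to the four vertices, in input-image pixels (assumed)
    """
    candidates = []
    # Row-major scan, so candidates from neighbouring pixels stay adjacent.
    for y, row in enumerate(score_map):
        for x, score in enumerate(row):
            if score < score_thresh:
                continue  # pixel is not confidently inside a text region
            # Centre of this output cell in input-image coordinates (assumed).
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            offs = quad_map[y][x]
            quad = [(cx + offs[2 * i], cy + offs[2 * i + 1]) for i in range(4)]
            candidates.append((quad, score))
    return candidates
```

The row-major order is a deliberate choice: a merge step that only compares consecutive candidates, such as the paper's locality-aware NMS, relies on neighbouring pixels appearing next to each other in the list.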
Key contributions of the work include:
1. A simple yet powerful pipeline that directly predicts text regions without intermediate steps.
2. Flexibility to produce word-level or line-level predictions, with geometry given either as rotated boxes (RBOX) or as general quadrangles (QUAD).
3. Significant gains in both accuracy and speed over existing methods.
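In the rotated-box (RBOX) variant, the paper encodes each pixel's geometry as four distances from that pixel to the top, right, bottom, and left edges of a rectangle, plus one rotation angle. A minimal sketch of decoding one such prediction into four corner points follows; the function name `rbox_to_quad` and the direction in which the angle rotates are assumptions made for illustration.

```python
import math
from typing import List, Tuple


def rbox_to_quad(
    px: float, py: float,
    d_top: float, d_right: float, d_bottom: float, d_left: float,
    angle: float,
) -> List[Tuple[float, float]]:
    """Decode one RBOX prediction into four corner points.

    (px, py) is the pixel location; the four distances run from it to the
    rectangle's edges, and `angle` (radians) rotates the rectangle about the
    pixel. The rotation direction used here is an assumption.
    """
    # Corners in the rectangle's own axis-aligned frame, relative to the
    # pixel (image convention: y grows downward), starting at top-left.
    local = [(-d_left, -d_top), (d_right, -d_top),
             (d_right, d_bottom), (-d_left, d_bottom)]
    c, s = math.cos(angle), math.sin(angle)
    # Rotate each corner about the pixel, then translate to image coordinates.
    return [(px + x * c - y * s, py + x * s + y * c) for x, y in local]
```

With `angle = 0.0` this reduces to an axis-aligned box, e.g. `rbox_to_quad(10, 10, 2, 3, 2, 1, 0.0)` gives the corners (9, 8), (13, 8), (13, 12), (9, 12). A QUAD prediction needs no such decoding, since it stores vertex offsets directly.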
The paper also details the methodology, including the network design, label generation, loss functions, training, and locality-aware NMS. Experimental results show that the algorithm achieves high F-scores on these benchmarks, outperforming previous methods by a large margin; on ICDAR 2015, for instance, it reaches an F-score of 0.7820 at 13.2 fps on 720p images. The algorithm also remains efficient at high input resolutions.
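Locality-aware NMS, as the paper describes it, walks the candidate geometries row by row, merges each one into the previous candidate when the two overlap enough (averaging vertices weighted by score and summing the scores), and finally runs standard NMS on the much shorter merged list. The sketch below follows that outline for convex quadrilaterals, computing polygon IoU with Sutherland-Hodgman clipping. The function names, the single IoU threshold shared by the merge and NMS steps, and its 0.5 default are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Quad = List[Point]
Det = Tuple[Quad, float]  # (quadrilateral, score)


def _area(poly: List[Point]) -> float:
    """Signed shoelace area (positive for counter-clockwise order, y up)."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def _cross(a: Point, b: Point, p: Point) -> float:
    """Which side of the directed line a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _clip(subject: List[Point], clipper: Quad) -> List[Point]:
    """Sutherland-Hodgman: the part of `subject` inside convex `clipper`."""
    sign = 1.0 if _area(clipper) > 0 else -1.0  # accept either winding order
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        if not inp:
            break  # nothing left to clip: no overlap
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]  # edge from previous to current vertex
            cp, cq = sign * _cross(a, b, p), sign * _cross(a, b, q)
            if (cp >= 0) != (cq >= 0):  # edge crosses the clip line
                t = cp / (cp - cq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if cq >= 0:  # current vertex is inside this clip edge
                output.append(q)
    return output


def poly_iou(p: Quad, q: Quad) -> float:
    """Intersection over union of two convex quadrilaterals."""
    inter = _clip(p, q)
    ia = abs(_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(_area(p)) + abs(_area(q)) - ia
    return ia / union if union > 0 else 0.0


def _weighted_merge(g: Det, p: Det) -> Det:
    """Score-weighted average of vertices; the merged score is the sum."""
    (gq, gs), (pq, ps) = g, p
    total = gs + ps
    quad = [((gs * gx + ps * px) / total, (gs * gy + ps * py) / total)
            for (gx, gy), (px, py) in zip(gq, pq)]
    return quad, total


def standard_nms(dets: List[Det], iou_thresh: float) -> List[Det]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it."""
    kept: List[Det] = []
    for quad, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(poly_iou(quad, k[0]) <= iou_thresh for k in kept):
            kept.append((quad, score))
    return kept


def locality_aware_nms(dets: List[Det], iou_thresh: float = 0.5) -> List[Det]:
    """Merge consecutive overlapping candidates, then run standard NMS.

    `dets` is assumed to be in row-major order; one threshold is shared by
    both steps here (an assumption).
    """
    merged: List[Det] = []
    current: Optional[Det] = None
    for g in dets:
        if current is not None and poly_iou(g[0], current[0]) > iou_thresh:
            current = _weighted_merge(g, current)
        else:
            if current is not None:
                merged.append(current)
            current = g
    if current is not None:
        merged.append(current)
    return standard_nms(merged, iou_thresh)
```

Because only consecutive candidates are compared during merging, that pass is linear in the number of candidates, and the quadratic standard NMS then runs on a far smaller set; this is the efficiency argument the paper makes for the locality-aware variant.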