15 Mar 2018 | Jianqi Ma, Weiyuan Shao, Hao Ye, Li Wang, Hong Wang, Yingbin Zheng, Xiangyang Xue
This paper introduces a novel rotation-based framework for detecting text in natural scene images, addressing the challenge of detecting text with arbitrary orientations. The framework, named Rotation Region Proposal Networks (RRPN), generates inclined proposals with text orientation angle information, which are then used for bounding box regression to improve the accuracy of the proposals. The Rotation Region-of-Interest (RRoI) pooling layer projects these proposals onto a feature map for a text region classifier. The entire system is built on a region-proposal-based architecture, ensuring computational efficiency compared to previous text detection systems. Experiments on three real-world datasets (MSRA-TD500, ICDAR2013, and ICDAR2015) demonstrate the effectiveness and efficiency of the proposed approach, outperforming previous methods in terms of both accuracy and speed.
Key contributions include the ability to predict text orientation using region proposals, the introduction of the RRoI pooling layer, and the refinement of region proposals with arbitrary orientations.
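To make the idea of an inclined proposal concrete, the following minimal sketch turns a rotated box into its four corner points. The summary above does not give the paper's exact parameterization, so the (centre, width, height, angle in degrees) layout and the function name `rotated_box_corners` are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h box centred at (cx, cy).

    The box is rotated by angle_deg about its centre. This layout is an
    illustrative assumption, not the parameterization used in the paper.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box, relative to its centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset with the standard 2-D rotation matrix, then
    # translate it back to the box centre.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


if __name__ == "__main__":
    # A hypothetical 40x10 text-line proposal tilted by 30 degrees.
    for corner in rotated_box_corners(100.0, 50.0, 40.0, 10.0, 30.0):
        print(corner)
```

With an angle of zero the function reduces to an ordinary axis-aligned box, which is the case a conventional region proposal network would handle; the angle term is what lets a proposal follow tilted text.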