Seeing Text in the Dark: Algorithm and Benchmark

24 Apr 2024 | Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, and Wenjie Zhang
This paper proposes a single-stage approach for localizing text in low-light environments, which avoids the need for low-light image enhancement (LLE) and instead introduces a spatial-constrained learning module during the training stage of the text detector. This module guides the text detector in preserving textual spatial features during feature-map resizing, minimizing the loss of spatial information in texts under low-light conditions. The method incorporates spatial reconstruction and spatial semantic constraints to ensure the text detector acquires essential positional and contextual-range knowledge. It also enhances the original text detector's ability to identify the local topological features of text using a dynamic snake feature pyramid network, and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. Additionally, the paper presents a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. The method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.

Detecting arbitrary-shaped text in low-light conditions remains a significant challenge due to visual degradations such as blurred details, reduced brightness and contrast, and distorted color representation, and the proposed method is designed to address these challenges directly. The paper also discusses related work, including arbitrary-shaped text detection and low-light image enhancement, and presents an extensive dataset for low-light arbitrary-shaped text detection, LATeD, which features 13,923 multilingual and arbitrary-shaped text instances across diverse low-light scenes. The method is evaluated on several benchmark datasets, including CTW1500, Total-Text, and MSRA-TD500, and achieves state-of-the-art results on all of them.
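The summary does not give the exact formulation of the spatial reconstruction constraint, but the idea it describes (penalizing spatial detail lost when feature maps are resized) can be illustrated with a minimal NumPy toy sketch. Everything here is hypothetical: the helper names, the use of average pooling as a stand-in for the detector's feature-map resizing, and the nearest-neighbour reconstruction are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def downsample(f, k=2):
    # average-pool by factor k (a stand-in for the detector's feature-map resizing)
    h, w = f.shape
    return f[:h // k * k, :w // k * k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(f, k=2):
    # nearest-neighbour reconstruction back to the original resolution
    return f.repeat(k, axis=0).repeat(k, axis=1)

def spatial_reconstruction_loss(feat):
    # penalize spatial detail discarded by resizing: the more fine text
    # structure the downsampled map loses, the larger the loss
    recon = upsample(downsample(feat))
    return float(np.mean((feat - recon) ** 2))

# a map containing a thin "text stroke" loses more detail than a flat one
flat = np.ones((8, 8))
stroke = np.zeros((8, 8)); stroke[3, :] = 1.0
print(spatial_reconstruction_loss(flat))    # → 0.0
print(spatial_reconstruction_loss(stroke))  # → 0.0625
```

The point of the sketch is only that thin text strokes, which dominate low-light text features, are exactly the structures that resizing erases, so an auxiliary constraint of this kind gives the detector a training signal to preserve them.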
The paper also presents an ablation study on the proposed modules, including the spatial-constrained learning module (SCM), the dynamic snake feature pyramid network (DSF), and text shaping with rotated rectangular accumulation (TSR), and shows that these modules significantly improve the performance of the text detector in both low-light and normal-light conditions. The results demonstrate the effectiveness of the proposed method in accurately detecting text in low-light environments and highlight the importance of spatial constraints in preserving textual spatial information during feature-map resizing.
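The summary names text shaping with rotated rectangular accumulation (TSR) but not its formulation. A rough, hypothetical sketch of the general idea (accumulating small rotated rectangles along a text instance so their union approximates the full, possibly curved, contour) follows; the rectangle parameterization and helper names are illustrative assumptions, not the paper's method.

```python
import numpy as np

def rect_mask(h, w, cx, cy, rw, rh, theta):
    # boolean mask of one rotated rectangle: centre (cx, cy),
    # size rw x rh, rotated by angle theta (radians)
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (np.abs(u) <= rw / 2) & (np.abs(v) <= rh / 2)

def accumulate(rects, h=32, w=64):
    # union of small rotated rectangles placed along the text centreline;
    # the accumulated region approximates the whole text contour bottom-up
    region = np.zeros((h, w), dtype=bool)
    for r in rects:
        region |= rect_mask(h, w, *r)
    return region

# rectangles sliding along a slanted centreline approximate a slanted text instance
rects = [(10 + 4 * i, 10 + i, 10, 8, np.deg2rad(15)) for i in range(8)]
mask = accumulate(rects)
```

The accumulated mask can then be traced to recover a polygonal contour, which is one plausible reading of how a bottom-up rectangular accumulation could delineate arbitrary-shaped text.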