2024-05-07 | Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin
**DocRes: A Generalist Model for Unifying Document Image Restoration Tasks**
Document image restoration is crucial for enhancing the quality of document images, which significantly impacts the overall performance of Document AI systems. Current methods typically address distinct restoration tasks independently, leading to complex systems and the inability to leverage the synergies of multi-task learning. To overcome this challenge, the authors propose DocRes, a generalist model that unifies five document image restoration tasks: dewarping, deshadowing, appearance enhancement, deblurring, and binarization.
To let a single model perform different restoration tasks, the authors introduce a visual prompt approach called Dynamic Task-Specific Prompt (DTSPrompt). The DTSPrompt for a given task comprises distinct prior features extracted from the input image. Besides signaling which task to perform, these features act as supplementary information that can improve the model's performance. DTSPrompt is also more flexible than previous visual prompt approaches, as it can be applied directly to inputs with high and variable resolutions.
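The idea of building a prompt from priors of the input image can be sketched as follows. This is a minimal illustration, not the authors' implementation. The two priors (a gradient-magnitude map and a block-maximum background estimate), the task-to-prior mapping `PRIORS`, and all function names are assumptions chosen for illustration. The sketch also assumes the prompt is attached to the image by concatenating it along the channel axis.

```python
import numpy as np

def gradient_prior(gray):
    """Gradient magnitude of a grayscale image, scaled to [0, 1] (hypothetical prior)."""
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def background_prior(gray, block=8):
    """Coarse background estimate: per-block maximum, upsampled by repetition (hypothetical prior)."""
    h, w = gray.shape
    # Pad so both dimensions are multiples of the block size.
    pad_h, pad_w = (-h) % block, (-w) % block
    padded = np.pad(gray.astype(np.float32), ((0, pad_h), (0, pad_w)), mode="edge")
    ph, pw = padded.shape
    blocks = padded.reshape(ph // block, block, pw // block, block)
    coarse = blocks.max(axis=(1, 3))
    full = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    return full[:h, :w]  # crop back to the original size

# Assumed mapping from task name to the priors that form its prompt.
PRIORS = {
    "deblurring": [gradient_prior],
    "deshadowing": [background_prior],
}

def build_prompted_input(rgb, task):
    """Concatenate task-specific prior maps to an (H, W, 3) float image in [0, 1]."""
    gray = rgb.mean(axis=2)
    priors = [fn(gray)[..., None] for fn in PRIORS[task]]
    return np.concatenate([rgb] + priors, axis=2)

if __name__ == "__main__":
    image = np.random.default_rng(0).random((37, 53, 3)).astype(np.float32)
    prompted = build_prompted_input(image, "deblurring")
    print(prompted.shape)
```

Because each prior is computed from the image itself, the prompt always has the same height and width as the input, so the same function works for any resolution without resizing a fixed prompt.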
Experimental results demonstrate that DocRes achieves competitive or superior performance compared to existing state-of-the-art task-specific models across various benchmarks. The source code for DocRes is publicly available at <https://github.com/ZZZHANG-JX/DocRes>.
The key contributions of this work include:
- The first exploration of generalist models for unifying document image restoration tasks.
- The introduction of DTSPrompt, a simple yet effective visual prompt approach that extracts prior features from the input image to create prompts.
- Competitive or superior performance of DocRes compared to task-specific methods across various benchmarks.
The paper also discusses related works, methodology, experiments, and concludes with discussions and future directions.