The Manga Whisperer: Automatically Generating Transcriptions for Comics

1 Aug 2024 | Ragav Sachdeva, Andrew Zisserman
Visual Geometry Group, Dept. of Engineering Science, University of Oxford

This paper presents Magi, a model that automatically generates transcriptions for comics (manga). Given a manga page, Magi detects panels, text blocks and character boxes, clusters the character boxes by identity, associates each text block with its speaker, and produces a dialogue transcript in the correct reading order. The model is trained on a large-scale dataset of manga pages.

The main contributions of this work are: (1) a unified model that detects panels, text blocks and character boxes, clusters characters by identity, and associates dialogues with their speakers; (2) a novel approach for sorting the detected text boxes into their reading order and generating a dialogue transcript; (3) an evaluation benchmark for this task, PopManga, built from publicly available pages of 80+ popular manga by a variety of artists.

Generating a transcription for comics is challenging because of the complex page layouts and the need to attribute each piece of dialogue to the correct speaker. The model addresses these challenges with a graph-generation formulation, in which the detections are the "nodes" and their associations are the "edges".
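Under this formulation, the associations of interest are character-character edges (two boxes depict the same character) and text-character edges (who speaks a given text block). The sketch below is only a minimal illustration of how such pairwise edge scores could be turned into identity clusters and speaker assignments; the score matrices, threshold and function names are hypothetical, and the clustering shown is simple connected components rather than the authors' procedure.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumes pairwise association scores have already been predicted
# for every pair of detections on a page.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def cluster_characters(char_char_scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Group character boxes into identities.

    char_char_scores: (C, C) symmetric matrix of "same character" edge scores.
    Returns one cluster label per character box (connected components over
    thresholded edges -- a stand-in for the paper's clustering step).
    """
    adjacency = csr_matrix(char_char_scores >= threshold)
    _, labels = connected_components(adjacency, directed=False)
    return labels


def assign_speakers(text_char_scores: np.ndarray) -> np.ndarray:
    """Associate each text box with its most likely speaker.

    text_char_scores: (T, C) matrix of text-to-character edge scores.
    Returns, for each text box, the index of the chosen character box.
    """
    return text_char_scores.argmax(axis=1)


# Toy page with 3 character boxes and 2 text boxes.
char_char = np.array([[1.0, 0.9, 0.1],
                      [0.9, 1.0, 0.2],
                      [0.1, 0.2, 1.0]])
text_char = np.array([[0.8, 0.7, 0.1],
                      [0.2, 0.1, 0.9]])
print(cluster_characters(char_char))  # -> [0 0 1]: boxes 0 and 1 are the same character
print(assign_speakers(text_char))     # -> [0 2]: text 0 spoken by box 0, text 1 by box 2
```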
The model is implemented with a CNN backbone and a transformer encoder-decoder, and is trained on a large-scale dataset of manga pages. At inference time it sorts the detected text blocks into the manga reading order and assembles them into a dialogue transcript. On the PopManga benchmark, the model outperforms existing methods in both accuracy and efficiency, and comparisons with prior approaches to character re-identification and speaker association show that it is superior on those sub-tasks as well.
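The summary states only that the model uses a CNN backbone with a transformer encoder-decoder. The snippet below is a minimal DETR-style sketch of that general design, not the published Magi architecture: the ResNet-50 backbone, query count, head layout and class set are assumptions made purely for illustration, and positional encodings and the matching loss are omitted.

```python
# Minimal DETR-style detector sketch (PyTorch); illustrative only,
# not the architecture described in the paper.
import torch
import torch.nn as nn
import torchvision


class TinyDetector(nn.Module):
    def __init__(self, num_queries=100, num_classes=3, dim=256):  # classes: panel, text, character
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # CNN feature map
        self.proj = nn.Conv2d(2048, dim, kernel_size=1)
        self.transformer = nn.Transformer(d_model=dim, batch_first=True)
        self.queries = nn.Embedding(num_queries, dim)                   # learned object queries
        self.class_head = nn.Linear(dim, num_classes + 1)               # +1 for "no object"
        self.box_head = nn.Linear(dim, 4)                               # normalised (cx, cy, w, h)

    def forward(self, images):                                # images: (B, 3, H, W)
        feats = self.proj(self.backbone(images))              # (B, dim, h, w)
        b = feats.shape[0]
        src = feats.flatten(2).transpose(1, 2)                # (B, h*w, dim) encoder tokens
        tgt = self.queries.weight.unsqueeze(0).expand(b, -1, -1)
        hs = self.transformer(src, tgt)                       # (B, num_queries, dim)
        return self.class_head(hs), self.box_head(hs).sigmoid()


model = TinyDetector()
logits, boxes = model(torch.randn(1, 3, 256, 256))
print(logits.shape, boxes.shape)  # (1, 100, 4) class logits and (1, 100, 4) boxes
```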
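Finally, the detected text blocks must be ordered and assembled into a transcript. The paper proposes its own ordering method; the sketch below is only a naive baseline heuristic, assuming axis-aligned boxes in pixel coordinates, OCR'd strings and pre-computed speaker identities, that orders panels and texts top-to-bottom and right-to-left (the usual manga reading direction) and emits one transcript line per text block.

```python
# Naive ordering heuristic for illustration only; the paper's ordering
# algorithm is more involved. Boxes are (x1, y1, x2, y2) in pixel
# coordinates with the origin at the top-left of the page.

def centre(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(panel, point):
    x1, y1, x2, y2 = panel
    px, py = point
    return x1 <= px <= x2 and y1 <= py <= y2


def reading_order(boxes):
    # Manga is read top-to-bottom and right-to-left: sort by the top edge,
    # breaking ties so that the rightmost box comes first.
    return sorted(boxes, key=lambda b: (b[1], -b[0]))


def transcript(panels, text_items, speaker_names):
    """text_items: list of (text_box, ocr_string, speaker_cluster_id).
    speaker_names: hypothetical mapping from a cluster id to a display name.
    Returns one transcript line per text block, in reading order."""
    lines = []
    for panel in reading_order(panels):
        in_panel = [t for t in text_items if contains(panel, centre(t[0]))]
        for box, text, speaker in sorted(in_panel, key=lambda t: (t[0][1], -t[0][0])):
            lines.append(f"{speaker_names.get(speaker, 'Unknown')}: {text}")
    return lines


# Toy page: the right-hand panel is read before the left-hand one.
panels = [(0, 0, 480, 400), (500, 0, 1000, 400)]
texts = [((100, 60, 300, 160), "Where?", 1),
         ((850, 40, 980, 140), "We have to go.", 0)]
print("\n".join(transcript(panels, texts, {0: "Character 1", 1: "Character 2"})))
# Character 1: We have to go.
# Character 2: Where?
```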