The paper "The Manga Whisperer: Automatically Generating Transcriptions for Comics" by Ragav Sachdeva and Andrew Zisserman addresses the challenge of making manga accessible to people with visual impairments by developing a model called Magi. Magi detects panels, text blocks, and character boxes, clusters characters by identity, and associates each dialogue with its speaker. The model uses a unified architecture that processes high-resolution manga pages through a CNN backbone and a transformer encoder-decoder, producing predictions for object detection, character matching, and speaker association. The paper also introduces a new evaluation benchmark, PopManga, which includes 57,000+ manga pages from 80+ series, and reports better performance than existing methods on character detection, clustering, and speaker association. The authors describe the model architecture, training process, and evaluation metrics in detail, highlighting the effectiveness of their approach in generating accurate transcriptions of manga content.
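To make the three kinds of prediction concrete, the following is a minimal sketch of how character-matching scores and speaker-association scores could be combined into a transcript. It is not the authors' implementation: the `TextBlock` type, the function names, the 0.5 threshold, and the toy score matrices are all hypothetical, OCR and reading order are assumed to be available already, and clustering is done here with a simple thresholded connected-components pass (union-find), which may differ from the procedure used in the paper.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str           # dialogue text (hypothetical field; OCR is assumed done elsewhere)
    reading_order: int  # position of the block in reading order (assumed given)


def cluster_characters(affinity, threshold=0.5):
    """Group character boxes whose pairwise affinity exceeds the threshold.

    Assumption: connected components over the thresholded matrix, via union-find.
    Returns one integer cluster label per character box.
    """
    n = len(affinity)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def build_transcript(texts, speaker_scores, labels):
    """Assign each text block to its highest-scoring character box, in reading order."""
    lines = []
    for t_idx, block in sorted(enumerate(texts), key=lambda p: p[1].reading_order):
        scores = speaker_scores[t_idx]
        if scores:
            best = max(range(len(scores)), key=scores.__getitem__)
            speaker = f"Character {labels[best] + 1}"
        else:
            speaker = "Unknown"  # no character boxes detected on the page
        lines.append(f"{speaker}: {block.text}")
    return lines


# Toy inputs: three character boxes, two text blocks (all values made up).
affinity = [[1.0, 0.9, 0.1],
            [0.9, 1.0, 0.2],
            [0.1, 0.2, 1.0]]
speaker_scores = [[0.2, 0.7, 0.1],   # text block 0 vs. each character box
                  [0.1, 0.1, 0.8]]   # text block 1 vs. each character box
texts = [TextBlock("Where are we?", 0), TextBlock("No idea.", 1)]

labels = cluster_characters(affinity)
transcript = build_transcript(texts, speaker_scores, labels)
print("\n".join(transcript))
```

In this toy case the first two character boxes fall into one cluster, so a line attributed to either box is reported under the same speaker name. This is the role identity clustering plays in keeping speaker labels consistent across a page.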