This paper proposes MISS, a pre-training and fine-tuning framework for medical visual question answering (Med-VQA). Unlike existing methods that treat Med-VQA as an answer-classification task, MISS treats it as a generative task. The framework unifies the text encoder and the multimodal encoder, aligning image and text features through multi-task learning. In addition, a Transfer-and-Caption (TransCap) method is introduced that uses large language models (LLMs) to extend the feature space of single-modal image datasets, enabling data from traditional medical vision tasks to be applied to vision-language pre-training (VLP). The framework achieves excellent results with fewer multimodal datasets and demonstrates the advantages of generative VQA models. The method is evaluated on two Med-VQA benchmarks, VQA-RAD and Slake, showing superior performance on open-ended questions. The results indicate that the JTM encoder and the TransCap method significantly improve the performance of Med-VQA models. The code is available at https://github.com/TIMMY-CHAN/MISS.git.

Keywords: Medical visual question answering · Vision-language pre-training · Multi-modal learning.