11 Apr 2024 | Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, Andrea Zaninello
The paper introduces Medical mT5, an open-source multilingual text-to-text large language model (LLM) designed for the medical domain. The authors address the lack of multilingual evaluation benchmarks and corpora in medical NLP by compiling the largest multilingual medical corpus for English, French, Italian, and Spanish, totaling 3 billion tokens. This corpus is used to train Medical mT5, an encoder-decoder model based on mT5. The model is evaluated on two new multilingual benchmarks: Argument Mining and Abstractive Question Answering. Comprehensive experiments show that Medical mT5 outperforms similar-sized text-to-text models in Spanish, French, and Italian, while being competitive with state-of-the-art LLMs in English. The paper also discusses the challenges and ethical considerations in developing such models, emphasizing the importance of transparency, fairness, and open-source collaboration.