7 Mar 2024 | Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, André F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, Michael Desa
This paper introduces SaulLM-7B, a large language model (LLM) designed specifically for legal text comprehension and generation. With 7 billion parameters, SaulLM-7B is presented as the first LLM tailored to the legal domain. It is trained on an English legal corpus of more than 30 billion tokens. The authors report state-of-the-art proficiency in understanding and processing legal documents. They also present a novel instructional fine-tuning method that further improves SaulLM-7B's performance on legal tasks. SaulLM-7B is released under the MIT License. The aim is to empower legal professionals and to catalyze innovation at the intersection of AI and the legal community. The paper makes three contributions: a family of legal LLMs, an improved evaluation protocol for legal LLMs, and a public release of the model and evaluation code under a permissive license. Evaluated on legal benchmarks and datasets, SaulLM-7B shows significant improvements over existing models.