May 14th, 2024 | Marco Polignano, Pierpaolo Basile, Giovanni Semeraro
The paper introduces LLaMAntino-3-ANITA-8B-Inst-DPO-ITA, a state-of-the-art Large Language Model (LLM) tailored for the Italian language. Built on Meta's LLaMA-3 8B model, it was first fine-tuned with Supervised Fine-Tuning (SFT) on English and Italian instruction datasets, then aligned with a Direct Preference Optimization (DPO) step to match user preferences, avoid inappropriate responses, and reduce biases. Fine-tuning relies on QLoRA, which cuts computational requirements while preserving performance. The combination of SFT, QLoRA, and DPO yields a model that performs well on tasks such as text completion, zero-shot classification, and contextual understanding; it was evaluated on both Italian and English benchmarks with strong results and is available on Hugging Face and GitHub.

The paper also reviews related work, including parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA, and DPO for preference alignment. The model was adapted to Italian using a dataset of roughly 100k examples, reaching performance comparable to other Italian LLMs. Evaluation on benchmarks including Winogrande, TruthfulQA, and HellaSwag confirms the model's effectiveness. Suggested application scenarios include Retrieval-Augmented Generation, Topic Modeling, Sentiment Analysis, Recommender Systems, and Chit-Chat. The authors conclude that the model demonstrates robust performance on Italian language tasks and has potential for broader applications.
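To make the SFT + QLoRA recipe concrete, below is a minimal sketch of QLoRA-style supervised fine-tuning of a LLaMA-3 8B base using the Hugging Face peft and trl libraries. This is not the authors' training script: the dataset file, hyperparameters, and adapter settings are illustrative assumptions, and the exact SFTTrainer argument names vary across trl versions.

```python
# Hedged sketch: QLoRA supervised fine-tuning of a LLaMA-3 base model.
# Dataset path, hyperparameters, and LoRA settings are assumptions, not the
# paper's actual configuration.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"

# QLoRA: load the frozen base weights in 4-bit NF4 to shrink memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters on the attention projections are the only trainable weights.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hypothetical instruction dataset with a "text" column already formatted
# with the Llama-3 chat template (mix of Italian and English examples).
dataset = load_dataset("json", data_files="sft_it_en.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="anita-sft-qlora",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```

The subsequent DPO alignment stage described in the paper follows the same pattern but swaps in a preference dataset of chosen/rejected response pairs and a preference-optimization trainer (trl provides a DPOTrainer for this) on top of the SFT checkpoint.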