Phi-3 Mini is a highly capable language model trained on 3.3 trillion tokens that achieves performance comparable to much larger models such as Mixtral 8x7B and GPT-3.5, while remaining small enough to deploy on a modern smartphone. Its key innovation lies in the training dataset, which combines heavily filtered web data with synthetic data. The model is further aligned for robustness, safety, and chat format. The report also introduces Phi-3-Small and Phi-3-Medium, 7B- and 14B-parameter models that are significantly more capable than Phi-3 Mini, as well as Phi-3-Vision, a 4.2B-parameter model that can reason over both images and text prompts. Post-training consists of supervised fine-tuning and direct preference optimization to improve safety and robustness. Performance is evaluated on a range of academic benchmarks, and safety alignment is assessed through red-teaming and automated testing. Despite these strengths, Phi-3 Mini's capacity to store factual knowledge is limited and it can produce ungrounded outputs, weaknesses that can be mitigated through additional training and data augmentation.
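As a rough illustration of what on-device deployment implies in practice, the sketch below loads a 4-bit quantized Phi-3 Mini checkpoint with Hugging Face transformers and bitsandbytes. The checkpoint name, quantization settings, and prompt are assumptions for illustration, not details taken from the report.

```python
# Illustrative sketch (not from the report): loading Phi-3 Mini with 4-bit
# weight quantization, the kind of compression that makes phone-class
# deployment plausible. Checkpoint name and settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed checkpoint name

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,  # run matmuls in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

prompt = "Explain why small language models can run on phones."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```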
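The report names the post-training stages but not their exact recipe; as a minimal sketch of what the direct preference optimization step optimizes, the function below implements the standard DPO objective (Rafailov et al., 2023) in PyTorch. The tensor names and the beta value are illustrative assumptions, not values from the report.

```python
# Minimal sketch of the standard DPO objective; the report does not publish
# its post-training hyperparameters, so names and beta here are assumptions.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is a per-example log-probability (summed over tokens)
    of the preferred ("chosen") or dispreferred ("rejected") completion
    under the trained policy or a frozen reference model."""
    # Implicit reward = beta * log-ratio of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the preferred completion's reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```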