GPT-4 Technical Report

4 Mar 2024 | OpenAI
The technical report introduces GPT-4, a large-scale, multimodal model that accepts image and text inputs and produces text outputs. GPT-4 demonstrates human-level performance on a range of professional and academic benchmarks, including passing a simulated bar exam with a score in the top 10% of test takers. The model is a Transformer-based architecture pre-trained to predict the next token in a document, with post-training alignment improving factuality and adherence to desired behavior.

A key focus of the project was developing infrastructure and optimization methods that behave predictably across a wide range of scales, allowing GPT-4's performance to be accurately predicted from smaller models trained with far less compute. GPT-4 outperforms existing large language models on traditional NLP benchmarks and shows strong performance across multiple languages. Despite its capabilities, GPT-4 has limitations, including hallucinations, a limited context window, and an inability to learn from experience. The report discusses safety challenges and mitigations, including adversarial testing with domain experts and a model-assisted safety pipeline. GPT-4's visual input capabilities are also highlighted, and the report concludes by characterizing GPT-4's performance and potential societal impacts.
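The predictable-scaling claim above rests on fitting a scaling law to small training runs and extrapolating to the full compute budget. The sketch below illustrates the general idea with a simple power-law fit in log-log space; the (compute, loss) pairs and the target budget are hypothetical illustrative values, not figures from the report, and the report's actual fit includes an irreducible-loss term omitted here for simplicity.

```python
import math

# Hypothetical (compute, final loss) pairs from small training runs.
# These numbers are illustrative only, not taken from the report.
runs = [(1e3, 3.52), (1e4, 3.13), (1e5, 2.79), (1e6, 2.48)]

# Fit a power law: loss = a * C**b, via linear least squares in log-log space,
# since log(loss) = log(a) + b * log(C) is linear in log(C).
xs = [math.log(c) for c, _ in runs]
ys = [math.log(loss) for _, loss in runs]
n = len(runs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = math.exp(my - b * mx)

# Extrapolate to a much larger (hypothetical) compute budget.
target_compute = 1e9
predicted_loss = a * target_compute ** b

print(f"fit: a={a:.3f}, b={b:.4f}")
print(f"predicted loss at C={target_compute:.0e}: {predicted_loss:.3f}")
```

Because the fit is linear in log space, four small runs suffice to pin down the two parameters; the report's contribution is showing that such extrapolations held accurately for GPT-4 itself, thousands of times beyond the largest calibration runs.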