27 Jun 2024 | Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Rozière, Jonas Gehring, Gabriel Synnaeve, Hugh Leather
Meta introduces LLM Compiler, a family of pre-trained models designed for code optimization tasks. Built on Code Llama and trained on 546 billion tokens of LLVM-IR and assembly code, then further fine-tuned for compiler optimization, LLM Compiler is available in two sizes: 7B and 13B parameters. The fine-tuned models achieve 77% of the optimizing potential of an autotuning search and 45% disassembly round-trip accuracy. LLM Compiler is released under a bespoke commercial license that allows wide reuse.
The model is fine-tuned for two downstream tasks: tuning optimization flags to minimize code size, and disassembling x86_64 and ARM assembly back into LLVM-IR. It outperforms Code Llama and GPT-4 Turbo on both tasks, with flag tuning showing significant code-size improvements over -Oz. It also performs well on foundation-model tasks such as next-token prediction and compiler emulation. Trained on a vast corpus of compiler-centric data, LLM Compiler provides a scalable, cost-effective foundation for further research and development in compiler optimization, and a robust pre-trained platform for understanding and manipulating compiler intermediate representations and assembly language. The model is available to both academic researchers and industry practitioners.
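The two headline numbers can be read as simple ratios: disassembly round-trip accuracy is the fraction of samples whose lifted IR recompiles to assembly matching the original, and the 77% figure is the share of the autotuner's code-size reduction over -Oz that the model recovers. A minimal sketch of how such metrics might be computed (the helper names and toy inputs are illustrative, not the paper's actual evaluation harness):

```python
def round_trip_exact_match(pairs):
    """Fraction of (original, recompiled) assembly pairs that match exactly,
    i.e. the disassembly round-trip succeeded for that sample."""
    hits = sum(1 for original, recompiled in pairs if original == recompiled)
    return hits / len(pairs)

def share_of_autotuner_gain(oz_size, model_size, autotuner_size):
    """Share of the autotuner's size reduction over the -Oz baseline
    that the model's flag sequence recovers."""
    return (oz_size - model_size) / (oz_size - autotuner_size)

# Toy example: two disassembly samples, one of which round-trips.
samples = [("mov eax, 1", "mov eax, 1"), ("ret", "nop")]
round_trip_exact_match(samples)  # → 0.5

# Toy code sizes (instruction counts): -Oz baseline 1000, autotuner 900, model 923.
share_of_autotuner_gain(1000, 923, 900)  # → 0.77
```

A value of 1.0 on the second metric would mean the model matched the autotuning search exactly, at a fraction of its compilation cost.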