June 2024 | SHAOHUA LI, THEODOROS THEODORIDIS, ZHENDONG SU
We introduce a novel approach for testing optimizing compilers using real-world code. The main idea is to construct well-formed programs by fusing multiple code snippets from real-world projects. The key insight is that real-world code exercises rich syntactic and semantic language features that current random program generators struggle to support. Our approach extracts real-world functions, injects them into seed programs, and leverages dynamic execution information to preserve the seed programs' semantics while building data dependencies between the injected code and the seed. We implemented this approach in a tool called Creal, which draws on a database of over 51,000 real-world functions to generate well-formed programs. Over nine months, Creal found 132 bugs in GCC and LLVM, of which 121 have been confirmed and 101 fixed. Most of these bugs are miscompilations, and many are long-latent and critical. Creal complements existing program generators by boosting their expressiveness with injected real-world code, and it finds bugs that previous methods miss. The tool is open source, and the approach can be applied to other compilers.
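To make the injection step concrete, the Python sketch below shows one simple way to splice a profiled real-world function into a seed program without changing the program's observable output. It is an illustration under stated assumptions, not Creal's implementation. The `ProfiledFunction` record, the `inject_call` helper, the `popcount` example, and the particular arithmetic used to keep `x` unchanged are all hypothetical. The recorded value of `x` at the injection point stands in for the dynamic execution information the approach relies on.

```python
from dataclasses import dataclass


@dataclass
class ProfiledFunction:
    """A function taken from a real-world project, plus one input/output
    pair observed by executing it (hypothetical database entry)."""
    name: str
    source: str     # C definition copied from the project
    arg_value: int  # argument value seen during profiling
    ret_value: int  # return value seen for that argument


def inject_call(seed_body, after, var, var_value, func):
    """Return a C program in which `func` is called after seed_body[after].

    `var_value` is the value `var` holds at that point, recorded by running
    the seed program once. The argument expression evaluates to the profiled
    input, and the return value is folded back into `var`. At run time the
    assignment leaves `var` unchanged, but `var` becomes data-dependent on
    the injected call. Signed overflow is NOT checked here; a real generator
    must keep the program free of undefined behavior.
    """
    arg = f"{var} - ({var_value}) + ({func.arg_value})"
    call = f"{var} = {var} + ({func.name}({arg}) - ({func.ret_value}));"
    body = seed_body[: after + 1] + [call] + seed_body[after + 1 :]
    lines = ["#include <stdio.h>", func.source, "int main(void) {"]
    lines += ["  " + stmt for stmt in body]
    lines += ["  return 0;", "}"]
    return "\n".join(lines)


# Hypothetical "real-world" function with one profiled input/output pair.
popcount = ProfiledFunction(
    name="popcount",
    source=(
        "int popcount(unsigned v) {\n"
        "  int c = 0;\n"
        "  while (v) { c += v & 1; v >>= 1; }\n"
        "  return c;\n"
        "}"
    ),
    arg_value=3,  # popcount(3) == 2
    ret_value=2,
)

seed = ["int x = 5;", "x = x * 2;", 'printf("%d\\n", x);']
# Dynamic information: x == 5 right after the first statement.
program = inject_call(seed, after=0, var="x", var_value=5, func=popcount)
print(program)
```

Compiled with a correct C compiler, the generated program should print 10, exactly as the seed does. A different output therefore points to a miscompilation. Writing the argument as `x - (5) + (3)` rather than the constant `3`, and folding the result back into `x`, is what creates the data dependencies. The call's input and result both flow through a seed variable, so passes such as inlining and constant propagation have to process the injected code together with the seed.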