This paper introduces AUTOIF, a scalable and reliable method for automatically generating instruction-following training data for large language models (LLMs). AUTOIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, corresponding verification code, and unit test samples. Execution feedback-based rejection sampling is then used to generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. The method achieves significant improvements across three training algorithms—SFT, Offline DPO, and Online DPO—when applied to top open-source LLMs, Qwen2 and LLaMA3, in both self-alignment and strong-to-weak distillation settings. The code for AUTOIF is publicly available at <https://github.com/QwenLM/AutoIF>.
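
As a rough illustration of the execution-feedback loop described above, the following minimal Python sketch (not the authors' implementation) shows how an LLM-generated verification function could first be checked against unit-test samples and then used to rejection-sample candidate responses. The function name `evaluate`, the example instruction, and all sample data are hypothetical.

```python
# Sketch of execution feedback-based rejection sampling: an LLM-written
# verification function is executed against unit-test cases, and only
# responses that the validated verifier accepts are kept.

from typing import Callable, List, Tuple

def load_verifier(verifier_source: str) -> Callable[[str], bool]:
    """Compile an LLM-generated verification function named `evaluate`."""
    namespace: dict = {}
    exec(verifier_source, namespace)  # sandboxed execution assumed in practice
    return namespace["evaluate"]

def passes_unit_tests(evaluate: Callable[[str], bool],
                      cases: List[Tuple[str, bool]]) -> bool:
    """Reject the verifier itself if it disagrees with any unit-test sample."""
    return all(evaluate(text) == label for text, label in cases)

def rejection_sample(responses: List[str],
                     verifier_source: str,
                     cases: List[Tuple[str, bool]]) -> List[str]:
    """Keep only responses that a unit-test-validated verifier accepts."""
    evaluate = load_verifier(verifier_source)
    if not passes_unit_tests(evaluate, cases):
        return []  # discard the instruction/verifier pair entirely
    return [r for r in responses if evaluate(r)]

if __name__ == "__main__":
    # Hypothetical instruction: "Answer using exactly three words."
    verifier = (
        "def evaluate(response: str) -> bool:\n"
        "    return len(response.split()) == 3\n"
    )
    tests = [("I love cats", True), ("Hello there", False)]
    candidates = ["Cats are great", "I really do love cats"]
    print(rejection_sample(candidates, verifier, tests))  # -> ['Cats are great']
```

Responses surviving this filter would then serve as SFT targets, while accepted/rejected pairs could form preference data for DPO-style training, in the spirit of the pipeline summarized above.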