2024 | Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David W. Zhang, Michaël Defferrard, Taco Cohen
CodeIt is a scalable method for self-improvement in language models, designed to solve tasks from the Abstraction and Reasoning Corpus (ARC). It iterates between two stages: (1) program sampling with hindsight relabeling, in which sampled programs are relabeled with the outputs they actually produce, and (2) learning from prioritized experience replay. By expressing solutions in a domain-specific language (DSL) and initializing from a pretrained encoder-decoder LLM that generates programs from demonstration examples, CodeIt leverages both prior knowledge and data. Hindsight relabeling turns otherwise-failed samples into valid training data, which makes the method effective in ARC's sparse-reward setting, while prioritized replay improves sample efficiency and guards against catastrophic forgetting.

CodeIt outperforms existing neural and symbolic baselines, solving 59 of the 400 ARC evaluation tasks, a state-of-the-art result, and is the first neuro-symbolic approach to scale to the full ARC evaluation set. It also refines its solutions over time, finding shorter programs for 53% of solved tasks. These results indicate that CodeIt reasons in program space and generalizes between tasks, making it a promising approach for self-improving language models.
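To make the loop concrete, here is a minimal sketch of one CodeIt-style iteration. It is an illustration under assumptions, not the paper's implementation: `policy`, `execute`, `train`, and the replay-buffer interface are hypothetical stand-ins for the components the summary describes.

```python
def codeit_iteration(policy, tasks, replay_buffer, execute, train, batch_size=32):
    """One sketch iteration: sample programs, hindsight-relabel, then learn.

    All arguments are hypothetical interfaces standing in for the components
    described above (LLM policy, DSL interpreter, replay buffer, trainer).
    """
    # --- Sampling stage: draw candidate programs for each training task ---
    for task in tasks:
        program = policy.sample(task.demo_inputs, task.demo_outputs)
        realized = [execute(program, grid) for grid in task.demo_inputs]
        if any(out is None for out in realized):
            continue  # discard programs that crash, time out, or return nothing

        # Hindsight relabeling: even when `realized` differs from the target
        # outputs, (inputs, realized outputs, program) is a consistent
        # input-output-program triple, so it is stored as synthetic data.
        solves_task = realized == task.demo_outputs
        replay_buffer.add(task.demo_inputs, realized, program,
                          priority=solves_task)

    # --- Learning stage: prioritized replay favors experiences that solve
    # real tasks over relabeled ones, mitigating drift and forgetting ---
    batch = replay_buffer.sample(batch_size)
    train(policy, batch)
```

The key design choice the sketch highlights is that relabeling targets the realized outputs rather than the intended ones, so every executable sample yields a correct training example whether or not it solves the original task.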