This paper investigates the ability of transformer-based models to learn structural recursion from examples. Structural recursion is a fundamental concept in programming languages and formal mathematics, where symbolic tools currently outperform neural models on tasks such as inferring semantic relations between datatypes and emulating program behavior. The authors introduce a general framework that connects the abstract concepts of structural recursion in programming languages to concrete sequence modeling problems and the behavior of learned models. The framework comprises a representation that captures the syntax of structural recursion, together with two complementary accounts of its semantics: one based on step-wise reduction and another based on Abstract State Machines (ASMs).
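To make the setting concrete, the following is a minimal sketch (not taken from the paper) of a structurally recursive definition and its step-wise reduction, using Peano naturals in Haskell; the datatype and function names here are illustrative assumptions, not the paper's actual benchmark tasks.

```haskell
-- Peano naturals: a value is either Z (zero) or S n (the successor of n).
data Nat = Z | S Nat deriving Show

-- Structurally recursive addition: the recursive call is made on a strict
-- subterm (n) of the scrutinized argument (S n), so it terminates by
-- construction -- the defining property of structural recursion.
add :: Nat -> Nat -> Nat
add Z     m = m
add (S n) m = S (add n m)

-- Step-wise reduction of add (S (S Z)) (S Z), i.e., 2 + 1:
--   add (S (S Z)) (S Z)
--     ~> S (add (S Z) (S Z))
--     ~> S (S (add Z (S Z)))
--     ~> S (S (S Z))
main :: IO ()
main = print (add (S (S Z)) (S Z))  -- prints S (S (S Z))
```

Emulating the reduction trace in the comments above, one rewrite step at a time, is the kind of step-wise computation the paper asks sequence models to perform.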
The authors identify issues with current models, noting that they cannot fully capture recursion and instead fit shortcut algorithms, leading to difficulties in solving edge cases. They also find that state-of-the-art large language models (LLMs) struggle to mine recursive rules from in-context demonstrations and fail in interesting ways when emulating reduction (step-wise computation) of recursive functions.
The paper's contributions are twofold: first, it introduces a comprehensive framework for representing and reasoning about structural recursion with sequence models; second, it conducts an empirical investigation using this framework to analyze the algorithmic behaviors of learned models on structurally recursive tasks. The framework and empirical results provide insights into how to better handle recursion with sequence models, paving the way for more reliable performance on tasks fundamental to recursion.