Automata-based constraints for language model decoding

2024 | Terry Koo*, Frederick Liu*, and Luheng He
This paper presents a method for constraining language models (LMs) to generate output in formal languages using automata theory. LMs are often expected to produce strings in some formal language, such as structured data, API calls, or code snippets. While LMs can be tuned to improve their adherence to formal syntax, tuning does not guarantee conformance, especially for the smaller LMs suited to large-scale deployment.

To prevent downstream parsing errors, the paper derives an efficient closed-form solution from automata theory for regular languages, a broad class of formal languages with many practical applications, including API calls and schema-guided JSON and YAML. It adds pragmatic extensions for coping with the high branching factor of LM vocabularies, and extends the technique to deterministic context-free languages via push-down automata, which similarly admit an efficient closed-form solution and enable grammar-based constraints. Compared with existing approaches, the resulting system compiles constraints roughly 7,000x faster, is provably correct, and can be extended in a modular fashion.

The paper reviews related work on using automata to constrain learned models, demonstrates applications such as JSON expressions and Python dataclasses, and evaluates the efficiency and correctness of the approach experimentally. It concludes that automata theory provides a clean, efficient, and correct solution for constraining LM decoding.
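To make the core idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) of automaton-constrained decoding: a character-level DFA for a toy regular language is used to decide, at each step, which vocabulary tokens keep the output inside the language, and the LM may only choose among those. All names here (step_token, allowed_tokens, greedy_pick, the toy vocabulary) are hypothetical; the paper instead compiles the constraint into a token-level automaton offline rather than simulating characters during decoding.

```python
from typing import Dict, Optional

# Character-level DFA for the toy regular language  "yes" | "no".
# States are ints; a missing edge means the dead (reject) state, written -1.
DFA = {
    (0, "y"): 1, (1, "e"): 2, (2, "s"): 3,   # path accepting "yes"
    (0, "n"): 4, (4, "o"): 3,                # path accepting "no"
}
START, ACCEPT = 0, {3}

def step_char(state: int, ch: str) -> int:
    return DFA.get((state, ch), -1)

def step_token(state: int, token: str) -> int:
    """Run every character of a vocabulary token through the DFA."""
    for ch in token:
        state = step_char(state, ch)
        if state == -1:
            return -1
    return state

# Toy subword vocabulary; real vocabularies have ~100k entries.
VOCAB = ["y", "es", "yes", "n", "o", "no", "xyz", "<eos>"]

def allowed_tokens(state: int) -> Dict[str, int]:
    """Tokens whose characters keep the automaton alive from `state`.
    The paper precomputes this table per state (a token-level automaton);
    here it is recomputed on the fly for clarity."""
    out: Dict[str, int] = {}
    for tok in VOCAB:
        if tok == "<eos>":
            if state in ACCEPT:              # may stop only in an accept state
                out[tok] = state
            continue
        nxt = step_token(state, tok)
        if nxt != -1:
            out[tok] = nxt
    return out

def greedy_pick(allowed: Dict[str, int]) -> Optional[str]:
    """Stand-in for an LM scoring function: pretend it prefers "yes"."""
    for tok in ["yes", "y", "es", "no", "n", "o", "<eos>"]:
        if tok in allowed:
            return tok
    return None

# Decoding loop: at each step the LM's choice is masked to allowed tokens.
state, text = START, ""
while True:
    allowed = allowed_tokens(state)
    tok = greedy_pick(allowed)
    if tok is None or tok == "<eos>":
        break
    text += tok
    state = allowed[tok]
print(text)  # always a string of the language, here "yes"
```

In a real system the per-state token masks are applied to the model's logits before sampling, and the per-state tables are compiled once ahead of time, which is where the paper's efficiency gains come from.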