Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

7 Feb 2024 | Luca Beurer-Kellner, Marc Fischer, Martin Vechev
This paper introduces DOMINO, a constrained decoding algorithm that enables efficient, minimally invasive generation of text adhering to given syntactic constraints. Unlike existing methods, which enforce a formal language over raw characters and thereby distort the model's output distribution, DOMINO aligns sub-word tokens with the underlying grammar, reducing performance overhead and improving task accuracy. By leveraging pre-computation and speculative decoding, DOMINO incurs virtually no overhead and in some cases even achieves a 2x speedup over unconstrained decoding, significantly outperforming existing approaches.

The key challenge in constrained decoding is that an LLM's sub-word tokenization rarely aligns with the terminals of the constraining grammar: a single token can span several terminals, and a terminal can span several tokens. Naively enforcing the grammar character by character forces the model into unnatural tokenizations, which degrades task accuracy. DOMINO avoids this misalignment by constructing subterminal trees that track the progress of tokenization within and across terminals, allowing constraints to be enforced while the model keeps its preferred tokenization (the first sketch below illustrates the difference). Because these trees can be pre-computed from the grammar and the vocabulary, token masks are cheap to produce at decoding time. DOMINO additionally supports opportunistic masking and speculative decoding, which avoid computing full vocabulary masks when the model's unconstrained prediction is already likely to be legal (see the second sketch below).

DOMINO is evaluated on several benchmarks, including GSM8K and CoNLL2003, where it maintains high task accuracy while significantly improving inference throughput relative to prior constrained-decoding systems. Overall, DOMINO offers an efficient and effective solution for constrained generation, enabling large language models to produce well-formed structured output in a wide range of applications without extensive fine-tuning or additional post-processing.
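To make the alignment problem concrete, here is a minimal Python sketch contrasting naive character-level masking with alignment-aware masking over a toy vocabulary. It is illustrative only: the vocabulary, the prefix-based legality check, and all names are assumptions for exposition, not DOMINO's implementation.

```python
# Illustrative sketch (not the authors' code): why character-level masking
# mis-tokenizes, and how alignment-aware masking avoids it. The grammar
# state is simplified to "the output must continue with this literal
# string" (e.g. the JSON keyword `true`).

VOCAB = {0: "t", 1: "r", 2: "u", 3: "e", 4: "tr", 5: "true", 6: "ue"}

def naive_mask(allowed_next_char: str) -> set[int]:
    """Character-level masking: only single-character tokens matching the
    next allowed character survive, because the checker cannot look past
    one character of the constraint."""
    return {t for t, s in VOCAB.items()
            if len(s) == 1 and s == allowed_next_char}

def aligned_mask(required: str) -> set[int]:
    """Alignment-aware masking: any token that is a prefix of the required
    continuation is allowed, so the model can emit its preferred
    multi-character token (e.g. `true` as a single token)."""
    return {t for t, s in VOCAB.items() if required.startswith(s)}

print(naive_mask("t"))       # {0} -> forces "t", "r", "u", "e" one by one
print(aligned_mask("true"))  # {0, 4, 5} -> "true" can be emitted natively
```

The naive mask forces the model onto a token sequence it rarely saw during training, which is exactly the distribution shift that hurts task accuracy; the aligned mask keeps the model's natural tokenization available.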
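Similarly, a hedged sketch of the opportunistic fast path: check whether the model's unconstrained top token is already legal before computing a full vocabulary mask. Here `legal`, `full_mask`, and `decode_step` are hypothetical stand-ins for a real grammar checker, not the paper's API.

```python
import numpy as np

VOCAB = ["t", "r", "true", "false", "fa", "lse"]

def legal(token_id: int, required: str) -> bool:
    """Cheap check of a single token against the (toy) constraint."""
    return required.startswith(VOCAB[token_id])

def full_mask(required: str) -> np.ndarray:
    """Expensive path: legality of every token in the vocabulary."""
    return np.array([required.startswith(s) for s in VOCAB])

def decode_step(logits: np.ndarray, required: str) -> int:
    top = int(np.argmax(logits))
    if legal(top, required):        # opportunistic fast path: no mask needed
        return top
    masked = np.where(full_mask(required), logits, -np.inf)
    return int(np.argmax(masked))   # slow path: constrained argmax

logits = np.array([0.1, 0.0, 2.5, 3.0, 0.2, 0.0])  # model prefers "false"
print(VOCAB[decode_step(logits, "true")])  # -> "true" (mask enforces legality)

logits2 = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])  # model prefers "true"
print(VOCAB[decode_step(logits2, "true")])  # -> "true" via the fast path
```

The fast path matters because real vocabularies contain tens of thousands of tokens: checking one candidate token against the grammar is far cheaper than building a full mask at every decoding step, which is how this style of optimization can bring constrained decoding close to, or below, the cost of unconstrained decoding.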