Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

7 Feb 2024 | Luca Beurer-Kellner, Marc Fischer, Martin Vechev
This paper introduces DOMINO, a constrained decoding algorithm that enables efficient, minimally invasive generation of text adhering to given syntactic constraints. Unlike existing methods, which enforce a formal language over raw characters and thereby distort the model's output distribution, DOMINO aligns sub-word tokens with the underlying grammar, reducing performance overhead and improving task accuracy. By leveraging pre-computation and speculative decoding, DOMINO incurs virtually no overhead and in some cases even achieves a 2x speedup over unconstrained decoding, significantly outperforming existing approaches.

The key challenge in constrained decoding is that an LLM's sub-word tokenization rarely aligns with the terminals of the constraining grammar: a single token can span several terminals, and a terminal can span several tokens. Naively enforcing the grammar character by character forces the model into unnatural tokenizations, which degrades task accuracy. DOMINO avoids this misalignment by constructing subterminal trees that track the progress of tokenization within and across terminals, allowing constraints to be enforced while the model keeps its preferred tokenization (the first sketch below illustrates the difference). Because these trees can be pre-computed from the grammar and the vocabulary, token masks are cheap to produce at decoding time. DOMINO additionally supports opportunistic masking and speculative decoding, which avoid computing full vocabulary masks when the model's unconstrained prediction is already likely to be legal (see the second sketch below).

DOMINO is evaluated on several benchmarks, including GSM8K and CoNLL2003, where it maintains high task accuracy while significantly improving inference throughput relative to prior constrained-decoding systems. Overall, DOMINO offers an efficient and effective solution for constrained generation, enabling large language models to produce well-formed structured output in a wide range of applications without extensive fine-tuning or additional post-processing.
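To make the alignment problem concrete, here is a minimal Python sketch contrasting naive character-level masking with alignment-aware masking over a toy vocabulary. It is illustrative only: the vocabulary, the prefix-based legality check, and all names are assumptions for exposition, not DOMINO's implementation.

```python
# Illustrative sketch (not the authors' code): why character-level masking
# mis-tokenizes, and how alignment-aware masking avoids it. The grammar
# state is simplified to "the output must continue with this literal
# string" (e.g. the JSON keyword `true`).

VOCAB = {0: "t", 1: "r", 2: "u", 3: "e", 4: "tr", 5: "true", 6: "ue"}

def naive_mask(allowed_next_char: str) -> set[int]:
    """Character-level masking: only single-character tokens matching the
    next allowed character survive, because the checker cannot look past
    one character of the constraint."""
    return {t for t, s in VOCAB.items()
            if len(s) == 1 and s == allowed_next_char}

def aligned_mask(required: str) -> set[int]:
    """Alignment-aware masking: any token that is a prefix of the required
    continuation is allowed, so the model can emit its preferred
    multi-character token (e.g. `true` as a single token)."""
    return {t for t, s in VOCAB.items() if required.startswith(s)}

print(naive_mask("t"))       # {0} -> forces "t", "r", "u", "e" one by one
print(aligned_mask("true"))  # {0, 4, 5} -> "true" can be emitted natively
```

The naive mask forces the model onto a token sequence it rarely saw during training, which is exactly the distribution shift that hurts task accuracy; the aligned mask keeps the model's natural tokenization available.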
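Similarly, a hedged sketch of the opportunistic fast path: check whether the model's unconstrained top token is already legal before computing a full vocabulary mask. Here `legal`, `full_mask`, and `decode_step` are hypothetical stand-ins for a real grammar checker, not the paper's API.

```python
import numpy as np

VOCAB = ["t", "r", "true", "false", "fa", "lse"]

def legal(token_id: int, required: str) -> bool:
    """Cheap check of a single token against the (toy) constraint."""
    return required.startswith(VOCAB[token_id])

def full_mask(required: str) -> np.ndarray:
    """Expensive path: legality of every token in the vocabulary."""
    return np.array([required.startswith(s) for s in VOCAB])

def decode_step(logits: np.ndarray, required: str) -> int:
    top = int(np.argmax(logits))
    if legal(top, required):        # opportunistic fast path: no mask needed
        return top
    masked = np.where(full_mask(required), logits, -np.inf)
    return int(np.argmax(masked))   # slow path: constrained argmax

logits = np.array([0.1, 0.0, 2.5, 3.0, 0.2, 0.0])  # model prefers "false"
print(VOCAB[decode_step(logits, "true")])  # -> "true" (mask enforces legality)

logits2 = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])  # model prefers "true"
print(VOCAB[decode_step(logits2, "true")])  # -> "true" via the fast path
```

The fast path matters because real vocabularies contain tens of thousands of tokens: checking one candidate token against the grammar is far cheaper than building a full mask at every decoding step, which is how this style of optimization can bring constrained decoding close to, or below, the cost of unconstrained decoding.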