July 14–18, 2024, Washington, DC, USA | Hansi Zeng, Chen Luo, Hamed Zamani
This paper introduces PAG (Planning Ahead in Generative Retrieval), a novel optimization and decoding approach that enhances autoregressive generation in generative retrieval models. PAG constructs both a set-based and a sequential identifier for each document: the set-based identifier is built from lexical tokens, while the sequential identifier is derived from relevance-based document representations. The key innovation is simultaneous decoding, which approximates document-level relevance scores from the set-based identifiers and uses these scores to guide autoregressive decoding. This reduces the likelihood that prefixes of relevant documents are pruned by beam search, improving retrieval effectiveness.
Extensive experiments on MS MARCO and TREC Deep Learning Track data show that PAG outperforms state-of-the-art generative retrieval models (e.g., RIPOR) by a significant margin (a 15.6% MRR improvement on MS MARCO) while achieving a 22× speedup in query latency. PAG also outperforms dense retrieval models in terms of MRR and memory efficiency. The paper includes a thorough analysis of the impact of its various components and configurations, highlighting the effectiveness of combining lexical and semantic information for retrieval.
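The guided decoding idea above can be illustrated with a toy sketch. This is not the paper's implementation: all data, identifiers, scoring functions, and names here are hypothetical stand-ins. The point it demonstrates is that each beam candidate's score combines the model's sequential log-probability with a set-based approximation of document-level relevance, so prefixes of relevant documents are less likely to be pruned early.

```python
import math

# Toy corpus: each document has a set-based identifier (lexical tokens)
# and a sequential identifier (a list of semantic token ids).
# All values are illustrative, not from the paper.
DOCS = {
    "d1": {"set_id": {"apple", "fruit"}, "seq_id": [3, 7, 1]},
    "d2": {"set_id": {"car", "engine"}, "seq_id": [3, 2, 5]},
}

def set_based_score(query_tokens, set_id):
    """Approximate document-level relevance from lexical-token overlap."""
    return len(query_tokens & set_id) / max(len(set_id), 1)

def seq_step_logprob(prefix, token):
    """Dummy stand-in for the model's next-token log-probability."""
    return -abs(token - (prefix[-1] if prefix else 0)) * 0.1

def guided_beam_search(query_tokens, beam_size=2, depth=3):
    """Beam search whose scores are guided by set-based document scores."""
    beams = [(0.0, [])]  # (accumulated score, sequential-id prefix)
    for _ in range(depth):
        candidates = []
        for score, prefix in beams:
            for doc in DOCS.values():
                seq = doc["seq_id"]
                # Only extend with documents whose sequential id continues this prefix.
                if seq[:len(prefix)] != prefix or len(seq) <= len(prefix):
                    continue
                tok = seq[len(prefix)]
                # Guide the step score with the set-based document-level
                # approximation, so relevant prefixes stay in the beam.
                guide = set_based_score(query_tokens, doc["set_id"])
                candidates.append(
                    (score + seq_step_logprob(prefix, tok) + guide, prefix + [tok])
                )
        beams = sorted(candidates, reverse=True)[:beam_size]
    return beams

top = guided_beam_search({"apple", "fruit"})
```

In this toy run, the query tokens overlap only with `d1`'s set-based identifier, so the beam ranks `d1`'s sequential identifier first even though plain step log-probabilities alone would favor `d2`'s prefix.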