Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding

July 14-18, 2024 | Hansi Zeng, Chen Luo, Hamed Zamani
This paper introduces PAG, a novel optimization and decoding approach for generative retrieval that guides autoregressive generation through simultaneous decoding of document identifiers. PAG constructs both a set-based and a sequential identifier for each document: the set-based identifier is built from lexical tokens, while the sequential identifier is derived from relevance-based representations. During decoding, the set-based scores approximate document-level relevance, which reduces the likelihood that relevant prefixes are pruned by beam search.

Extensive experiments on MSMARCO and TREC Deep Learning Track data show that PAG outperforms the state-of-the-art generative retrieval model, RIPOR, by 15.6% in MRR@10 on MSMARCO Dev and by 12.3% and 10.9% in NDCG@10 on TREC-DL 2019 and 2020, respectively, while achieving a 22× speedup in query latency with a 10× smaller beam size. PAG also improves over several effective dense retrieval models, with MRR@10 gains of 11.9%, 4.1%, and 14.9% on the MSMARCO Dev set over TAS-B, RocketQA, and TCT-ColBERT, respectively. In addition, PAG is more memory-efficient, requiring 7.7× less memory to index the entire corpus than single-vector dense retrieval models.

PAG's framework comprises a three-stage optimization pipeline for set-based and sequential DocID generation and a unified training process for joint decoding. Ablation studies and analysis quantify the impact of these design decisions, and the results demonstrate that PAG's planning-ahead constrained beam search significantly improves both retrieval effectiveness and efficiency.
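The core decoding idea can be illustrated with a small sketch. This is not the authors' implementation: `seq_score` and `set_score` are hypothetical callables standing in for the model's autoregressive token scores and the precomputed set-based (document-level) scores, respectively. The point is that candidates are pruned by the combined score, so a prefix whose sequential score is weak early on can survive if its document-level estimate is strong.

```python
# Hypothetical sketch of planning-ahead beam search, assuming two scoring
# callables supplied by the model (names are illustrative, not from the paper):
#   seq_score(prefix, token) -> incremental log-prob of extending prefix
#   set_score(prefix)        -> set-based estimate of document-level relevance
#                               for DocIDs sharing this prefix
import heapq

def planning_ahead_beam_search(seq_score, set_score, vocab, beam_size, max_len):
    beams = [((), 0.0)]  # (prefix, cumulative sequential score)
    for _ in range(max_len):
        candidates = []
        for prefix, s in beams:
            for tok in vocab:
                new_prefix = prefix + (tok,)
                new_s = s + seq_score(prefix, tok)
                # Rank by the combined score: sequential log-prob plus the
                # planned-ahead document-level estimate for this prefix.
                combined = new_s + set_score(new_prefix)
                candidates.append((combined, new_prefix, new_s))
        # Keep the top-k candidates by the combined (planning-ahead) score.
        top = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        beams = [(p, s) for _, p, s in top]
    return beams
```

With toy scores where the sequential model slightly prefers token 0 but the set-based score rewards prefixes containing token 1, plain beam search with beam size 1 would prune the token-1 branch immediately; the combined score keeps it alive.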