July 14-18, 2024 | Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke
This paper investigates ranked list truncation (RLT) in the context of re-ranking, particularly in the "retrieve-then-re-rank" setup. RLT is crucial for re-ranking: it can improve efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis, and it may also improve effectiveness by reducing the number of irrelevant items in the re-ranker's input. Despite its importance, there is little research on applying RLT methods in this new setup.

The authors reproduce existing RLT methods in the context of re-ranking, especially newly emerged large language model (LLM)-based re-ranking. They examine whether established findings on RLT for retrieval generalize to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. They perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods in pipelines that combine 3 retrievers and 2 re-rankers.

The results show that findings on RLT for retrieval do not generalize well to the "retrieve-then-re-rank" setup. For example, supervised RLT methods show no clear advantage over a fixed re-ranking depth. The choice of retriever has a substantial impact on RLT for re-ranking: with an effective retriever such as SPLADE++ or RepLLaMA, a fixed re-ranking depth of 20 can already yield an excellent effectiveness/efficiency trade-off. The authors also find that distribution-based supervised RLT methods outperform their sequential labeling-based counterparts in most cases. They conclude that RLT methods need to be improved for re-ranking; future work includes exploring query performance prediction for predicting query-specific re-ranking cut-offs, RLT for pair-wise and list-wise LLM-based re-rankers, and RLT for re-ranking in conversational search.
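To make the setup concrete, here is a minimal Python sketch of a "retrieve-then-re-rank" pipeline with an RLT step. This is not the paper's implementation; the function names (retrieve, rerank, predict_cutoff), the pool size, and the fixed-depth baseline value are illustrative assumptions.

```python
# Minimal sketch of retrieve-then-re-rank with ranked list truncation (RLT).
# All names are hypothetical placeholders, not the paper's actual code.
from typing import Callable, List, Tuple

Ranking = List[Tuple[str, float]]  # (doc_id, score), sorted by descending score


def fixed_depth_cutoff(query: str, candidates: Ranking, k: int = 20) -> int:
    """Baseline: re-rank the same fixed number of candidates for every query."""
    return min(k, len(candidates))


def rerank_pipeline(
    query: str,
    retrieve: Callable[[str, int], Ranking],        # first stage, e.g. lexical or dense retrieval
    rerank: Callable[[str, List[str]], Ranking],    # second stage, e.g. an LLM-based re-ranker
    predict_cutoff: Callable[[str, Ranking], int],  # an RLT method: per-query re-ranking depth
    pool_size: int = 1000,
) -> Ranking:
    candidates = retrieve(query, pool_size)
    # RLT step: truncate the candidate list before re-ranking, instead of
    # sending the full pool to the (expensive) re-ranker.
    k = predict_cutoff(query, candidates)
    head_ids = [doc_id for doc_id, _ in candidates[:k]]
    reranked_head = rerank(query, head_ids)
    # Documents below the cut-off keep their first-stage order below the
    # re-ranked head (score scales of the two stages are not comparable).
    return reranked_head + candidates[k:]
```

In this framing, a fixed re-ranking depth amounts to passing fixed_depth_cutoff as the predict_cutoff argument; the paper's finding is that with a strong first-stage retriever, this simple baseline with a small depth such as 20 is already hard for supervised RLT methods to beat.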