**ChatRetriever: Adapting Large Language Models for Generalized and Robust Conversational Dense Retrieval**
This paper introduces ChatRetriever, a large conversational dense retriever adapted from large language models (LLMs). The model leverages the strong generalization capabilities of LLMs to robustly represent complex conversational sessions for dense retrieval. To achieve this, the authors propose Contrastive Session-Masked Instruction Tuning (CSIT), a dual-learning approach that combines contrastive learning with session-masked instruction tuning. CSIT fine-tunes LLMs on high-quality conversational instruction-tuning data, enhancing the model's ability to represent complex conversational sessions and thereby improving retrieval performance.
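To make the dual objective concrete, here is a minimal sketch of what such a combined loss could look like: a contrastive retrieval term over session/passage embeddings plus a generation (instruction-tuning) term. The tensor names, the weighting `alpha`, and the masking scheme are illustrative assumptions, not the paper's exact formulation (the session-masking itself is applied upstream and is not shown here).

```python
# Illustrative sketch of a contrastive + instruction-tuning dual loss
# (hypothetical names; not the paper's exact CSIT formulation).
import torch
import torch.nn.functional as F

def dual_loss(session_emb, pos_passage_emb, neg_passage_emb,
              response_logits, response_labels, alpha=1.0, temperature=0.05):
    """Combine a contrastive retrieval loss with a generation loss."""
    # Contrastive term: pull the session embedding toward its positive
    # passage and push it away from (hard) negative passages.
    pos_score = F.cosine_similarity(session_emb, pos_passage_emb, dim=-1)        # (B,)
    neg_score = session_emb @ neg_passage_emb.T                                   # (B, N)
    neg_score = neg_score / (
        session_emb.norm(dim=-1, keepdim=True) * neg_passage_emb.norm(dim=-1).unsqueeze(0)
    )
    logits = torch.cat([pos_score.unsqueeze(1), neg_score], dim=1) / temperature  # (B, 1+N)
    targets = torch.zeros(logits.size(0), dtype=torch.long)                       # positive at index 0
    contrastive = F.cross_entropy(logits, targets)

    # Instruction-tuning term: next-token loss on the response tokens
    # (session tokens would be masked from the labels upstream).
    generation = F.cross_entropy(
        response_logits.view(-1, response_logits.size(-1)),
        response_labels.view(-1),
        ignore_index=-100,
    )
    return contrastive + alpha * generation
```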
**Key Contributions:**
1. **ChatRetriever:** The first LLM-adapted conversational dense retriever, which outperforms existing conversational dense retrievers and achieves performance comparable to state-of-the-art LLM-based rewriting approaches.
2. **CSIT:** A novel dual-learning approach that enhances complex session representation and generalization.
3. **Robustness Evaluation:** Two robustness evaluation methods are developed to assess the model's resilience in handling diverse conversational contexts.
**Experiments:**
- **Datasets:** Five conversational search benchmarks (QReCC, TopiOCQA, CAsT-19, CAsT-20, CAsT-21) were used for evaluation.
- **Results:** ChatRetriever significantly outperforms existing conversational dense retrievers and LLM-based rewriting approaches, achieving absolute NDCG@3 improvements of 6.8% and 12.2% on CAsT-20 and CAsT-21, respectively (NDCG@3 is sketched after this list).
- **Robustness:** ChatRetriever demonstrates superior robustness in handling varied conversational contexts, with smaller standard deviations in performance compared to baselines.
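For reference, the reported gains are in NDCG@3, which scores only the top three retrieved passages against graded relevance judgments. The snippet below is an illustrative computation only; official TREC CAsT evaluation typically relies on trec_eval / pytrec_eval.

```python
# Minimal NDCG@k computation for graded relevance (illustrative only).
import math

def ndcg_at_k(ranked_doc_ids, relevance, k=3):
    """ranked_doc_ids: system ranking; relevance: dict doc_id -> graded label."""
    gains = [relevance.get(d, 0) for d in ranked_doc_ids[:k]]
    dcg = sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# An absolute improvement of 6.8 points means, e.g., moving from 0.450 to 0.518 NDCG@3.
```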
**Discussion:**
- **Efficiency:** The large model size of ChatRetriever (7B parameters) incurs higher time and storage costs but potentially reduces the need for extensive passage re-ranking.
- **Hard Negatives:** Hard negatives are crucial for improving retrieval performance, and better strategies for mining hard negatives tailored to instruction-tuning data are still needed (a minimal mining sketch follows this list).
- **Generalizability:** While ChatRetriever performs well in complex conversational retrieval tasks, it still has limitations in following complex retrieval instructions and addressing very detailed information needs.
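As a rough illustration of the hard-negative point above, a common mining strategy is to keep highly ranked but non-relevant passages returned by an existing retriever. The sketch below assumes a hypothetical `retriever.search` interface and is not the paper's procedure.

```python
# Hypothetical hard-negative mining: retrieve top candidates and keep
# highly ranked passages that are not labeled relevant (interface assumed).
def mine_hard_negatives(retriever, query, positive_ids, top_k=100, num_negatives=5):
    """Return passages the retriever ranks highly but that are not relevant."""
    candidates = retriever.search(query, top_k=top_k)  # list of (doc_id, score)
    return [doc_id for doc_id, _ in candidates
            if doc_id not in positive_ids][:num_negatives]
```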
**Conclusion:**
ChatRetriever leverages the powerful capabilities of LLMs for conversational dense retrieval, demonstrating superior performance and robustness. Future work will focus on further enhancing its generalization capabilities and exploring its potential in broader IR scenarios.