3 May 2024 | Margarida M. Campos, António Farinhas, Chrysoula Zerva, Mário A.T. Figueiredo, André F.T. Martins
This paper provides a comprehensive survey of conformal prediction (CP) techniques and their applications in natural language processing (NLP). CP is a model-agnostic and distribution-free framework that offers statistical guarantees for uncertainty quantification, addressing key challenges in NLP systems such as hallucinations, poor calibration, and unreliable explanations. Unlike traditional methods, CP provides prediction sets that include the true label with a specified probability, ensuring reliability in critical applications.
CP is built on three core components: a trained predictor, a held-out calibration set, and a non-conformity score. The non-conformity score measures how atypical an input-output pair is relative to the calibration data. Using the empirical distribution of these scores, CP generates prediction sets that contain the true label with probability at least $1 - \alpha$, a marginal coverage guarantee that holds under the assumption of data exchangeability.
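The split-conformal recipe described above can be sketched in a few lines. This is a minimal illustration with simulated calibration data, not the paper's implementation; the non-conformity score used here, $1 - p(y \mid x)$, is one common choice, and the function names are ours.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q_level, method="higher")

def prediction_set(probs, qhat):
    """Keep every label whose non-conformity score 1 - p(y|x) is <= qhat."""
    return [y for y, p in enumerate(probs) if 1 - p <= qhat]

# Toy calibration set: score = 1 - model probability of the true label
# (simulated here; in practice these come from held-out labeled data).
rng = np.random.default_rng(0)
cal_scores = 1 - rng.beta(8, 2, size=1000)
qhat = conformal_quantile(cal_scores, alpha=0.1)
print(prediction_set([0.70, 0.20, 0.10], qhat))
```

At test time the set grows or shrinks with the model's uncertainty: confident predictions yield singleton sets, ambiguous ones include several labels, and coverage at level $1 - \alpha$ holds regardless of the underlying model.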
CP has been extended to handle non-exchangeable data, to provide conditional coverage guarantees, and to address issues such as class imbalance and fairness. These extensions include Mondrian CP, which partitions the data into categories and guarantees coverage within each category, and conformal risk control, which generalizes the guarantee from miscoverage to other bounded error notions.
In NLP applications, CP has been used for text classification, sequence tagging, and natural language generation. For example, it has been applied to binary and multilabel text classification and to document retrieval, yielding reliable prediction sets and calibration measures. In natural language generation, CP helps mitigate hallucinations by attaching calibrated confidence to generated outputs, for instance by filtering out content that falls below a conformally calibrated threshold.
CP is also useful for uncertainty-based evaluation, allowing for the comparison of different models and the assessment of their confidence in predictions. Additionally, CP can be used to improve inference efficiency by reducing the computational cost of NLP models while maintaining performance.
Despite its advantages, CP faces open challenges in generation tasks and in settings with limited data. Future research directions include adapting CP for human-computer interaction, handling label variation, ensuring fairness, and addressing data limitations. The paper highlights the potential of CP in NLP and calls for further research into its applications and remaining challenges.