This paper presents a method for uncertainty quantification in fine-tuned large language models (LLMs) using ensembles of low-rank adaptation (LoRA) modules. The approach uses Bayesian posterior approximations to analyze uncertainty in LLMs fine-tuned on three multiple-choice datasets: CommonsenseQA (CQA), MMLU STEM, and MMLU Social Sciences. Starting from a pre-trained Mistral-7B model as the prior, the researchers fine-tune on these datasets and track how entropic uncertainty measures evolve during and after fine-tuning.
The method quantifies uncertainty with two entropic measures: predictive entropy, which captures both aleatoric and epistemic components, and mutual information, which is purely epistemic. Both are computed from a Bayesian posterior approximated by an ensemble of LLMs fine-tuned with LoRA. The study hypothesizes that these entropic signals indicate how difficult a data domain is for a given architecture.
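The decomposition described above can be sketched numerically: given the per-member predictive distributions of an ensemble, predictive entropy is the entropy of the ensemble mean, expected entropy is the aleatoric part, and their difference is the mutual information (the epistemic part). This is a minimal illustrative sketch, not the paper's implementation; the function name, shapes, and example distributions are assumptions.

```python
# Hedged sketch: entropic uncertainty measures from an ensemble of
# categorical predictive distributions, e.g. answer probabilities over
# C multiple-choice options from M LoRA-fine-tuned ensemble members.
import numpy as np

def uncertainty_measures(probs: np.ndarray, eps: float = 1e-12):
    """probs: array of shape (M, C), each row a probability distribution.

    Returns (predictive_entropy, mutual_information, expected_entropy):
    predictive entropy = aleatoric + epistemic; mutual information is
    the epistemic part; expected entropy is the aleatoric part.
    """
    mean_p = probs.mean(axis=0)  # posterior predictive (ensemble average)
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps))
    # Average entropy of the individual members (aleatoric component).
    expected_entropy = -np.sum(probs * np.log(probs + eps), axis=1).mean()
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information, expected_entropy

# Illustrative inputs (assumed, not from the paper):
# members agree -> low mutual information (low epistemic uncertainty)
agree = np.array([[0.90, 0.05, 0.05],
                  [0.88, 0.07, 0.05]])
# members disagree -> high mutual information (high epistemic uncertainty)
disagree = np.array([[0.90, 0.05, 0.05],
                     [0.05, 0.90, 0.05]])
```

When the members disagree, the mean distribution is much flatter than any individual member's, so the mutual-information term grows even though each member is individually confident.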
The results show that LoRA ensembles provide a computationally efficient approximation to the posterior distribution of fine-tuned LLMs. Analyzing the uncertainty measures yields insight into the complexity of the datasets and the model's efficacy on the different target domains, and the study highlights the importance of separating aleatoric from epistemic uncertainty when judging the reliability of LLM predictions.
The paper contributes to the field by providing a principled approach to uncertainty quantification in fine-tuned LLMs, using Bayesian methods and LoRA ensembles. The findings suggest that uncertainty measures can help in assessing the reliability of model predictions and identifying areas where the model may be uncertain or overconfident. The study also emphasizes the need for further research into the limitations of LLMs and the importance of understanding the sources of uncertainty in their predictions.