Exploring Precision and Recall to assess the quality and diversity of LLMs


4 Jun 2024 | Florian Le Bronnec, Alexandre Verine, Benjamin Negrevergne, Yann Chevaleyre, Alexandre Allauzen
This paper introduces a novel evaluation framework for Large Language Models (LLMs) that applies Precision and Recall metrics, originally developed for image generation, to assess the quality and diversity of text generation. The framework enables a nuanced evaluation of LLMs without requiring aligned corpora. The study evaluates state-of-the-art language models such as LLAMA-2 and MISTRAL, revealing new insights into their performance on open-ended generation tasks that are not adequately captured by traditional benchmarks. The findings highlight a trade-off between the quality and diversity of generated samples, particularly when models are fine-tuned on instruction datasets or with human feedback. This work extends the toolkit for distribution-based NLP evaluation, offering insights into the practical capabilities and challenges that current LLMs face in generating diverse and high-quality text. The authors release their code and data.

The paper discusses the limitations of traditional benchmarks for open-ended generation and proposes Precision and Recall as a more detailed assessment of the quality and diversity of generated text. These metrics cleanly separate sample quality (adequacy) from lack of diversity in model outputs, giving a clearer picture of where text generation fails. Empirical results show that both measures are needed for an in-depth comparison of LLMs: for instance, fine-tuning on instruction datasets with human feedback significantly improves sample quality, albeit at the expense of sample diversity.

The paper also examines practical use cases, including the evaluation of open-ended generation, biography generation, and creative text generation. The results show that instruction-tuned models are more precise but less diverse than pre-trained models, that larger models are more diverse, and that the number of in-context examples affects both precision and recall. A comparison with human evaluation finds that Precision correlates with human judgments of quality and Recall with judgments of diversity.

The paper concludes that the proposed metrics are a valuable tool for evaluating the open-ended generation capabilities of LLMs and contribute to the advancement of generative model evaluation. It also highlights ethical considerations, emphasizing the need for well-designed reference datasets to avoid biases.
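To make the idea concrete, below is a minimal sketch of how distribution-level Precision and Recall can be estimated from embeddings of reference and generated texts, following the k-nearest-neighbor formulation common in the image-generation literature that the paper adapts. The exact estimator, embedding model, and hyperparameters (the encoder, the neighborhood size k) are assumptions for illustration, not the authors' precise method.

```python
# Sketch of a k-NN-based Precision/Recall estimator on text embeddings.
# Assumption: embeddings for reference and generated texts are provided
# (in practice they would come from a sentence encoder); random vectors
# stand in for them here.
import numpy as np

def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

def knn_radii(x: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point in x to its k-th nearest neighbor in x."""
    d = pairwise_distances(x, x)
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k is the k-th nearest neighbor.
    return np.sort(d, axis=1)[:, k]

def precision_recall(real: np.ndarray, gen: np.ndarray, k: int = 3):
    """Precision: fraction of generated points falling inside the estimated
    support of the reference distribution. Recall: fraction of reference
    points falling inside the estimated support of the generated distribution."""
    real_r = knn_radii(real, k)   # per-point radius of the reference support
    gen_r = knn_radii(gen, k)     # per-point radius of the generated support
    d_gr = pairwise_distances(gen, real)
    precision = (d_gr <= real_r[None, :]).any(axis=1).mean()
    d_rg = pairwise_distances(real, gen)
    recall = (d_rg <= gen_r[None, :]).any(axis=1).mean()
    return precision, recall

# Toy usage with random "embeddings" standing in for encoded texts.
rng = np.random.default_rng(0)
real_emb = rng.normal(size=(300, 32))
gen_emb = rng.normal(loc=0.2, size=(300, 32))
p, r = precision_recall(real_emb, gen_emb, k=3)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this formulation, a model that produces fluent but repetitive text scores high precision and low recall, while a model that covers the reference distribution but emits many low-quality samples shows the opposite pattern, which is the quality/diversity trade-off the paper reports.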