Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics
23 Aug 2024 | Weijia Zhang, Mohammad Aliannejadi, Yifei Yuan, Jiahuan Pei, Jia-Hong Huang, Evangelos Kanoulas
The paper "Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics" addresses the challenge of assessing the effectiveness of citations in generated text from large language models (LLMs). The authors propose a comparative evaluation framework to assess the alignment between metric scores and human judgments in three support levels: full, partial, and no support. The framework employs correlation analysis, classification evaluation, and retrieval evaluation to comprehensively evaluate the performance of various faithfulness metrics. The results show that no single metric consistently excels across all evaluations, highlighting the complexity of automated citation evaluation. The authors provide practical recommendations for developing more effective metrics, including the development of training resources, the introduction of contrastive learning, and the creation of more explainable metrics. The study contributes to the field by systematically investigating the impact of fine-grained support levels on faithfulness metrics and offering insights into the limitations and improvements needed for automated citation evaluation.The paper "Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics" addresses the challenge of assessing the effectiveness of citations in generated text from large language models (LLMs). The authors propose a comparative evaluation framework to assess the alignment between metric scores and human judgments in three support levels: full, partial, and no support. The framework employs correlation analysis, classification evaluation, and retrieval evaluation to comprehensively evaluate the performance of various faithfulness metrics. The results show that no single metric consistently excels across all evaluations, highlighting the complexity of automated citation evaluation. The authors provide practical recommendations for developing more effective metrics, including the development of training resources, the introduction of contrastive learning, and the creation of more explainable metrics. The study contributes to the field by systematically investigating the impact of fine-grained support levels on faithfulness metrics and offering insights into the limitations and improvements needed for automated citation evaluation.