FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability


28 Feb 2024 | Congying Xia¹*, Chen Xing¹*, Jiangshu Du², Xinyi Yang¹, Yihao Feng¹, Ran Xu¹, Wenpeng Yin³, Caiming Xiong¹
This paper introduces FOFO, a pioneering benchmark designed to evaluate large language models' (LLMs) ability to follow complex, domain-specific formats. The benchmark addresses a gap in existing evaluations, which rarely assess format-following capability directly. FOFO is built through an AI-human collaborative method and covers a wide range of real-world, domain-specific formats and instructions. Evaluation across both open-source and closed-source LLMs yields three key findings: open-source models lag significantly behind closed-source ones in format adherence; a model's format-following performance is independent of its content-generation quality; and format proficiency varies across domains. These insights suggest the need for specialized tuning to enhance format-following skills and highlight FOFO's role in guiding the selection of domain-specific AI agents. The FOFO dataset is publicly released to facilitate further research and development in this area.
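To make the format-vs-content distinction concrete, here is a minimal, hypothetical sketch of a format-adherence check: it verifies that a model's output is a JSON record with a set of required keys, regardless of whether the content is factually correct. The schema and helper function are illustrative assumptions, not part of the FOFO dataset or its evaluation pipeline, which is more involved than a rule-based check.

```python
import json

# Hypothetical format requirement: output must be a JSON object with these
# keys. The schema is invented for illustration, not taken from FOFO.
REQUIRED_KEYS = {"patient_id", "diagnosis", "icd10_code"}

def follows_format(model_output: str) -> bool:
    """Return True if the output is valid JSON containing the required keys."""
    try:
        record = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return isinstance(record, dict) and REQUIRED_KEYS <= record.keys()

# A response can be content-correct yet format-wrong, and vice versa:
good_format = '{"patient_id": "A12", "diagnosis": "flu", "icd10_code": "J11.1"}'
bad_format = "Patient A12 has the flu (ICD-10 J11.1)."
print(follows_format(good_format))  # True  (format followed)
print(follows_format(bad_format))   # False (same information, wrong format)
```

The two example responses carry the same information; only the first satisfies the format constraint, which is exactly the dimension FOFO isolates from content quality.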