AudioBench: A Universal Benchmark for Audio Large Language Models

25 Jun 2024 | Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen
AudioBench is a comprehensive benchmark designed to evaluate audio large language models (AudioLLMs). It comprises 8 distinct tasks and 26 datasets spanning speech understanding, voice interpretation, and audio scene understanding. The benchmark addresses the lack of comprehensive evaluation frameworks for AudioLLMs by curating relevant datasets and evaluation metrics. Evaluating four models across these aspects, the study finds that no single model excels consistently across all tasks. The research outlook highlights future directions for AudioLLMs, including long-audio processing, multi-round query handling, multilingual capabilities, and speech generation. The open-source code, data, and leaderboard are released to support future model development.
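To make the evaluation setup concrete, the sketch below shows the general shape of such a benchmark harness: each task pairs audio clips and text instructions with reference answers, a model produces free-form text, and a metric scores the output per task. This is a minimal illustration under assumed names (Example, evaluate_model, generate, score); it is not the actual AudioBench codebase or API.

```python
# Illustrative sketch only: the class and function names here are hypothetical
# placeholders for a generic AudioLLM evaluation loop, not the AudioBench API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    audio_path: str    # path to the input audio clip
    instruction: str   # text prompt accompanying the audio
    reference: str     # gold answer used for scoring


def evaluate_model(
    generate: Callable[[str, str], str],   # model: (audio_path, instruction) -> text
    tasks: Dict[str, List[Example]],       # task name -> evaluation examples
    score: Callable[[str, str], float],    # metric: (prediction, reference) -> score
) -> Dict[str, float]:
    """Run a model over each task and return the mean score per task."""
    results: Dict[str, float] = {}
    for name, examples in tasks.items():
        scores = [
            score(generate(ex.audio_path, ex.instruction), ex.reference)
            for ex in examples
        ]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results
```

Under this framing, comparing models reduces to running the same task dictionary and metrics through each model's `generate` function and comparing the per-task averages.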