15 Oct 2021 | Shu-wen Yang, Po-Han Chi, Yang-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee
The Speech processing Universal PERformance Benchmark (SUPERB) is a comprehensive benchmark designed to evaluate the performance of self-supervised learning (SSL) models across a wide range of speech processing tasks. SUPERB provides a leaderboard and benchmark toolkit to assess the generalizability and reusability of SSL representations in various speech tasks. The benchmark includes ten tasks covering content, speaker, semantics, and paralinguistics, with limited labeled data and publicly available datasets to ensure fairness and accessibility.
The framework used in SUPERB leverages a frozen, shared pretrained model and task-specific lightweight prediction heads to solve all tasks. This approach minimizes architecture changes and allows for efficient evaluation of SSL representations. The results show that SSL representations, particularly those from models like wav2vec 2.0 and HuBERT, outperform traditional supervised methods and even conventional features like log mel filterbank (FBANK) in several tasks. These results demonstrate the potential of SSL representations to be powerful, generalizable, and reusable, enabling more efficient and effective speech processing.
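The frozen-encoder setup above can be illustrated with a minimal sketch. Toy numpy arrays stand in for a real pretrained SSL encoder's layer outputs; the softmax-weighted layer sum and the mean-pool + linear head are illustrative stand-ins for SUPERB's lightweight task heads, not the exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen SSL encoder's output: L layers of hidden states,
# each of shape (T frames, D dims). In SUPERB the encoder weights stay
# fixed; only the small task head (and layer weights) are trained.
L, T, D, n_classes = 4, 50, 16, 5
hidden_states = [rng.standard_normal((T, D)) for _ in range(L)]

def weighted_sum(states, layer_logits):
    """Collapse L layer representations into one feature sequence via a
    softmax-weighted sum, with the weights treated as trainable."""
    w = np.exp(layer_logits - layer_logits.max())
    w /= w.sum()
    return sum(wi * s for wi, s in zip(w, states))  # (T, D)

# Lightweight prediction head for an utterance-level task (e.g. speaker
# identification): mean-pool over time, then one linear layer.
layer_logits = np.zeros(L)                      # trainable layer weights
W = rng.standard_normal((D, n_classes)) * 0.1   # trainable head weights
b = np.zeros(n_classes)                         # trainable head bias

features = weighted_sum(hidden_states, layer_logits)  # (T, D)
pooled = features.mean(axis=0)                        # (D,)
logits = pooled @ W + b                               # (n_classes,)
pred = int(np.argmax(logits))                         # predicted class id
```

Because the encoder is never updated, its hidden states can be extracted once and reused across all ten tasks; only the few head parameters above differ per task.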
SUPERB aims to democratize advancements in speech processing by providing a standardized benchmark for evaluating SSL models. The challenge encourages researchers to participate and submit results to drive the research frontier. The benchmark includes tasks such as automatic speech recognition (ASR), speaker identification (SID), automatic speaker verification (ASV), emotion recognition (ER), and spoken language understanding (SLU), among others. The framework's simplicity and effectiveness highlight the importance of SSL in advancing speech processing research and applications.