2024 | Nicolae-Cătălin Ristea, Ando Saabas, Ross Cutler, Babak Naderi, Sebastian Braun, Solomiya Branets
The ICASSP 2024 Speech Signal Improvement Grand Challenge aims to promote research in enhancing speech signal quality in communication systems. Building on the success of the 2023 challenge, this year's edition introduces a dataset synthesizer to give all teams a stronger baseline, an objective metric (SIGMOS) for the extended P.804 tests, transcripts for the 2023 test set, and a new Word Accuracy (WAcc) metric. A total of 13 systems were evaluated in the real-time track and 11 in the non-real-time track, using both subjective P.804 and objective WAcc metrics.
The challenge focuses on improving speech quality in mainstream communication systems, which are essential for both work and personal use. Speech signal quality is measured with the SIG rating according to ITU-T P.835. The challenge aims to bring researchers together to address the problem of speech quality improvement. The evaluation combines subjective scores from P.804 tests with the WAcc metric; WAcc is particularly valuable because it showed low correlation with the P.804 results and therefore captures complementary information.
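WAcc is conventionally defined as one minus the word error rate (WER) between the recognized transcript and the reference. The snippet below is a minimal sketch of that standard definition using a word-level edit distance; the challenge obtained its transcripts from Azure Cognitive Services, and since the exact text normalization is not described here, the lowercasing and tokenization are assumptions.

```python
import re

def word_accuracy(reference: str, hypothesis: str) -> float:
    """WAcc = 1 - WER, with WER computed from the word-level
    Levenshtein (edit) distance between reference and ASR output."""
    ref = re.findall(r"\w+", reference.lower())  # assumed normalization
    hyp = re.findall(r"\w+", hypothesis.lower())
    d = list(range(len(hyp) + 1))  # DP row: distances from the empty reference
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,          # deletion
                d[j - 1] + 1,      # insertion
                prev + (r != h),   # substitution (free if the words match)
            )
    return 1.0 - d[-1] / max(len(ref), 1)
```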
The blind dataset consists of 500 clips from different devices, environments, and speakers, with a majority in English. The evaluation methodology includes a subjective listening test based on the P.804 standard, using Amazon Mechanical Turk for crowd-sourcing. Quality control is ensured through gold and trapping questions, and WAcc is evaluated using Azure Cognitive Services speech recognition. The final score is calculated as the average of SIG, OVRL, and WAcc.
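SIG and OVRL come from a five-point MOS scale while WAcc lies in [0, 1], so averaging the three implies some normalization. The sketch below assumes a linear rescaling of the MOS values to [0, 1] before averaging; the summary states only that the three quantities are averaged, so the rescaling step is an assumption.

```python
def final_score(sig_mos: float, ovrl_mos: float, wacc: float) -> float:
    """Final challenge score as the average of SIG, OVRL, and WAcc.

    Assumption: MOS values on the 1-5 scale are linearly mapped to
    [0, 1] so all three terms share a common range before averaging.
    """
    sig = (sig_mos - 1.0) / 4.0    # assumed 1..5 -> 0..1 rescaling
    ovrl = (ovrl_mos - 1.0) / 4.0
    return (sig + ovrl + wacc) / 3.0
```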
The results show that the top teams in the real-time track are 1024k, B&N, Nju-AALab, Sluice, and IIP, while the top three in the non-real-time track are 1024k, SpeechGroup-IoA, and B&N. The challenge results are presented in Table 1, with statistical testing showing significant differences between teams. The challenge aims to advance the state-of-the-art in signal enhancement.
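Because every system is rated on the same blind set, a natural way to test whether two teams differ significantly is a paired test over the per-clip scores. The sketch below uses a paired two-sided t-test from SciPy; the summary does not specify the exact test or any correction for multiple comparisons, so this is illustrative only.

```python
from scipy import stats

def significantly_different(scores_a, scores_b, alpha=0.05):
    """Paired two-sided t-test on per-clip scores of two systems rated
    on the same blind test set. Returns (is_significant, p_value)."""
    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
    return p_value < alpha, p_value
```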