16 Jan 2024 | Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka
The NOTSOFAR-1 Challenge introduces a new dataset and baseline system for distant meeting transcription, focusing on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios. It provides two datasets: a benchmarking set of 315 natural meetings, each averaging six minutes, capturing a wide range of real-world acoustic conditions and conversational dynamics, and a 1000-hour simulated training set, synthesized with enhanced authenticity for real-world generalization. The simulated data incorporates real acoustic transfer functions, enabling geometry-specific solutions, and includes separated speech and noise components as supervision signals for training data-driven speech separation and enhancement methods. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry; this setup aligns with common conference-room configurations and avoids the technical complexities of multi-device tasks. The challenge features two tracks, single-channel and known-geometry multi-channel, and defines metrics for both speaker-attributed and speaker-agnostic performance. Detailed metadata capturing acoustic events and conversational aspects supports deep-dive analysis. A baseline system, consisting of continuous speech separation (CSS), ASR, and speaker diarization modules, is provided for participants. By supplying these key resources, the challenge aims to unlock the potential of data-driven methods for distant conversational speech recognition and to promote innovative, practical systems rather than performance-squeezing approaches.
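As background for the speaker-agnostic evaluation, plain word error rate (WER) is the Levenshtein edit distance between the reference and hypothesis word sequences, normalized by reference length; the challenge's speaker-attributed metrics additionally account for speaker labels. A minimal sketch of plain WER (not the challenge's exact scoring code):

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """Speaker-agnostic WER: word-level Levenshtein distance
    between reference and hypothesis, divided by reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = minimum edits to turn r[:i] into h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

wer = word_error_rate("the meeting starts now", "the meeting start now")  # 0.25
```

One substitution over a four-word reference yields a WER of 0.25.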
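The baseline's three-stage structure (CSS, then per-stream ASR, then speaker diarization) can be sketched as the following scaffold. Every function here is a hypothetical stub standing in for the real trained modules; the sketch only illustrates how the stages compose, not the actual baseline implementation:

```python
def css_separate(mixture, num_streams=3):
    """Placeholder for continuous speech separation (CSS).
    A real CSS module runs a trained separation network and emits
    a small fixed number of overlap-free output streams."""
    # Naive stand-in: copy the mixture into each output stream.
    return [list(mixture) for _ in range(num_streams)]

def asr_transcribe(stream, sample_rate=16000):
    """Stub ASR: a real module decodes word hypotheses with timestamps."""
    return [{"word": "<unk>", "start": 0.0, "end": len(stream) / sample_rate}]

def diarize(segments):
    """Stub diarization: a real module clusters speaker embeddings
    across streams to attribute each word to a speaker."""
    return [{**seg, "speaker": "spk0"} for seg in segments]

def baseline_pipeline(mixture):
    """CSS -> per-stream ASR -> diarization, mirroring the baseline outline."""
    transcript = []
    for stream in css_separate(mixture):
        transcript.extend(diarize(asr_transcribe(stream)))
    return transcript

transcript = baseline_pipeline([0.0] * 16000)  # one second of silence at 16 kHz
```

The multi-channel track would replace `css_separate` with a geometry-aware separator that exploits the known microphone array layout; the single-channel track operates on one stream.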