Careless Whisper: Speech-to-Text Hallucination Harms


June 3–6, 2024, Rio de Janeiro, Brazil | ALLISON KOENECKE, ANNA SEO GYEONG CHOI, KATELYN X. MEI, HILKE SCHELLMANN, MONA SLOANE
The paper "Careless Whisper: Speech-to-Text Hallucination Harms" by Allison Koencke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, and Mona Sloane evaluates the speech-to-text service Whisper, developed by OpenAI, for its hallucinations in transcriptions. The study focuses on the impact of these hallucinations, particularly on individuals with aphasia, a language disorder characterized by difficulties in expressing oneself through speech and voice. The researchers conducted a large-scale evaluation using 13,140 audio segments from the TalkBank's AphasiaBank, sourced from multiple institutions across the United States. Key findings include: - Approximately 1% of transcriptions contained hallucinated content. - 38% of these hallucinations included explicit harms such as perpetuating violence, making inaccurate associations, or implying false authority. - Hallucinations disproportionately occurred in individuals with aphasia, who tend to have longer periods of non-vocal durations. - Whisper's hallucinations were more common and harmful compared to those from competing speech recognition systems like Google Speech-to-Text. The authors call for industry practitioners to address these issues, raise awareness of potential biases, and improve the model to reduce hallucinations. They also highlight the ethical implications, particularly for subpopulations with speech impairments, and suggest that future research should focus on identifying and mitigating these biases.The paper "Careless Whisper: Speech-to-Text Hallucination Harms" by Allison Koencke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, and Mona Sloane evaluates the speech-to-text service Whisper, developed by OpenAI, for its hallucinations in transcriptions. The study focuses on the impact of these hallucinations, particularly on individuals with aphasia, a language disorder characterized by difficulties in expressing oneself through speech and voice. The researchers conducted a large-scale evaluation using 13,140 audio segments from the TalkBank's AphasiaBank, sourced from multiple institutions across the United States. Key findings include: - Approximately 1% of transcriptions contained hallucinated content. - 38% of these hallucinations included explicit harms such as perpetuating violence, making inaccurate associations, or implying false authority. - Hallucinations disproportionately occurred in individuals with aphasia, who tend to have longer periods of non-vocal durations. - Whisper's hallucinations were more common and harmful compared to those from competing speech recognition systems like Google Speech-to-Text. The authors call for industry practitioners to address these issues, raise awareness of potential biases, and improve the model to reduce hallucinations. They also highlight the ethical implications, particularly for subpopulations with speech impairments, and suggest that future research should focus on identifying and mitigating these biases.