18 Jun 2024 | Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu
This paper presents a novel benchmark dataset and prediction tasks for assessing cognitive function through the analysis of connected speech in Mandarin Chinese and English. The dataset includes speech samples and clinical information from speakers with different levels of cognitive impairment, balanced by age and sex using propensity score matching. The prediction tasks focus on diagnosing mild cognitive impairment (MCI) and predicting cognitive test scores. The authors seek language-agnostic, comparable features for these tasks so that a single predictive model can generalize across languages. Baseline models using such features achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction. The dataset and models are available to the research community to stimulate further research in cross-lingual speech-based cognitive assessment. The study highlights the potential of speech as a digital biomarker for cognitive function and the importance of multilingual models in early detection and monitoring of cognitive decline.
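For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how unweighted average recall (the mean of per-class recalls, so each class counts equally regardless of its size) and root mean squared error are commonly computed. It is not the authors' evaluation code. The function names, the class labels "MCI" and "HC", and the example scores are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class has equal weight."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared difference between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical labels and scores, for illustration only.
labels_true = ["MCI", "MCI", "MCI", "HC"]
labels_pred = ["MCI", "MCI", "HC", "HC"]
uar = unweighted_average_recall(labels_true, labels_pred)

scores_true = [28, 25, 30]
scores_pred = [26, 25, 31]
rmse = root_mean_squared_error(scores_true, scores_pred)

print(f"UAR = {uar:.3f}, RMSE = {rmse:.3f}")
```

In a two-class task, a classifier that always predicts one class obtains an unweighted average recall of 50%, which is the reference point against which the 59.2% baseline should be read.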