The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences
Craig Hedge, Georgina Powell, and Petroc Sumner
Abstract: Individual differences in cognitive paradigms are increasingly used to relate cognition to brain structure, chemistry, and function. However, such efforts are often unfruitful, even with the most well-established tasks. This paper explains why robust cognitive paradigms fail to produce reliable individual differences. Experimental effects become well-established and popular when between-subject variability is low. However, low between-subject variability leads to low reliability for individual differences, destroying replicable correlations with other factors and potentially undermining published conclusions. Although these statistical issues have a long history in psychology, they are widely overlooked in cognitive psychology and neuroscience today. In three studies, we assessed the test-retest reliability of seven classic tasks: Eriksen Flanker, Stroop, stop-signal, go/no-go, Posner cueing, Navon, and Spatial-Numerical Association of Response Code (SNARC). Reliabilities ranged from 0 to .82, being surprisingly low for most tasks given their common use. As predicted, this emerged from low variance between individuals rather than high measurement variance. In other words, the very reason such tasks produce robust and easily replicable experimental effects – low between-participant variability – makes their use as correlational tools problematic. We demonstrate that taking such reliability estimates into account has the potential to qualitatively change theoretical conclusions. The implications of our findings are that well-established approaches in experimental psychology and neuropsychology may not directly translate to the study of individual differences in brain structure, chemistry, and function, and alternative metrics may be required.
Keywords: Reliability; Individual differences; Reaction time; Difference scores; Response control
Individual differences have been an annoyance rather than a challenge to the experimenter. His goal is to control behavior, and variation within treatments is proof that he has not succeeded… For reasons both statistical and philosophical, error variance is to be reduced by any possible device. (Cronbach, 1957, p. 674)
The discipline of psychology consists of two historically distinct approaches to the understanding of human behavior: the correlational approach and the experimental approach (Cronbach, 1957). The division between experimental and correlational approaches was highlighted as a failing by some theorists (Cronbach, 1957; Hull, 1945), whilst others suggest that it may be the inevitable consequence of fundamentally different levels of explanation (Borsboom, Kievit, Cervone, & Hood, 2009). The correlational, or individual differences, approach examines factors that distinguish between individuals within a population (i.e., between-subject variance). Alternatively, the experimental approach aims to precisely characterize a cognitive mechanism based on the typical or average response to a manipulation of environmental variables (i.e., within-subject