This paper provides an overview of methodological issues in assessing inter-rater reliability (IRR) for observational data, focusing on study design, the selection of appropriate statistics, and the computation, interpretation, and reporting of commonly used IRR statistics. It emphasizes the importance of using correct statistical procedures and of accounting for how IRR affects the power of subsequent analyses. The paper covers the assessment of IRR for nominal, ordinal, interval, and ratio variables, including the use of Cohen's kappa and intra-class correlations (ICCs). It highlights common mistakes in assessing and reporting IRR, such as relying on simple percentages of agreement, failing to report the specific statistic used, and neglecting the implications of IRR for statistical power. The paper also provides computational examples in SPSS and R syntax for Cohen's kappa and ICCs, and discusses how to select appropriate statistics based on study design and data characteristics.
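As a concrete illustration of the kind of computation the paper covers, the minimal sketch below uses the irr package for R to obtain an unweighted Cohen's kappa for two raters' nominal codes and a two-way, absolute-agreement, single-measures ICC for three raters' interval ratings. The data, the choice of the irr package, and the model/type/unit settings are illustrative assumptions for this sketch, not the paper's own examples.

# Minimal sketch, assuming the 'irr' package is installed; all data are made up.
# install.packages("irr")
library(irr)

# Two raters assign nominal codes to 10 subjects (columns = raters)
kappa_data <- data.frame(
  rater1 = c("A", "B", "A", "C", "B", "A", "C", "B", "A", "B"),
  rater2 = c("A", "B", "B", "C", "B", "A", "C", "A", "A", "B")
)
kappa2(kappa_data)  # unweighted Cohen's kappa for two raters

# Three raters score the same 6 subjects on an interval scale
icc_data <- data.frame(
  rater1 = c(9, 6, 8, 7, 10, 6),
  rater2 = c(2, 1, 4, 1, 5, 2),
  rater3 = c(5, 3, 6, 2, 6, 4)
)
# Two-way model, absolute agreement, single-measures ICC
icc(icc_data, model = "twoway", type = "agreement", unit = "single")

The twoway/agreement/single specification shown here would fit a design in which every subject is rated by the same set of raters and absolute agreement (rather than mere consistency of rank ordering) is of interest; as the paper discusses, other designs call for different model, type, and unit choices.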