Statistical Validation of Image Segmentation Quality Based on a Spatial Overlap Index

2004 February | Kelly H. Zou, PhD, Simon K. Warfield, PhD, Aditya Bharatha, MD, Clare M.C. Tempany, MD, Michael R. Kaus, PhD, Steven J. Haker, PhD, William M. Wells III, PhD, Ferenc A. Jolesz, MD, and Ron Kikinis, MD
The study presents a statistical validation method for assessing the quality of image segmentation based on the Dice similarity coefficient (DSC), a measure of spatial overlap between two segmentations. The method was applied to two clinical examples: prostate brachytherapy and brain tumor segmentation.

In the first example, 10 prostate cases underwent preoperative 1.5T and intraoperative 0.5T MRI, and five manual segmentations of the prostate peripheral zone (PZ) were performed on each case. The DSC values for preoperative and intraoperative images were 0.883 and 0.838, respectively, indicating good reproducibility. In the second example, a semi-automated probabilistic fractional segmentation algorithm was applied to 9 brain tumor cases. DSC values ranged widely within each tumor type: 0.519–0.893 for meningiomas, 0.487–0.972 for astrocytomas, and 0.490–0.899 for mixed gliomas.

Because the DSC is restricted to the range [0, 1], it was logit-transformed before statistical analysis. The logit transformation maps the DSC to the unbounded range (-∞, ∞), which makes normality assumptions more reasonable and supports standard statistical testing. The DSC was used to evaluate both manual and automated segmentations; values were generally satisfactory but variable across cases, with results showing that preoperative 1.5T images provided better spatial resolution and contrast than intraoperative 0.5T images. The study also explored the effects of segmentation variability, learning curves, and case-to-case differences. Analysis of variance (ANOVA) revealed significant differences in reproducibility between preoperative and intraoperative images, as well as between different tumor types.

The results suggest that DSC is a useful and simple metric for assessing spatial overlap and reproducibility in image segmentation. The study highlights the importance of using a composite gold standard derived from multiple manual segmentations for validating automated methods.
The findings indicate that DSC can be adapted for similar validation tasks in medical imaging.
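
The summary does not write out the two quantities it relies on. Under the standard definitions, DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B, and logit(DSC) = ln(DSC / (1 − DSC)). The following is a minimal sketch of both in plain Python, not the authors' implementation: the function names and the toy masks are hypothetical, and the segmentations are assumed to be flattened binary voxel masks of equal length.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        raise ValueError("DSC is undefined when both masks are empty")
    return 2.0 * overlap / (size_a + size_b)


def logit(dsc):
    """Map a DSC in the open interval (0, 1) onto the real line."""
    if not 0.0 < dsc < 1.0:
        raise ValueError("logit is only finite for 0 < DSC < 1")
    return math.log(dsc / (1.0 - dsc))


# Toy flattened masks (hypothetical data, not taken from the study).
seg_1 = [0, 1, 1, 1, 0, 0, 1, 0]
seg_2 = [0, 1, 1, 0, 0, 1, 1, 0]

dsc = dice_coefficient(seg_1, seg_2)  # 3 shared voxels, 4 + 4 in total
print(f"DSC = {dsc:.3f}, logit(DSC) = {logit(dsc):.3f}")

# Logit of the two prostate DSC values quoted in the summary.
for reported in (0.883, 0.838):
    print(f"logit({reported}) = {logit(reported):.3f}")
```

A DSC of exactly 0 or 1 has no finite logit, so the sketch raises an error at the boundaries rather than guessing a correction; how such values should be handled is left open here.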