2004 February; 11(2): 178–189 | Kelly H. Zou, PhD, Simon K. Warfield, PhD, Aditya Bharatha, MD, Clare M.C. Tempany, MD, Michael R. Kaus, PhD, Steven J. Haker, PhD, William M. Wells III, PhD, Ferenc A. Jolesz, MD, and Ron Kikinis, MD
This study presents a statistical validation of image segmentation quality, using the Dice similarity coefficient (DSC) as a spatial overlap index. The DSC is used to assess the reproducibility of manual segmentations and the accuracy of automated probabilistic fractional segmentations of MR images in two clinical examples.
In Example 1, 10 consecutive prostate brachytherapy patients underwent both preoperative 1.5T and intraoperative 0.5T MR imaging, and five repeated manual segmentations of the prostate peripheral zone were performed. The mean DSCs were 0.883 (range, 0.876–0.893) for preoperative 1.5T images and 0.838 (range, 0.819–0.852) for intraoperative 0.5T images, a statistically significant difference (P < .001). In Example 2, a semi-automated probabilistic fractional segmentation algorithm was applied to MR imaging of 9 cases with three types of brain tumors. The DSC values were computed and logit-transformed, and their means were compared using ANOVA. DSC values for the brain tumor segmentations varied widely: 0.519 to 0.893 for meningiomas, 0.487 to 0.972 for astrocytomas, and 0.490 to 0.899 for other mixed gliomas. The study concludes that the DSC is a simple and useful measure of spatial overlap that can be applied to evaluate reproducibility and accuracy in image segmentation. The validation results were generally satisfactory but variable, suggesting the need for improved segmentation algorithms.
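
To make the overlap index concrete, the following is a minimal sketch of how a DSC and its logit transform could be computed for two binary segmentations. It uses the standard definition DSC = 2|A ∩ B| / (|A| + |B|) and logit(DSC) = ln(DSC / (1 − DSC)). The function names `dice` and `logit` and the toy masks are illustrative assumptions, not the authors' code or data. A probabilistic fractional segmentation would presumably need to be thresholded into a binary mask first, a step not shown here.

```python
import math


def dice(mask_a, mask_b):
    """Dice similarity coefficient of two binary masks.

    Masks are equal-length sequences of 0/1 values (flattened voxel labels).
    This flat-list representation is an assumption made for illustration.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        raise ValueError("DSC is undefined when both masks are empty")
    return 2.0 * overlap / (size_a + size_b)


def logit(p):
    """Map a DSC in the open interval (0, 1) onto the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined for values strictly between 0 and 1")
    return math.log(p / (1.0 - p))


# Hypothetical toy masks: each labels 4 voxels, 3 of which coincide.
mask_1 = [0, 1, 1, 1, 0, 0, 1, 0]
mask_2 = [0, 1, 1, 0, 0, 1, 1, 0]

dsc = dice(mask_1, mask_2)   # 2 * 3 / (4 + 4) = 0.75
dsc_logit = logit(dsc)       # ln(0.75 / 0.25) = ln(3)
print(dsc, dsc_logit)
```

A DSC of 0 indicates no overlap and 1 indicates complete overlap. Because the logit transform is unbounded, transformed values are better suited than raw DSCs to comparisons of means such as ANOVA. Note that `logit` rejects exactly 0 or 1, so perfect or empty overlaps would need separate handling.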