2019 | Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi, Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz, Bhavik N. Patel, Andrew Y. Ng
CheXpert is a large dataset of 224,316 chest radiographs from 65,240 patients, labeled for the presence of 14 common chest radiographic observations. The dataset includes uncertainty labels to capture the inherent ambiguity of radiograph interpretation. The authors designed a labeler that automatically extracts observations from free-text radiology reports and classifies each as positive, negative, or uncertain, using a three-stage process: mention extraction, mention classification, and mention aggregation. The labeler was evaluated against a report evaluation set in which two board-certified radiologists annotated the reports to establish ground truth.
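The paper describes the labeler only at the level of these three stages; below is a minimal sketch of how such a rule-based pipeline could look. The phrase lists, cue words, and function names are illustrative assumptions, not the authors' implementation (which uses much larger, radiologist-curated rule sets).

```python
import re

# Hypothetical phrase and cue lists, for illustration only.
MENTIONS = {
    "Cardiomegaly": ["cardiomegaly", "enlarged heart", "heart size is enlarged"],
    "Pleural Effusion": ["pleural effusion", "effusion"],
}
NEGATION_CUES = ["no ", "without ", "free of "]
UNCERTAINTY_CUES = ["may ", "possible", "cannot exclude", "suspicious for"]

def extract_mentions(report, phrases):
    """Stage 1 (mention extraction): find sentences mentioning the observation."""
    sentences = re.split(r"[.\n]", report.lower())
    return [s for s in sentences if any(p in s for p in phrases)]

def classify_mention(sentence):
    """Stage 2 (mention classification): positive, negative, or uncertain."""
    if any(cue in sentence for cue in NEGATION_CUES):
        return "negative"
    if any(cue in sentence for cue in UNCERTAINTY_CUES):
        return "uncertain"
    return "positive"

def aggregate(mention_labels):
    """Stage 3 (aggregation): positive beats uncertain beats negative."""
    for label in ("positive", "uncertain", "negative"):
        if label in mention_labels:
            return label
    return None  # observation not mentioned in the report

def label_report(report):
    return {
        obs: aggregate([classify_mention(m) for m in extract_mentions(report, phrases)])
        for obs, phrases in MENTIONS.items()
    }

print(label_report("Heart size is enlarged. No pleural effusion."))
# -> {'Cardiomegaly': 'positive', 'Pleural Effusion': 'negative'}
```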
The authors investigated several approaches to using the uncertainty labels when training convolutional neural networks: U-Ignore (excluding uncertain labels from the loss), U-Zeros and U-Ones (mapping uncertain labels to 0 or 1), U-SelfTrained (relabeling uncertain examples with a preliminary model), and U-MultiClass (treating the uncertain label as its own class). No single approach was best across all pathologies, with U-MultiClass among the strongest. The models were evaluated on a validation set of 200 studies, with the consensus of three radiologists as ground truth, and outperformed the radiologists on three of the five selected pathologies (Cardiomegaly, Edema, and Pleural Effusion) in terms of ROC and PR curves.
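A sketch of how these uncertainty policies might translate into training targets and losses, in PyTorch. The -1 encoding for uncertain matches the released CheXpert CSVs, but the masking and three-class head details are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

# Per the released CSVs: 1 = positive, 0 = negative, -1 = uncertain.
labels = torch.tensor([[1., -1., 0.],
                       [-1., 1., 1.]])  # (batch, observations)

def targets_u_zeros(y):
    """U-Zeros: map uncertain labels to negative (0)."""
    return torch.where(y == -1, torch.zeros_like(y), y)

def targets_u_ones(y):
    """U-Ones: map uncertain labels to positive (1)."""
    return torch.where(y == -1, torch.ones_like(y), y)

def loss_u_ignore(logits, y):
    """U-Ignore: mask uncertain entries out of the binary cross-entropy."""
    mask = (y != -1).float()
    per_entry = F.binary_cross_entropy_with_logits(
        logits, y.clamp(min=0), reduction="none")
    return (per_entry * mask).sum() / mask.sum()

def loss_u_multiclass(logits3, y):
    """U-MultiClass: a 3-way softmax per observation, so 'uncertain'
    is predicted as its own class rather than mapped away."""
    classes = (y + 1).long()  # -1 -> 0 (uncertain), 0 -> 1 (neg), 1 -> 2 (pos)
    return F.cross_entropy(logits3.permute(0, 2, 1), classes)

# U-SelfTrained (not shown) first trains under U-Ignore, then uses that
# model's predictions to relabel the uncertain entries before retraining.

logits = torch.randn(2, 3)      # one binary head per observation
logits3 = torch.randn(2, 3, 3)  # one 3-way head per observation
print(loss_u_ignore(logits, labels).item())
print(loss_u_multiclass(logits3, labels).item())
```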
The model was further evaluated on a test set of 500 studies, where the consensus of five radiologists served as ground truth; there, the model outperformed two of the three benchmarked radiologists on four of the five selected pathologies. Its performance was comparable to the radiologists on most tasks, with the exception of Atelectasis, where the radiologists performed better. The authors also compared their labeler to the NIH labeler and found that theirs performed better on several tasks, particularly Cardiomegaly, Pneumonia, and Pneumothorax. In addition, the model's probability outputs were calibrated with post-processing techniques such as isotonic regression and Platt scaling. The dataset is publicly available to encourage further development, and the authors conclude that CheXpert provides a strong benchmark for evaluating chest radiograph interpretation models.
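For the calibration step, a sketch of the two post-hoc techniques mentioned, using scikit-learn; the synthetic data, the held-out split, and the log-odds featurization for Platt scaling are illustrative assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Uncalibrated model probabilities and binary labels on a held-out split
# (synthetic here; in practice, fit on validation data and apply to test).
rng = np.random.default_rng(0)
probs = np.clip(rng.uniform(size=500), 1e-6, 1 - 1e-6)
y = (rng.uniform(size=500) < probs ** 2).astype(int)  # deliberately miscalibrated

# Platt scaling: fit a logistic regression on the model's log-odds.
log_odds = np.log(probs / (1 - probs)).reshape(-1, 1)
platt = LogisticRegression().fit(log_odds, y)
platt_probs = platt.predict_proba(log_odds)[:, 1]

# Isotonic regression: fit a monotone, piecewise-constant map from
# score to calibrated probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(probs, y)
iso_probs = iso.predict(probs)
```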