2015 | Dina Demner-Fushman, Marc D. Kohli, Marc B. Rosenman, Sonya E. Shooshan, Laritza Rodriguez, Sameer Antani, George R. Thoma, Clement J. McDonald
This paper presents an approach to developing and publicly releasing a collection of radiology examinations that includes both images and radiologists' narrative reports. The authors collected 3996 radiology reports and 8121 associated images from the Indiana Network for Patient Care and the hospitals' picture archiving systems. The images and reports were de-identified automatically, and the results were then verified manually. Key findings in the reports were coded manually to improve retrieval precision. Automatic de-identification of text achieved 100% precision but rendered some findings uninterpretable. Automatic de-identification of images was less reliable: two images from 3996 patients (0.05%) still showed protected health information. Manual encoding of findings significantly improved retrieval precision. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine's Open-i service. The study highlights the importance of manual verification and coding in making clinical document collections more accessible and useful for secondary use.