Preparing a collection of radiology examinations for distribution and retrieval

2016 | Dina Demner-Fushman, Marc D. Kohli, Marc B. Rosenman, Sonya E. Shooshan, Laritza Rodriguez, Sameer Antani, George R. Thoma, Clement J. McDonald
This paper presents an approach to developing a collection of radiology examinations, comprising both images and radiologist narrative reports, and making it publicly available in a searchable database. The authors collected 3996 radiology reports and 8121 associated images from the Indiana Network for Patient Care. The images and reports were de-identified automatically and then manually verified. The authors used the Regenstrief Scrubber to de-identify the text reports, and the Radiological Society of North America's Clinical Trials Processor with the DICOM Supplement 142 Clinical Trials De-identification methodology to de-identify the DICOM files. They also manually encoded the findings and diagnoses recorded in the reports using MeSH and RadLex codes, and assessed the benefit of this manual coding in retrieval experiments based on real-life image search queries.

The automatic de-identification of the narrative was aggressive: it achieved 100% precision but rendered some findings uninterpretable. Automatic de-identification of images was less complete, with images for two of 3996 patients (0.05%) still showing protected health information. Manual encoding of findings improved retrieval precision.

The authors conclude that stringent de-identification methods can remove all identifiers from text radiology reports, whereas DICOM de-identification of images does not remove all identifying information and requires special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved the relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).
The public, doubly de-identified collection is searchable and downloadable from the NLM image retrieval service (Open-i), which also provides access to over 2.6 million images and enriched MEDLINE citations from over 700,000 PubMed Central articles. Two research groups have already obtained the data using the Open-i API. The original DICOM images are available at http://openi.nlm.nih.gov/contactus.php.
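
The summary reports that manual coding improved retrieval precision but does not spell out how such a comparison is scored. As a minimal sketch, assuming precision at rank k as the measure (the paper's exact metric may differ), the comparison could look like the following. The report IDs, the two rankings, and the relevance judgments are hypothetical illustrations, not data from the collection.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical relevance judgments for one image search query.
relevant = {"r1", "r3", "r5"}

# Hypothetical rankings: one from searching the narrative text only,
# one from searching the text plus manually assigned codes.
text_only = ["r2", "r1", "r4", "r6", "r3"]
with_codes = ["r1", "r3", "r2", "r5", "r4"]

p_text = precision_at_k(text_only, relevant, 5)
p_codes = precision_at_k(with_codes, relevant, 5)
print(f"text only: {p_text:.2f}, text plus codes: {p_codes:.2f}")
```

In this toy example the ranking that uses the codes places more relevant reports in the top five, which is the kind of difference the retrieval experiments were designed to detect. A real evaluation would average the score over many queries.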