Can GPT-3.5 Generate and Code Discharge Summaries?

24 Jan 2024 | Matúš Falis (MScR), Aryo Pradipta Gema (MScR), Hang Dong (PhD), Luke Daines (PhD), Siddharth Basetti (MBSS), Michael Holder (MMedSci), Rose S Penfold (BMBCh), Alexandra Birch (PhD), and Beatrice Alex (PhD)
This study investigates the capabilities of GPT-3.5 in generating and coding medical documents with ICD-10 codes for data augmentation in low-resource scenarios. The researchers generated 9,606 discharge summaries based on ICD-10 code descriptions from the MIMIC-IV dataset, which were then combined with the baseline training set to form an augmented training set. Neural coding models were trained on both the baseline and augmented data and evaluated on a MIMIC-IV test set. The results showed that while augmentation slightly hindered overall performance, it improved performance for the generation candidate codes and their families, including one code unseen in the baseline training data. Augmented models also displayed lower out-of-family error rates. However, GPT-3.5 performed poorly as a coder on real data: it identified ICD-10 codes when their descriptions were supplied in the prompt but struggled in realistic scenarios without such in-prompt aid. Clinical professionals who evaluated the generated discharge summaries found them correct but lacking in variety, supporting information, and narrative. The study concludes that GPT-3.5 alone is unsuitable for ICD-10 coding, and that while augmentation positively affects the families of the generation candidate codes, it mainly benefits codes with existing examples.
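The summary does not spell out how an out-of-family error is computed. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a code's family is its three-character ICD-10 category and that a predicted code counts as an out-of-family error when no gold code for the same document shares that family. The function names `family` and `out_of_family_error_rate` are hypothetical.

```python
def family(code: str) -> str:
    """Return the assumed family of an ICD-10 code: its first three characters."""
    # Assumption: family = three-character category, e.g. "E11.65" -> "E11".
    return code.replace(".", "")[:3].upper()


def out_of_family_error_rate(gold: list[set[str]], predicted: list[set[str]]) -> float:
    """Share of predicted codes whose family matches no gold code in the same document."""
    errors = 0
    total = 0
    for gold_codes, pred_codes in zip(gold, predicted):
        gold_families = {family(c) for c in gold_codes}
        for code in pred_codes:
            total += 1
            if family(code) not in gold_families:
                errors += 1
    # With no predictions at all, report 0.0 rather than dividing by zero.
    return errors / total if total else 0.0


# One document: "E11.65" shares family E11 with the gold code "E11.9",
# while "J45.909" belongs to a family absent from the gold set.
gold = [{"E11.9", "I10"}]
predicted = [{"E11.65", "J45.909"}]
rate = out_of_family_error_rate(gold, predicted)
```

Under these assumptions, a wrong code that stays within the correct family (such as `E11.65` against a gold `E11.9`) is treated as a milder mistake than one from an unrelated family, which is the distinction the lower out-of-family error rate of the augmented models refers to.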