Using GPT-4 to write a scientific review article: a pilot evaluation study

2024 | Zhiping Paul Wang, Priyanka Bhandary, Yizhou Wang and Jason H. Moore
This study evaluates GPT-4's ability to generate scientific review articles, focusing on text, tables, and diagrams. The research examines GPT-4's ability to summarize key points from reference papers, generate text content, suggest future research directions, and create tables and graphs, using two benchmark review papers (BRP1 and BRP2) to assess performance on these tasks. GPT-4 performed strongly at summarizing key points from reference papers and generating text content, with an average similarity score of 0.748 against the original content. It also showed capability in suggesting future research directions, with performance improving significantly when all relevant references were supplied. However, GPT-4 struggled to generate accurate tables and diagrams, indicating a need for further refinement in these areas. The study also assessed the reproducibility of GPT-4's outputs, finding that it consistently generated uniform text responses when given identical prompts and reference materials. Its performance was less stable when processing uploaded documents, however, suggesting limitations in handling unstructured or highly specialized scientific content.
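The summary does not spell out how the 0.748 similarity score was computed, so the sketch below is only an illustrative stand-in, not the authors' method: it scores a generated passage against the original using TF-IDF cosine similarity, a common text-similarity measure. The function name and sample strings are hypothetical.

```python
# Illustrative sketch: scoring generated text against the original
# review with TF-IDF cosine similarity. The paper's exact metric
# (which yielded the reported 0.748 average) may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def similarity(generated: str, original: str) -> float:
    """Cosine similarity between TF-IDF vectors of two texts."""
    vectors = TfidfVectorizer().fit_transform([generated, original])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Hypothetical example inputs.
gpt4_text = "GPT-4 summarizes key points from the reference papers."
benchmark_text = "The review summarizes key points from prior work."
print(f"similarity: {similarity(gpt4_text, benchmark_text):.3f}")
```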
A plagiarism check with iThenticate found that GPT-4's reference-based content generation had a 34% similarity score with the original review paper, compared with 10% for the base model. This indicates that while GPT-4 can generate original text, there is a risk of close matches with reference articles, particularly 'copy-paste' style matches. The study concludes that GPT-4 is a valuable tool for drafting the text of scientific review articles but requires further development before it can generate accurate tables and diagrams. The findings suggest that while GPT-4 can assist in composing scientific review articles, it cannot yet fully replace human authors. The study also highlights the need for further research to address GPT-4's limitations in generating scientific diagrams and to develop more effective methods for detecting AI-generated content in scientific publications.
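iThenticate's matching algorithm is proprietary, so the following is only a rough illustration of what a 'copy-paste' overlap score measures: the fraction of word n-grams in a generated draft that appear verbatim in a source text. All names and sample strings here are hypothetical.

```python
# Crude illustrative analogue of a copy-paste similarity score:
# the share of the draft's word 5-grams found verbatim in the source.
# Not iThenticate's actual method.
def ngram_overlap(generated: str, source: str, n: int = 5) -> float:
    """Fraction of generated word n-grams appearing in the source."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    gen = ngrams(generated)
    return len(gen & ngrams(source)) / len(gen) if gen else 0.0

# Hypothetical example inputs.
gpt4_draft = "machine learning methods have advanced rapidly in recent years"
benchmark_review = "machine learning methods have advanced rapidly across many fields"
print(f"overlap: {ngram_overlap(gpt4_draft, benchmark_review):.0%}")
```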