April 16, 2024 | Teo Susnjak, Peter Hwang, Napoleon H. Reyes, Andre L. C. Barczak, Timothy R. McIntosh, Surangika Ranathunga
This research introduces a novel approach to automating Systematic Literature Reviews (SLRs) by fine-tuning Large Language Models (LLMs) to enhance the knowledge-synthesis phase. The study demonstrates a practical and efficient method for automating the final execution stages of an SLR, in which information from the selected academic papers is synthesized. The results show that LLM responses maintain high factual accuracy, validated through the replication of an existing PRISMA-conforming SLR. The research addresses the challenge of LLM hallucination and proposes mechanisms for tracing LLM responses back to their sources of information, ensuring the reliability of SLR outcomes. The findings confirm the potential of fine-tuned LLMs to streamline this labor-intensive literature-review process. Given its applicability across all research domains, the study advocates updating the PRISMA reporting guidelines to incorporate AI-driven processes, ensuring methodological transparency and reliability in future SLRs. The study also proposes a Python package for curating data for LLM fine-tuning, tailored to SLR needs. The research contributes to the field of information retrieval by providing a methodical approach for converting selected academic papers into fine-tuning datasets, ensuring factual recall through audit mechanisms, and developing evaluation metrics for factuality. It benchmarks several AI-automation methodologies for SLRs, demonstrating their efficacy through the replication of a published PRISMA-conforming SLR. Overall, the research enhances the scholarly toolkit for SLRs with advanced, efficient, and context-aware AI technologies, setting new standards for reliability, validity, and ethical AI use in academic research.
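The paper-to-dataset conversion with an audit trail could look roughly like the following stdlib-only sketch. The function and field names (`curate_qa_records`, `source_id`, `to_jsonl`) are hypothetical illustrations, not the API of the Python package the study proposes; the key idea shown is that every fine-tuning record carries an identifier linking it back to its source paper.

```python
import json

def curate_qa_records(papers):
    """Convert Q&A pairs extracted from papers into fine-tuning records.

    Each record keeps a source_id so that a generated answer can later be
    audited back to the paper it came from (hypothetical schema).
    """
    records = []
    for paper in papers:
        for qa in paper["qa_pairs"]:
            records.append({
                "source_id": paper["id"],  # audit trail to the source paper
                "prompt": qa["question"],
                "completion": f"[{paper['id']}] {qa['answer']}",
            })
    return records

def to_jsonl(records):
    """Serialise records to the JSON-Lines format fine-tuning pipelines commonly accept."""
    return "\n".join(json.dumps(r) for r in records)

papers = [{
    "id": "smith2021",
    "qa_pairs": [{"question": "What sample size was used?",
                  "answer": "The study used 120 participants."}],
}]
print(to_jsonl(curate_qa_records(papers)))
```

Embedding the source identifier in the completion itself, as sketched here, is one simple way to make the provenance of a fine-tuned model's answer checkable after generation.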
The study also explores the integration of Retrieval-Augmented Generation (RAG) to enhance factual accuracy in SLRs and discusses the evolution of fine-tuning practices, particularly Parameter-Efficient Fine-Tuning (PEFT) techniques. It highlights the challenges posed by LLM hallucination and proposes solutions to mitigate it, emphasizing the importance of auditing LLM responses against their sources. The proposed SLR-automation framework comprises paper selection, automated Q&A data extraction and synthesis, token insertion as new-knowledge markers, Q&A permutation, and verification, and is tested through a case study that replicates a PRISMA-conforming SLR. The methodology details the selected Gold Standard SLR, dataset extraction and preparation, experimental design, LLM selection and fine-tuning strategy, RAG implementation, and hardware specifications. The evaluation focuses on the factual accuracy of responses generated by the different methodologies applied to the SLR dataset, using both quantitative and qualitative analyses. The study concludes that the proposed framework effectively automates the SLR process while ensuring factual accuracy and reliability in literature reviews.
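The token-insertion and Q&A-permutation steps of the framework can be illustrated with a minimal sketch. The marker string and helper names below are assumptions for illustration only; the study's actual token format and permutation procedure may differ.

```python
import itertools

# Hypothetical marker flagging newly injected knowledge; the paper's actual token may differ.
NEW_KNOWLEDGE_TOKEN = "<SLR_FACT>"

def insert_marker(answer):
    """Prefix an answer with a token that flags it as newly injected knowledge,
    so responses drawing on fine-tuned content are distinguishable from
    pre-training content."""
    return f"{NEW_KNOWLEDGE_TOKEN} {answer}"

def permute_qa(question, answer, paraphrases):
    """Pair every phrasing of a question with the marked answer.

    Exposing the model to several surface forms of the same fact is one
    common way to improve recall of fine-tuned knowledge.
    """
    marked = insert_marker(answer)
    return [{"prompt": q, "completion": marked}
            for q in itertools.chain([question], paraphrases)]

examples = permute_qa(
    "Which guideline did the review follow?",
    "The review conformed to PRISMA.",
    ["What reporting guideline was used?", "Was PRISMA followed?"],
)
for ex in examples:
    print(ex["prompt"], "->", ex["completion"])
```

Each permuted record reuses the same marked completion, so verification only needs to check the single canonical answer per fact rather than every phrasing.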
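The quantitative side of a factual-accuracy evaluation is often approximated with a token-overlap score; the F1 sketch below is a generic stand-in of that kind, not the specific metric the study defines.

```python
from collections import Counter

def token_f1(generated, reference):
    """Token-level F1 between a generated answer and the gold answer,
    a common rough proxy for factual overlap."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    common = Counter(gen) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the review followed prisma", "the review conformed to prisma"))
```

A score of 1.0 indicates an exact token match with the gold answer; in practice such automatic scores are paired with the kind of qualitative analysis the study describes, since surface overlap alone cannot catch every hallucinated claim.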