This study explores the effectiveness of fine-tuning large language models (LLMs) for five complex chemical text mining tasks: compound entity recognition, reaction role labelling, metal–organic framework (MOF) synthesis information extraction, nuclear magnetic resonance (NMR) data extraction, and the conversion of reaction paragraphs into action sequences. The fine-tuned LLMs demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt-engineering experiments. The fine-tuned ChatGPT models excelled in all five tasks, achieving exact accuracy from 69% to 95% with minimal annotated data, and even outperformed models built by task-adaptive pre-training and fine-tuning on significantly larger in-domain corpora. Fine-tuned Mistral and Llama3 also showed competitive abilities. Given their versatility, robustness, and low-code accessibility, fine-tuned LLMs could serve as flexible and effective toolkits for automated data acquisition and could revolutionize chemical knowledge extraction.
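The study fine-tuned GPT-3.5-turbo through OpenAI's fine-tuning service. The sketch below shows what that workflow looks like with the current openai Python client; the file name, system prompt, and example contents are illustrative assumptions, not the study's actual prompts or data.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Each line of the JSONL file is one chat-formatted annotated example,
# e.g. for Paragraph2Action (hypothetical prompt wording):
# {"messages": [
#   {"role": "system", "content": "Convert the reaction paragraph into an action sequence."},
#   {"role": "user", "content": "<synthesis paragraph>"},
#   {"role": "assistant", "content": "<action sequence>"}]}

# Upload the annotated examples, then launch the fine-tuning job
training_file = client.files.create(
    file=open("paragraph2action_train.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```

The same recipe applies to each of the five tasks; only the prompt and the annotated target outputs change.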
The study benchmarked fine-tuned versions of several models, including GPT-3.5-turbo, Mistral, Llama3, Llama2, T5, and BART; the fine-tuned ChatGPT models achieved state-of-the-art performance across all five tasks. For the Paragraph2Compound task, the fine-tuned models reached F1 scores above 0.6. For the Paragraph2RXNRole task, fine-tuned GPT-3.5-turbo achieved an F1 score of 77.1% for product extraction and 83.0% for reaction role labelling. For the Paragraph2MOFInfo task, it achieved an exact-match accuracy of 82.7% on single-reaction paragraphs and 68.8% on multiple-reaction paragraphs. For the Paragraph2NMR task, it achieved both high Levenshtein similarity and high exact-match accuracy. For the Paragraph2Action task, the fine-tuned models achieved high full-sentence exact accuracy.
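Several of these headline figures are string-level metrics: exact-match accuracy and Levenshtein similarity. The sketch below shows one straightforward way to compute them; the study's exact string normalization (case, whitespace handling) is an assumption here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]

def levenshtein_similarity(pred: str, gold: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    if not pred and not gold:
        return 1.0
    return 1.0 - levenshtein(pred, gold) / max(len(pred), len(gold))

def exact_match(pred: str, gold: str) -> bool:
    """Exact-match accuracy is the mean of this over a test set."""
    return pred.strip() == gold.strip()
```

Exact match is the stricter criterion: a single mis-transcribed chemical shift fails the whole prediction, whereas Levenshtein similarity still credits near-misses.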
The study also highlights the broader potential of fine-tuning LLMs for chemical data mining. The results show that fine-tuned LLMs generalize readily across tasks and can streamline the labor-intensive, time-consuming data collection workflow even with limited annotated data, which should accelerate the discovery and creation of novel substances. The study also discusses the remaining challenges of using LLMs for chemical text mining, including the need for annotated data, the limitations of model architecture and context memory, and the ambiguity of human expressions. It concludes that fine-tuning LLMs is a promising approach for chemical text mining, with the potential to revolutionize the field.
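In practice, the low-code appeal is that a finished fine-tune is queried like any chat model. A minimal usage sketch follows, with a placeholder model ID, prompt, and paragraph (the real ID is issued when the fine-tuning job completes):

```python
from openai import OpenAI

client = OpenAI()

# Query the fine-tuned extractor on a new paragraph (all contents are placeholders)
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:my-org::abc123",  # placeholder fine-tuned model ID
    messages=[
        {"role": "system", "content": "Convert the reaction paragraph into an action sequence."},
        {"role": "user", "content": "A mixture of A and B was stirred at 80 °C for 2 h, then ..."},
    ],
    temperature=0,  # deterministic decoding for extraction
)
print(response.choices[0].message.content)
```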