LAWGPT: A Chinese Legal Knowledge-Enhanced Large Language Model


7 Jun 2024 | Zhi Zhou, Jiang-Xin Shi, Peng-Xiao Song, Xiao-Wen Yang, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li
National Key Laboratory for Novel Software Technology, Nanjing University
School of Artificial Intelligence, Nanjing University
School of Intelligence Science and Technology, Nanjing University
{zhouz,shijx,songpx,yangx,jinyx,guolz,liyf}@landa.nju.edu.cn

**Abstract**

Large language models (LLMs) have shown remarkable capabilities in various downstream tasks, but they fail to meet the specific requirements of practical Chinese legal applications: proprietary models cannot guarantee data privacy for sensitive legal cases, while open-source models perform poorly due to insufficient legal knowledge. To address this, we introduce LAWGPT, the first open-source model specifically designed for Chinese legal applications. LAWGPT consists of two key components: legal-oriented pre-training and legal supervised fine-tuning. We use a large-scale Chinese legal document corpus for pre-training to incorporate legal domain knowledge, and we construct a knowledge-driven instruction dataset for fine-tuning to enhance the model's performance on downstream legal tasks. Experimental results show that LAWGPT outperforms the open-source LLaMA 7B model. Our code and resources are publicly available at https://github.com/pengxiao-song/LawGPT and have received 5.7K stars on GitHub.

**Introduction**

Large language models (LLMs) have achieved significant success in various natural language processing (NLP) tasks. However, they struggle with practical Chinese legal applications due to data privacy concerns and insufficient legal knowledge. To overcome these challenges, we introduce LAWGPT, an open-source Chinese legal knowledge-enhanced large language model. LAWGPT is designed to ensure data privacy and enhance legal domain knowledge through legal-oriented pre-training and legal supervised fine-tuning. Our experimental results demonstrate that LAWGPT outperforms the LLaMA 7B model on major legal tasks, highlighting its effectiveness for practical Chinese legal applications.

**Related Work**

We review existing work on legal tasks with LLMs, covering general language models, legal language models, and legal benchmarks. General LLMs show impressive performance across many tasks but lack legal domain knowledge. Legal language models are either fine-tuned from pre-trained models or trained from scratch on legal data. Legal benchmarks evaluate models across cognitive levels such as legal knowledge memorization, understanding, and application.

**Methodology**

LAWGPT addresses two issues: the lack of legal domain knowledge and insufficient training on downstream legal tasks. We apply legal-oriented pre-training to incorporate legal domain knowledge and legal supervised fine-tuning to enhance performance on specific legal tasks.
Our experiments show that LAWGPT outperforms the LLaMA 7B model on major legal tasks, despite the remaining performance gap with proprietary models.
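The summary describes legal-oriented pre-training only at a high level. As a hedged illustration, the sketch below shows what continued causal-language-model pre-training on a Chinese legal corpus could look like with Hugging Face Transformers; the base checkpoint name, corpus file, and hyperparameters are assumptions for illustration, not the authors' actual configuration.

```python
# Sketch: legal-oriented continued pre-training (next-token prediction on a legal corpus).
# Assumptions: a LLaMA-style Chinese base checkpoint and a plain-text corpus with one
# legal document per line ("legal_corpus.txt"); all names and settings are illustrative.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "hfl/chinese-llama-2-7b"  # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

raw = load_dataset("text", data_files={"train": "legal_corpus.txt"})

def tokenize(batch):
    # Truncate each document to a fixed context window.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives the standard causal-LM objective: labels are the input ids.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="lawgpt-pretrain",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```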
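For the legal supervised fine-tuning stage, the summary mentions a knowledge-driven instruction dataset but not its format. Assuming Alpaca-style JSON records with `instruction`, `input`, and `output` fields, a minimal sketch of the fine-tuning step could look as follows; the prompt template, field names, and checkpoint path are hypothetical, and masking prompt tokens out of the loss is a common SFT recipe rather than the authors' confirmed procedure.

```python
# Sketch: legal supervised fine-tuning on Alpaca-style instruction records ("legal_sft.json").
# Field names, the prompt template, and the checkpoint path are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

ckpt = "lawgpt-pretrain"  # e.g. the output of the pre-training sketch above
tokenizer = AutoTokenizer.from_pretrained(ckpt)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(ckpt)

PROMPT = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"

def build_example(record, max_len=1024):
    prompt = PROMPT.format(instruction=record["instruction"], input=record.get("input", ""))
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(record["output"] + tokenizer.eos_token, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + answer_ids)[:max_len]
    # Mask prompt tokens with -100 so only the response contributes to the loss.
    labels = ([-100] * len(prompt_ids) + answer_ids)[:max_len]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids), "labels": labels}

dataset = load_dataset("json", data_files={"train": "legal_sft.json"})["train"]
dataset = dataset.map(build_example, remove_columns=dataset.column_names)

# DataCollatorForSeq2Seq pads input_ids/attention_mask and pads labels with -100.
collator = DataCollatorForSeq2Seq(tokenizer, padding=True, label_pad_token_id=-100)

args = TrainingArguments(
    output_dir="lawgpt-sft",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=1e-5,
    bf16=True,
    logging_steps=50,
)

Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
```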