MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation

3 Jul 2024 | Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, Yingyan (Celine) Lin
The paper introduces MG-Verilog, a multi-grained dataset designed to enhance Large Language Models (LLMs) in hardware design tasks. Existing hardware datasets are often limited in size, complexity, or detail, which hinders LLM performance. To address this, the authors propose criteria for high-quality hardware datasets and develop MG-Verilog, which includes hardware descriptions at various levels of detail and corresponding Verilog code samples. The dataset is open-sourced and provides infrastructure for easy access, integration, and extension.
A balanced fine-tuning scheme is introduced to leverage the diverse levels of detail in the dataset, improving LLM performance in hardware design tasks. The dataset includes over 11,000 Verilog code samples and their descriptions, covering a wide range of complexity and detail levels. The dataset is structured to balance design generation accuracy and user-friendliness, with descriptions ranging from high-level summaries to detailed line-by-line comments. The authors demonstrate that LLMs fine-tuned with MG-Verilog outperform those trained on other datasets in terms of code generation accuracy and hardware design sophistication. The dataset is publicly available and can be used for various LLM-assisted hardware design tasks. The paper also discusses related work and concludes that MG-Verilog is an effective solution for improving LLM performance in hardware design.
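To make the multi-grained structure concrete, the following is a minimal sketch of what one dataset entry and a balanced fine-tuning draw could look like. The field names (`code`, `descriptions`, `high_level`, `line_by_line`) and the uniform-sampling strategy are illustrative assumptions, not the dataset's actual schema or the paper's exact scheme.

```python
import random

# Hypothetical entry: one Verilog module paired with descriptions at
# several granularity levels, from a one-line summary down to
# line-by-line comments. Field names are assumptions for illustration.
sample = {
    "code": (
        "module and_gate(input a, input b, output y);\n"
        "  assign y = a & b;\n"
        "endmodule"
    ),
    "descriptions": {
        "high_level": "A 2-input AND gate.",
        "detailed": "Module and_gate drives output y with the logical AND of inputs a and b.",
        "line_by_line": [
            "Declare module and_gate with inputs a, b and output y.",
            "Continuously assign y the AND of a and b.",
        ],
    },
}

def balanced_pair(entry, rng=random):
    """Sketch of a balanced fine-tuning draw: pick one description
    granularity uniformly at random to pair with the code, so no single
    level of detail dominates training."""
    level = rng.choice(list(entry["descriptions"]))
    desc = entry["descriptions"][level]
    if isinstance(desc, list):          # join line-by-line comments
        desc = " ".join(desc)
    return {"instruction": desc, "output": entry["code"], "granularity": level}

pair = balanced_pair(sample)
```

Each call to `balanced_pair` yields one (description, code) training pair; repeating the draw over the corpus mixes granularities evenly, which is one plausible way to realize the balance between high-level usability and detailed accuracy described above.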