3 Jan 2024 | Xianjun Yang, Junfeng Gao, Wenxin Xue, Erik Alexandersson
PLLaMa is an open-source large language model built on LLaMa-2 and enhanced with a corpus of over 1.5 million plant science scholarly articles, designed to improve understanding and application in the plant and agricultural sciences. It undergoes extended pretraining on plant science articles followed by instruction-based fine-tuning to strengthen its performance on plant science-related queries, and an international panel of experts verifies the accuracy of its responses. The model's checkpoints and source code are publicly available for research and development. Pretraining used eight A100 80G GPUs with a maximum token length of 1024 and took about 26 hours for the 7B model and 57 hours for the 13B model; instruction tuning used four A100 80G GPUs and took about 1.3 hours for the 7B model and 2.7 hours for the 13B model. Evaluated on plant science quizzes, PLLaMa achieves around 60% accuracy on multiple-choice questions and demonstrates improved performance on plant science-related tasks, with further development and application in the field intended. The paper details the training and evaluation processes along with related work and experimental configurations, and includes a list of the 750 plant science journal names used to assemble the training corpus.
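Since the checkpoints are released publicly, a minimal usage sketch with the standard Hugging Face transformers API might look like the following. The repository id "Xianjun/PLLaMa-7b-base", the prompt, and the generation settings are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: querying a released PLLaMa checkpoint with transformers.
# The model id below is hypothetical; substitute the actual published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Xianjun/PLLaMa-7b-base"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a single GPU
    device_map="auto",
)

# Example plant-science query (illustrative, not from the paper's evaluation set).
prompt = "What role does abscisic acid play in a plant's response to drought stress?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same loading pattern would apply to the 13B checkpoint, with correspondingly higher memory requirements.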