2024-01-08 | Ernest Perkowski, Rui Pan, Tuan Dung Nguyen, Yuan-Sen Ting, Sandor Kruk, Tong Zhang, Charlie O'Neill, Maja Jablonska, Zechang Sun, Michael J. Smith, Huiling Liu, Kevin Schawinski, Kartheek Iyer, Ioana Ciucă, and UNIVERSETBD
The paper "AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets" explores enhancing the performance of large language models (LLMs) in astronomy-focused question-answering through targeted, continual pre-training. The authors, from various institutions, including the European Space Agency and Tsinghua University, focus on a 7B-parameter LLaMA-2 model trained on a curated set of astronomy corpora, including abstracts, introductions, and conclusions. This approach achieves notable improvements in specialized topic comprehension, despite general LLMs like GPT-4 excelling in broader scenarios due to superior reasoning capabilities.
The paper introduces AstroLLaMA-Chat, an extended version of AstroLLaMA that broadens the training scope beyond abstracts to include the introduction and conclusion sections of papers. Conversational ability comes from fine-tuning on a domain-specific dialogue dataset, with GPT-4 used to generate and answer questions drawn from arXiv papers. Training employs efficiency techniques such as Flash Attention, ZeRO optimization, and long-context methods, which significantly reduce training time.
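A minimal sketch of that kind of training setup with the Hugging Face Trainer follows, continuing from the corpus sketch above; the hyperparameters, DeepSpeed config path, and base checkpoint are placeholders rather than the authors' published settings.

```python
# Hedged sketch of continual pre-training with Flash Attention and DeepSpeed
# ZeRO, as the summary describes; all hyperparameters below are placeholders.
import torch
from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention kernels
)

args = TrainingArguments(
    output_dir="astrollama-continual",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_zero3.json",  # ZeRO optimization via a DeepSpeed config file
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,  # tokenized corpus from the previous sketch
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```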
The authors highlight that while AstroLLaMA-Chat may not consistently outperform GPT-4 and LLaMA-2 on general astronomy-related Q&A, it performs better on highly specialized topics, such as the dimensionality of elemental abundance space and recent studies in cosmology. The model also shows marginal advantages in completing abstracts and addressing contemporary research areas.
The paper concludes by emphasizing the benefits of continual pre-training on dedicated astronomy corpora and the potential for smaller models to achieve competitive performance with modest computational resources. The model is available on Hugging Face, and the authors plan to release a more substantial 70B version in an upcoming full paper.
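Since the model is released on Hugging Face, a hedged loading-and-generation example is sketched below; the repository ID is a placeholder, as the exact identifier is not given in this summary, and should be checked against the UniverseTBD organization on Hugging Face.

```python
# Hedged example of querying the released chat model; "universeTBD/astrollama-chat"
# is a placeholder repo ID, not confirmed by this summary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "universeTBD/astrollama-chat"  # placeholder; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is currently known about the dimensionality of elemental abundance space?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```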