The authors extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning, completing the entire training in 8 hours on a single 8xA800 (80G) machine. The resulting model exhibits superior performance across a broad range of long-context tasks, including NIHS (Needle-In-a-Haystack), topic retrieval, and long-context language understanding, while preserving its original short-context capabilities. The context extension is driven primarily by 3.5K synthetic training samples generated by GPT-4, which points to LLMs' inherent, and largely underestimated, potential to extend their original context length. The team has released all resources, including the data, model, and training code, to facilitate future research. The model, named Llama-3-8B-Instruct-80K-QLoRA, shows remarkable performance on downstream tasks and is publicly available for further exploration.
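Since the released checkpoint is meant for further exploration, a minimal sketch of loading it with 4-bit quantization (mirroring the QLoRA setting) via Hugging Face `transformers` might look as follows. The repository id and the prompt contents here are assumptions, not taken from the source; consult the released resources for the exact model name and recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical repo id; the actual released checkpoint may use a different name.
model_id = "namespace-Pt/Llama-3-8B-Instruct-80K-QLoRA"

# 4-bit NF4 quantization, in the spirit of the QLoRA fine-tuning setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# With the 80K context window, a long document can be placed directly in the prompt.
prompt = "<long document of up to ~80K tokens>\n\nQuestion: What is the main finding?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```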