Weaver: Foundation Models for Creative Writing

WEAVER is a family of large language models (LLMs) built specifically for creative writing. The models are pre-trained on a carefully curated corpus chosen to strengthen writing ability, then fine-tuned for creative and professional writing and aligned with professional writers' preferences through supervised fine-tuning and preference optimization. The training pipeline relies on novel methods for instruction data synthesis and LLM alignment, including instruction backtranslation and Constitutional DPO for preference optimization. The family comes in four sizes, MINI (1.8B), BASE (6B), PRO (14B), and ULTRA (34B), which suit different applications and are dispatched dynamically according to query complexity to balance response quality against computation cost.

WEAVER natively supports retrieval-augmented generation (RAG) and function calling. The paper presents use cases for these abilities, including integration of external knowledge bases, tools, or APIs, and personalized writing assistance. It also discusses guidelines and best practices for pre-training and fine-tuning domain-specific LLMs.

WEAVER is available at www.wawwriter.com through WAWAWRITER, a next-generation AI-assisted writing platform that supports human-AI collaborative writing, integration of external knowledge and tools, personalized writing assistance, and infinitely long text generation. The paper discusses the platform's innovations from a human-computer interaction perspective to explain how it will revolutionize traditional AI-assisted writing systems.

Evaluation on the WRITEBENCH benchmark shows that WEAVER models outperform generalist LLMs in creative writing, with the ULTRA model surpassing GPT-4 in various writing scenarios.
The results confirm the effectiveness of the data synthesis and training framework for domain-specific LLMs.
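
The summary says the four model sizes are dispatched dynamically by query complexity, but it does not describe how complexity is measured. As a rough illustration only, the sketch below shows one way such a router could be shaped. The scoring heuristic, the keyword list, the cutoffs, and the names `complexity_score` and `dispatch` are all assumptions made for this example and are not taken from the paper.

```python
from bisect import bisect_right

# Model tiers, smallest to largest, with the sizes listed in the summary.
TIERS = ["MINI (1.8B)", "BASE (6B)", "PRO (14B)", "ULTRA (34B)"]

# Hypothetical cutoffs on a 0..1 complexity score; not taken from the paper.
CUTOFFS = [0.1, 0.5, 0.75]

# Hypothetical keywords that hint at a demanding writing task.
HARD_KEYWORDS = {"novel", "screenplay", "chapter", "rewrite", "style"}


def complexity_score(query: str) -> float:
    """Toy heuristic: length contributes up to 0.5, a hard keyword adds 0.5."""
    words = [w.strip(".,!?;:").lower() for w in query.split()]
    length_part = 0.5 * min(len(words) / 200.0, 1.0)
    keyword_part = 0.5 if any(w in HARD_KEYWORDS for w in words) else 0.0
    return length_part + keyword_part


def dispatch(query: str) -> str:
    """Pick the tier whose score band contains the query's complexity score."""
    return TIERS[bisect_right(CUTOFFS, complexity_score(query))]


if __name__ == "__main__":
    for q in ["Fix the typo in this sentence.",
              "Write the opening chapter of a novel."]:
        print(q, "->", dispatch(q))
```

A production router would more plausibly use a learned classifier than keyword matching. The point of the sketch is only the trade-off the summary names: simple requests go to a small, cheap model and demanding ones to a larger model.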
