Octopus v2: On-device language model for super agent


16 Apr 2024 | Wei Chen, Zhiyuan Li
Octopus v2 is an on-device language model designed to improve the performance of AI agents in software applications. It enables a 2B-parameter model to outperform GPT-4 in both accuracy and latency while reducing context length by 95%; compared with Llama-7B using RAG-based function calling, Octopus v2 reduces latency by a factor of 35. These properties make the model well suited to deployment on edge devices, in line with real-world application requirements.

The model represents each specific function with a dedicated functional token, allowing it to learn software capabilities by mapping function descriptions to tokens. This approach improves function-calling accuracy and reduces token usage. Fine-tuned from Gemma 2B, the model achieves 99.524% accuracy in evaluations. It also supports parallel and nested function calls, which require about 4K data points per API for accurate performance.
During training, the model demonstrates exceptional performance with 1,000 data points per API, and it still achieves high accuracy with as few as 100 data points. Training can use either full-model fine-tuning or LoRA; LoRA incurs a minor accuracy decrease but remains accurate enough for production use.

The model's applications extend across domains including Android, vehicle, Yelp, and DoorDash function sets, enabling efficient function calling across diverse systems, with potential deployment on PCs, smartphones, and wearable technology. Future work includes developing a model for on-device reasoning, aiming for faster performance and local deployment for privacy and cost considerations.