ASTRAIOS: Parameter-Efficient Instruction Tuning Code Large Language Models

1 Jan 2024 | Terry Yue Zhuo, Armel Zebaze, Nitchakarn Suppattarachai, Leandro von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff
The paper introduces ASTRAIOS, a suite of 28 instruction-tuned OctoCoder models spanning seven tuning methods and four model sizes of up to 16 billion parameters. The study evaluates these models across five tasks and eight datasets, covering both code comprehension and code generation. Key findings:

1. **Full-Parameter Fine-Tuning (FFT)**: Generally yields the best downstream performance across all scales.
2. **Parameter-Efficient Fine-Tuning (PEFT)**: Methods differ significantly in efficacy depending on model scale; LoRA offers the most favorable trade-off between cost and performance.
3. **Model Robustness and Security**: Larger models tend to be less robust and less secure.
4. **Relationships Between Updated Parameters and Task Performance**: The final loss of small models can be extrapolated to larger ones, and validation loss during instruction tuning is a reliable indicator of overall downstream performance.

The paper also examines the scalability of the different tuning methods, the effect of model size and training time on cross-entropy loss, and the relationships among updated parameters, cross-entropy loss, and task performance. The results highlight the importance of understanding these models through comprehensive evaluation and suggest that instruction tuning loss can be a strong predictor of downstream performance.
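To illustrate why LoRA is parameter-efficient, here is a minimal NumPy sketch (not from the paper) of its low-rank update: instead of training a full weight update, only two small matrices `A` and `B` are trained, whose product approximates the update. All dimensions and the rank `r` below are illustrative assumptions.

```python
import numpy as np

# LoRA replaces a full weight update dW (d_out x d_in) with a low-rank
# product B @ A, where A is (r x d_in), B is (d_out x r), and r << d_in.
d_in, d_out, r = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, low-rank factor
B = np.zeros((d_out, r))                    # trainable, zero-initialized

def lora_forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A); only A and B are trained.
    return x @ (W + scale * (B @ A)).T

full_params = W.size                 # params a full update would train
lora_params = A.size + B.size        # params LoRA actually trains
print(f"full update params: {full_params:,}")   # 1,048,576
print(f"LoRA params:        {lora_params:,}")   # 16,384 (~1.6%)
```

Because `B` starts at zero, the model's initial output matches the frozen pretrained model exactly, and training only touches roughly 1.6% of the parameters of a full update in this configuration.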