15 Feb 2024 | Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, Paolo Torroni
The paper "Fast Vocabulary Transfer for Language Model Compression" by Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, and Paolo Torroni proposes a new method for compressing language models (LMs) by leveraging vocabulary transfer. The authors evaluate their method on various vertical domains and downstream tasks, demonstrating that vocabulary transfer can effectively reduce model size and inference time while maintaining marginal performance loss. The technique, called Vocabulary Transfer (VT), involves training a tokenizer on the specific domain to adapt the LM's vocabulary, thereby reducing the length of tokenized sequences and improving computational efficiency. The paper also discusses the combination of VT with Knowledge Distillation (KD) to further enhance compression and speedup. Experimental results show that VT achieves an inference speed-up of up to 1.40x and a significant reduction in model size, with minimal performance degradation. The authors conclude that VT provides a strategic trade-off between compression rate, inference speed, and accuracy, particularly in specialized domains.The paper "Fast Vocabulary Transfer for Language Model Compression" by Leonidas Gee, Andrea Zugarini, Leonardo Rigutini, and Paolo Torroni proposes a new method for compressing language models (LMs) by leveraging vocabulary transfer. The authors evaluate their method on various vertical domains and downstream tasks, demonstrating that vocabulary transfer can effectively reduce model size and inference time while maintaining marginal performance loss. The technique, called Vocabulary Transfer (VT), involves training a tokenizer on the specific domain to adapt the LM's vocabulary, thereby reducing the length of tokenized sequences and improving computational efficiency. The paper also discusses the combination of VT with Knowledge Distillation (KD) to further enhance compression and speedup. Experimental results show that VT achieves an inference speed-up of up to 1.40x and a significant reduction in model size, with minimal performance degradation. The authors conclude that VT provides a strategic trade-off between compression rate, inference speed, and accuracy, particularly in specialized domains.