6 Jun 2024 | Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff
The paper presents the Improved Variational Online Newton (IVON) method, demonstrating that variational learning can effectively train large deep networks, from ResNets to Large Language Models (LLMs) such as GPT-2. IVON consistently matches or outperforms Adam in both accuracy and predictive uncertainty while keeping computational costs similar. The authors provide extensive empirical evidence, including comparisons with other methods such as MC-dropout, SWAG, and Laplace approximations. IVON is shown to improve accuracy and uncertainty on various image classification tasks, to pretrain large language models, and to enable new use cases such as model merging and generalization-error prediction. The paper also discusses the practical implementation of IVON and its potential for future research, highlighting its effectiveness on large-scale deep learning problems.
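For a sense of how IVON is used as a near drop-in replacement for Adam, below is a minimal sketch in the style of the authors' released PyTorch optimizer (the `ivon-opt` package). The toy model, synthetic data, and hyperparameter values are placeholders chosen for illustration, not recommendations from the paper; the `sampled_params` context manager and the `ess` (effective sample size) argument follow the usage shown in the paper, but exact names and defaults may differ in the released version.

```python
import torch
import torch.nn.functional as F
import ivon  # authors' released optimizer (assumed installable as `ivon-opt`)

torch.manual_seed(0)
model = torch.nn.Linear(20, 3)        # toy stand-in network
X = torch.randn(256, 20)              # synthetic data for illustration
y = torch.randint(0, 3, (256,))

# `ess` is IVON-specific (often set near the training-set size);
# 256 here simply matches the toy data.
optimizer = ivon.IVON(model.parameters(), lr=0.1, ess=256)

for step in range(100):
    # Each step draws a weight sample from the variational posterior,
    # computes gradients at that sample, then applies an Adam-like update
    # that also maintains the posterior variance.
    with optimizer.sampled_params(train=True):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(X), y)
        loss.backward()
    optimizer.step()

# Predictive uncertainty: average the softmax over several posterior samples.
num_samples = 10
probs = torch.zeros(256, 3)
with torch.no_grad():
    for _ in range(num_samples):
        with optimizer.sampled_params():
            probs += F.softmax(model(X), dim=-1)
probs /= num_samples
```

The main difference from a standard Adam loop is the `sampled_params` context: training gradients are evaluated at weights sampled from the learned Gaussian posterior rather than at a point estimate, which is what yields the uncertainty estimates at test time.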