2024 | Yuesong Shen, Nico Daheim, Bai Cong, Peter Nickl, Gian Maria Marconi, Clement Bazan, Rio Yokota, Iryna Gurevych, Daniel Cremers, Mohammad Emtiyaz Khan, Thomas Möllenhoff
This paper presents the Improved Variational Online Newton (IVON) method, which adapts the variational optimizer of Lin et al. (2020) to large-scale problems and achieves state-of-the-art accuracy and uncertainty at nearly identical computational cost to Adam. IVON consistently matches or outperforms Adam when training large networks such as GPT-2 and ResNets, improving both accuracy and calibration on image classification and language modeling. It also provides better predictive uncertainty and is effective for downstream tasks such as finetuning, model merging, and predicting generalization. IVON is a second-order optimizer: it uses a simplified Hessian-estimation scheme within an Adam-like implementation, and because it learns a posterior explicitly, it can be adapted to more flexible posterior forms. Overall, the paper demonstrates that variational learning is not only effective but also useful for large deep networks, especially large language models, positioning IVON as a promising approach to Bayesian deep learning.
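Concretely, IVON maintains a diagonal Gaussian posterior over the weights, samples weights from it at each step, and updates the posterior mean with a Newton-like step preconditioned by a gradient-only Hessian estimate. Below is a minimal NumPy sketch of this single-sample, Adam-like update on a toy least-squares problem; the objective, hyperparameter values, and variable names are illustrative assumptions, not the paper's experimental setup, and the actual implementation operates on minibatches with optional multiple Monte Carlo samples.

```python
# Minimal sketch of an IVON-style update (assumptions: diagonal Gaussian
# posterior, single MC sample per step, toy full-batch least-squares loss).
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: mean-squared error of a linear model; grad is its gradient.
A = rng.normal(size=(50, 10))
b = rng.normal(size=50)
grad = lambda w: A.T @ (A @ w - b) / len(b)

d = 10
m = np.zeros(d)    # posterior mean over weights
h = np.ones(d)     # diagonal Hessian estimate (scaled posterior precision)
g = np.zeros(d)    # momentum, as in Adam's first moment

lr, beta1, beta2 = 0.1, 0.9, 0.99999  # illustrative values
delta = 1e-3       # weight decay, acts as prior precision
lam = 1e3          # effective sample size

for t in range(1, 501):
    sigma = 1.0 / np.sqrt(lam * (h + delta))  # posterior std dev
    eps = rng.normal(size=d)
    theta = m + sigma * eps                   # sample weights from posterior
    ghat = grad(theta)
    hhat = ghat * eps / sigma                 # gradient-only Hessian estimate

    g = beta1 * g + (1 - beta1) * ghat
    # "Improved" Hessian update: the extra quadratic term keeps h positive.
    h = (beta2 * h + (1 - beta2) * hhat
         + 0.5 * (1 - beta2) ** 2 * (h - hhat) ** 2 / (h + delta))
    gbar = g / (1 - beta1 ** t)               # bias correction, as in Adam
    m -= lr * (gbar + delta * m) / (h + delta)  # preconditioned Newton-like step

print("train loss at posterior mean:", 0.5 * np.mean((A @ m - b) ** 2))

# The learned posterior gives predictive uncertainty essentially for free:
# average predictions over weight samples instead of using the mean alone.
W = m + (1.0 / np.sqrt(lam * (h + delta))) * rng.normal(size=(32, d))
preds = W @ A.T
print("predictive std of first 3 outputs:", preds.std(axis=0)[:3])
```

The sketch shows why the per-step cost stays close to Adam's: the Hessian estimate reuses the ordinary gradient evaluated at a sampled weight vector, so no second derivatives or extra backward passes are needed.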