β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

2017 | Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner
The paper introduces β-VAE, a new framework for learning interpretable, factorised latent representations of data generative factors from raw image data in an unsupervised manner. β-VAE modifies the variational autoencoder (VAE) framework by introducing an adjustable hyperparameter β that balances latent channel capacity and independence constraints against reconstruction accuracy. With appropriately tuned β > 1, β-VAE outperforms VAE (β = 1), InfoGAN, and DC-IGN on a variety of datasets, including celebA, faces, and chairs. It is stable to train, makes few assumptions about the data, and relies on tuning a single hyperparameter β, which can be optimised using weakly labelled data or heuristic visual inspection.

β-VAE learns disentangled representations in which single latent units are sensitive to changes in single generative factors while remaining relatively invariant to changes in other factors, enabling generalisation across different configurations of those factors. The paper also proposes a new quantitative metric for the degree of disentanglement, under which β-VAE significantly outperforms baselines such as ICA, PCA, VAE, DC-IGN, and InfoGAN.

The β-VAE framework is derived by introducing a constraint over the inferred posterior distribution that encourages statistical independence among the latent factors. This is achieved by matching the inferred distribution to a prior that controls the capacity of the latent information bottleneck. The framework consistently discovers more latent factors and disentangles them more effectively than other approaches, including InfoGAN and DC-IGN, and it is robust to different architectures, optimisation parameters, and datasets, requiring few design decisions.

The paper concludes that β-VAE achieves state-of-the-art results for learning disentangled representations of data generative factors. It is an important step towards developing more human-like learning and reasoning in AI, and can be used as an unsupervised pretraining stage for supervised or reinforcement learning.
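The constrained objective described above is the standard variational lower bound (ELBO) with the KL term weighted by β, as formulated in the paper:

```latex
\mathcal{L}(\theta, \phi; \mathbf{x}, \mathbf{z}, \beta) =
\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\big[\log p_\theta(\mathbf{x}|\mathbf{z})\big]
- \beta \, D_{KL}\big(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z})\big)
```

Setting β = 1 recovers the standard VAE; β > 1 applies a stronger pressure for the posterior q_φ(z|x) to match the isotropic Gaussian prior p(z), which encourages disentangled latent factors at some cost in reconstruction accuracy.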
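As a minimal sketch of how the β-weighted objective might be computed, the snippet below assumes a Gaussian encoder with outputs `mu` and `log_var` and a Bernoulli decoder over pixel values; the function name and NumPy implementation are illustrative, not the authors' code:

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Illustrative beta-VAE loss (negative of the objective, to minimise).

    Assumes q(z|x) = N(mu, diag(exp(log_var))), a unit-Gaussian prior p(z),
    and a Bernoulli likelihood over pixels in [0, 1].
    """
    eps = 1e-8  # numerical stability for the logs
    # Reconstruction term: binary cross-entropy, summed over pixels.
    recon = -np.sum(x * np.log(x_recon + eps)
                    + (1 - x) * np.log(1 - x_recon + eps))
    # Analytic KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
    # beta > 1 up-weights the KL term, constraining latent channel capacity.
    return recon + beta * kl
```

When the posterior already matches the prior (mu = 0, log_var = 0), the KL term vanishes and β has no effect; as the posterior drifts from the prior, larger β penalises it more heavily.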