Under review as a conference paper at ICLR 2017 | Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, Alexander Lerchner
The paper introduces β-VAE, a novel framework for unsupervised learning of interpretable factorized representations from raw image data. β-VAE is an extension of the variational autoencoder (VAE) framework, incorporating a hyperparameter β that balances the capacity of the latent information channel and the independence constraints on the latent representations. By tuning β, the model can be optimized for either reconstruction accuracy or disentanglement performance. The authors demonstrate that β-VAE outperforms both the original VAE and state-of-the-art unsupervised and semi-supervised approaches (InfoGAN and DC-IGN) in terms of disentanglement on various datasets, including CelebA, chairs, and faces. They also propose a quantitative metric to measure the degree of disentanglement and show that β-VAE consistently achieves better results. The framework is stable, requires minimal design decisions, and can be optimized using a single hyperparameter β, making it a promising approach for developing AI that can learn and reason like humans.
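To make the role of β concrete, here is a minimal NumPy sketch of the per-sample β-VAE objective described above: a reconstruction term plus β times the KL divergence between a diagonal-Gaussian posterior and the unit-Gaussian prior. The squared-error reconstruction term and the function name are illustrative choices, not the paper's exact likelihood; β = 1 recovers the standard VAE, while β > 1 strengthens the independence pressure on the latents.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Per-sample beta-VAE objective (to be minimized), as a sketch.

    Reconstruction term (squared error here, standing in for the
    model's likelihood) plus beta * KL(N(mu, sigma^2) || N(0, I))
    for a diagonal-Gaussian encoder. beta=1 gives the standard VAE;
    beta>1 trades reconstruction accuracy for disentanglement.
    """
    recon = np.sum((x - x_recon) ** 2)
    # Closed-form KL between diagonal Gaussians N(mu, exp(log_var)) and N(0, I)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon + beta * kl
```

With a perfect reconstruction and a posterior matching the prior (mu = 0, log_var = 0), the loss is zero; pushing mu away from zero incurs a penalty scaled linearly by β.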