A Style-Based Generator Architecture for Generative Adversarial Networks


29 Mar 2019 | Tero Karras, Samuli Laine, Timo Aila
This paper introduces a style-based generator architecture for generative adversarial networks (GANs) that improves image quality and enables better control over the synthesis process. The architecture separates high-level attributes (e.g., pose, identity) from stochastic variation (e.g., freckles, hair) in generated images, allowing intuitive, scale-specific control. It improves traditional distribution quality metrics, leads to better interpolation properties, and better disentangles the latent factors of variation. Two new automated metrics, perceptual path length and linear separability, are proposed to quantify these aspects, and a new high-quality dataset of human faces (FFHQ) is introduced.

The style-based generator starts from a learned constant input and adjusts the "style" of the image at each convolution layer based on the latent code, directly controlling the strength of image features at different scales. Combined with noise injected directly into the network, this architectural change leads to automatic, unsupervised separation of high-level attributes from stochastic variation. The generator first embeds the input latent code into an intermediate latent space, which has a profound effect on how the factors of variation are represented in the network: the authors argue that because the input latent space must follow the probability density of the training data, some degree of entanglement is unavoidable, whereas the intermediate latent space is free from that restriction and can therefore be disentangled. Explicit noise inputs generate stochastic detail, and mixing regularization encourages styles to localize; injecting noise at different layers affects correspondingly different aspects of the generated images.
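The per-layer style mechanism can be illustrated with adaptive instance normalization (AdaIN), which the paper uses to apply styles, together with the per-channel scaling of explicit noise inputs. The NumPy sketch below is a minimal illustration, not the paper's implementation; the function names and the learned affine weights are placeholders:

```python
import numpy as np

def adain(x, w, style_scale, style_bias):
    """Adaptive instance normalization: normalize each feature map to
    zero mean / unit variance per channel, then scale and shift it by
    per-channel "style" values computed from the intermediate latent w.

    x: feature maps, shape (channels, height, width)
    w: intermediate latent vector, shape (latent_dim,)
    style_scale, style_bias: learned affine weights, shape (channels, latent_dim)
    """
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True) + 1e-8
    x_norm = (x - mu) / sigma
    # A learned affine transform maps w to one scale and one bias per channel.
    y_scale = (style_scale @ w)[:, None, None]
    y_bias = (style_bias @ w)[:, None, None]
    return y_scale * x_norm + y_bias

def add_noise(x, noise_strength, rng=None):
    """Explicit noise input: a single-channel image of Gaussian noise,
    scaled by a learned per-channel strength, feeds stochastic detail
    such as hair placement and freckles."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal((1, x.shape[1], x.shape[2]))
    return x + noise_strength[:, None, None] * noise
```

Because the style resets the per-channel statistics at every layer, each style only affects the image until the next AdaIN operation, which is what localizes styles to particular scales.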
The style-based generator significantly improves FID (Fréchet inception distance) compared to traditional generators, indicating higher image quality. Its intermediate latent space W is more disentangled than the input latent space Z, as shown by the perceptual path length and linear separability metrics. The paper further examines the properties of the style-based generator, including style mixing, stochastic variation, and the separation of global effects from stochasticity: the architecture controls global effects (e.g., pose, lighting) and stochastic variation (e.g., hair, skin pores) separately. It also discusses the benefits of the mapping network, which improves both generation quality and disentanglement. The paper concludes that the traditional GAN generator architecture is inferior to a style-based design in terms of established quality metrics, and that the proposed architecture improves the understanding and controllability of GAN synthesis. Methods for directly shaping the intermediate latent space during training are suggested as an avenue for future work.
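Perceptual path length can be illustrated with a small Monte-Carlo sketch: it averages the distance between images generated from latents a small step apart along an interpolation path. The paper measures image distance with a perceptual (VGG-based) metric; the `generator` and `dist` arguments below are placeholder assumptions for illustration:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between latent vectors a and b."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

def perceptual_path_length(generator, dist, latent_dim,
                           n_samples=200, eps=1e-4, seed=0):
    """Monte-Carlo estimate of path length in the input latent space Z:
    average distance between images generated from two latents a small
    step eps apart on an interpolation path, scaled by 1/eps^2. A
    smoother (less entangled) latent space yields a shorter path."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z1 = rng.standard_normal(latent_dim)
        z2 = rng.standard_normal(latent_dim)
        t = rng.uniform(0.0, 1.0 - eps)
        img_a = generator(slerp(z1, z2, t))
        img_b = generator(slerp(z1, z2, t + eps))
        total += dist(img_a, img_b) / eps ** 2
    return total / n_samples
```

For the intermediate space W the paper interpolates linearly (lerp) instead of spherically, since W is not constrained to any particular distribution.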
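The mixing regularization mentioned above can be sketched as follows. Here `mapping` stands in for the paper's 8-layer MLP mapping network, and the function signature is an assumption for illustration:

```python
import numpy as np

def mixed_styles(mapping, z1, z2, num_layers, seed=None):
    """Mixing regularization: map two input latents to w1 and w2, pick a
    random crossover layer, and feed w1 to the synthesis layers before
    the crossover and w2 to the layers after it. This discourages the
    network from assuming that adjacent layers' styles are correlated,
    encouraging each style to localize to its own scale."""
    rng = np.random.default_rng(seed)
    w1, w2 = mapping(z1), mapping(z2)
    crossover = rng.integers(1, num_layers)  # at least one layer per code
    return [w1 if i < crossover else w2 for i in range(num_layers)]
```

At test time the same mechanism gives style mixing: copying coarse-layer styles from one image and fine-layer styles from another transfers pose and identity from the first and color scheme and microstructure from the second.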
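The linear separability metric can be illustrated with a toy version: fit a linear classifier to predict a binary attribute from latent vectors, then measure how much label uncertainty remains given its predictions. The paper uses linear SVMs and pretrained attribute classifiers on generated images; the logistic-regression fit below is a simplified stand-in:

```python
import numpy as np

def fit_linear(latents, labels, epochs=300, lr=0.5):
    """Fit a linear (logistic-regression) classifier by gradient descent,
    a stand-in for the paper's linear SVM; returns binary predictions."""
    w = np.zeros(latents.shape[1])
    b = 0.0
    for _ in range(epochs):
        logits = np.clip(latents @ w + b, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-logits))
        w -= lr * latents.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return (latents @ w + b) > 0.0

def conditional_entropy(pred, true):
    """Empirical H(true | pred) in bits: label uncertainty remaining once
    the linear prediction is known. Lower values mean the attribute is
    more linearly separable in that latent space."""
    h = 0.0
    for c in (False, True):
        mask = pred == c
        if mask.sum() == 0:
            continue
        p1 = true[mask].mean()
        for p in (p1, 1.0 - p1):
            if p > 0.0:
                h -= mask.mean() * p * np.log2(p)
    return h
```

A hyperplane that cleanly splits W by an attribute drives this entropy toward zero, which is the sense in which W's lower scores indicate a more disentangled representation than Z's.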