12 Mar 2024 | Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, and Shijian Lu
StyleGaussian is a novel 3D style transfer method that instantly transfers the style of any image to a 3D scene while rendering at 10 frames per second (fps). It builds on 3D Gaussian Splatting (3DGS) to achieve real-time rendering and strict multi-view consistency.

The method stylizes a scene in three steps: embedding, transfer, and decoding. In the embedding step, 2D VGG features are embedded into the reconstructed 3D Gaussians. In the transfer step, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into stylized RGB.

StyleGaussian introduces two novel designs: an efficient feature rendering strategy that renders low-dimensional features and then maps them into high-dimensional features, and a K-nearest-neighbor-based 3D CNN that decodes the stylized features without compromising multi-view consistency.

Extensive experiments on two real-world scene datasets show that StyleGaussian achieves instant, zero-shot 3D stylization with superior stylization quality, requiring no optimization for new style images while preserving real-time rendering and strict multi-view consistency. Compared with state-of-the-art methods, it performs better in consistency, transfer time, and rendering speed. The method also supports style interpolation and decoder initialization, enabling efficient style transfer for new scenes. The key contributions of this work are StyleGaussian itself, the efficient feature rendering strategy, and the K-nearest-neighbor-based 3D CNN for decoding. Potential applications include augmented reality, virtual reality, video games, and film production.
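The efficient feature rendering strategy can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions (32-D rendered features, 256-D VGG feature space) are assumptions, and the differentiable splatting rasterizer is replaced by a pre-rendered array.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's exact values):
# each Gaussian carries a low-dimensional learnable feature that is cheap to
# rasterize; the target 2D VGG feature space is high-dimensional.
LOW_DIM, HIGH_DIM = 32, 256

rng = np.random.default_rng(0)

# Stand-in for the splatting rasterizer's output: a rendered low-dimensional
# feature image of shape (H, W, LOW_DIM).
H, W = 4, 4
low_dim_image = rng.standard_normal((H, W, LOW_DIM))

# Learned affine map from the low-D rendered features to the high-D feature
# space (in the paper this map is trained so the result matches VGG features).
A = rng.standard_normal((LOW_DIM, HIGH_DIM)) * 0.1
b = np.zeros(HIGH_DIM)

# Map every rendered pixel into the high-dimensional feature space; this is
# far cheaper than rasterizing 256-D features per Gaussian directly.
high_dim_image = low_dim_image @ A + b  # shape (H, W, HIGH_DIM)
```

The design choice is that the expensive splatting pass touches only 32 channels per Gaussian; the per-pixel affine lift to 256-D is a single matrix multiply.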
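The transfer step aligns the statistics of the embedded features with those of the reference style image. A minimal sketch using an AdaIN-style mean/std alignment follows; the exact transform, feature counts, and dimensions here are assumptions for illustration.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Match the per-channel mean/std of content features to the style features.

    content: (N, C) features embedded in the 3D Gaussians
    style:   (M, C) VGG features extracted from the reference style image
    """
    c_mean, c_std = content.mean(0), content.std(0) + eps
    s_mean, s_std = style.mean(0), style.std(0) + eps
    return (content - c_mean) / c_std * s_std + s_mean

rng = np.random.default_rng(0)
content = rng.standard_normal((1000, 256))          # embedded Gaussian features
style_a = rng.standard_normal((500, 256)) * 2 + 1   # style image A features
style_b = rng.standard_normal((500, 256)) * 0.5     # style image B features

transferred = adain(content, style_a)

# Style interpolation: linearly blend features transferred from two styles,
# then decode the blended features exactly as for a single style.
t = 0.5
blended = (1 - t) * adain(content, style_a) + t * adain(content, style_b)
```

Because the transform is a closed-form statistic alignment, a new style image requires only a feed-forward pass, which is what makes zero-shot transfer possible.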
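The K-nearest-neighbor-based 3D CNN can be sketched as a convolution over the KNN graph of Gaussian centers: each Gaussian aggregates its neighbors' stylized features with shared weights, analogous to a 2D kernel sliding over pixels. The layer design, neighbor ordering, and all sizes below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: number of Gaussians, input/output channels, neighbors.
N, C_IN, C_OUT, K = 200, 8, 3, 8
positions = rng.standard_normal((N, 3))    # Gaussian centers in 3D
features = rng.standard_normal((N, C_IN))  # stylized features to decode

# K nearest neighbors of each Gaussian by Euclidean distance
# (each point's neighbor list includes the point itself).
d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
knn_idx = np.argsort(d2, axis=1)[:, :K]    # (N, K)

# One "3D convolution" layer: a shared weight per neighbor slot, analogous to
# a 2D conv kernel but defined over the KNN graph instead of a pixel grid.
W_kernel = rng.standard_normal((K, C_IN, C_OUT)) * 0.1

neighbor_feats = features[knn_idx]                        # (N, K, C_IN)
rgb = np.einsum('nkc,kco->no', neighbor_feats, W_kernel)  # (N, C_OUT)

# Every Gaussian's RGB is decoded in 3D before any rasterization, so all
# camera views render the same colors: strict multi-view consistency holds.
```

Decoding in 3D rather than on rendered 2D images is what avoids the view-to-view flicker of per-image decoders.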