2024 | Konstantin Schürholt, Michael W. Mahoney, Damian Borth
This paper introduces SANE, a method for learning task-agnostic representations of neural network (NN) weight spaces. SANE embeds individual NN models into a latent space that supports both discriminative and generative downstream tasks. Unlike previous hyper-representation methods, which were limited to specific tasks or small model sizes, SANE scales to larger models and generalizes across architectures. It does so by sequentially processing subsets of a network's weights, so that a model of any size is embedded as a sequence of tokens in the learned representation space. The resulting layer-wise embeddings reveal global model information, and SANE can sequentially generate unseen neural network models, which earlier hyper-representation learning methods could not do.

Empirical evaluations show that SANE matches or exceeds state-of-the-art performance on several weight-representation learning benchmarks, particularly when used to initialize models for new tasks and for larger ResNet architectures. SANE can also sample models from only a few prompt examples, achieving higher performance than prior approaches and generalizing to larger models and new architectures. Techniques such as model alignment, haloing, and batch-norm conditioning further improve its effectiveness. The paper also discusses SANE's limitations, including the requirement for prompt examples and the focus on computer vision tasks. Overall, SANE provides a versatile and scalable approach to weight-space learning, with potential applications in both academic research and industry.
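Since the summary hinges on the idea of turning a weight space into a token sequence, a minimal sketch may help. The following is an illustration under stated assumptions, not the authors' implementation: the chunk size `TOKEN_SIZE`, the zero-padding scheme, and the helper `tokenize_weights` are all hypothetical choices made for this example.

```python
# Sketch of the weight-tokenization idea behind SANE (illustrative only):
# each parameter tensor is flattened and split into fixed-size chunks
# ("tokens"), so any model becomes a token sequence that a transformer-style
# encoder can embed. TOKEN_SIZE and zero-padding are assumptions.
import torch
import torch.nn as nn

TOKEN_SIZE = 32  # assumed token dimensionality

def tokenize_weights(model: nn.Module) -> torch.Tensor:
    """Flatten each parameter tensor and chunk it into TOKEN_SIZE slices."""
    tokens = []
    for param in model.parameters():
        flat = param.detach().flatten()
        # Zero-pad so the flattened weights split evenly into tokens.
        pad = (-flat.numel()) % TOKEN_SIZE
        flat = torch.cat([flat, flat.new_zeros(pad)])
        tokens.append(flat.view(-1, TOKEN_SIZE))
    return torch.cat(tokens)  # shape: (num_tokens, TOKEN_SIZE)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
token_seq = tokenize_weights(model)
print(token_seq.shape)  # torch.Size([8, 32])
```

The point of this framing is that a larger model only lengthens the token sequence rather than changing the representation's dimensionality, which is what lets a single learned encoder handle models of varying sizes and architectures.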