2024 | Konstantin Schürholt, Michael W. Mahoney, Damian Borth
This paper introduces SANE, a method for learning task-agnostic representations of neural network (NN) weight spaces. SANE embeds individual NN models into a latent space that supports both discriminative and generative downstream tasks. Unlike previous hyper-representation methods, which were limited to specific tasks or small model sizes, SANE scales to larger models and generalizes across architectures. It does so by sequentially processing subsets of a network's weights, so that a model of any size is embedded as a sequence of tokens in the learned representation space. The resulting layer-wise embeddings reveal global model information, and SANE can sequentially generate unseen neural network models, which earlier hyper-representation learning methods could not do.

Empirical evaluations show that SANE matches or exceeds state-of-the-art performance on several weight-representation learning benchmarks, particularly when used to initialize models for new tasks and for larger ResNet architectures. SANE can also sample models from only a few prompt examples, achieving higher performance than prior approaches and generalizing to larger models and new architectures. Techniques such as model alignment, haloing, and batch-norm conditioning further improve its effectiveness. The paper also discusses SANE's limitations, including the requirement for prompt examples and the focus on computer vision tasks. Overall, SANE provides a versatile and scalable approach to weight-space learning, with potential applications in both academic research and industry.
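Since the summary hinges on the idea of turning a weight space into a token sequence, a minimal sketch may help. The following is an illustration under stated assumptions, not the authors' implementation: the chunk size `TOKEN_SIZE`, the zero-padding scheme, and the helper `tokenize_weights` are all hypothetical choices made for this example.

```python
# Sketch of the weight-tokenization idea behind SANE (illustrative only):
# each parameter tensor is flattened and split into fixed-size chunks
# ("tokens"), so any model becomes a token sequence that a transformer-style
# encoder can embed. TOKEN_SIZE and zero-padding are assumptions.
import torch
import torch.nn as nn

TOKEN_SIZE = 32  # assumed token dimensionality

def tokenize_weights(model: nn.Module) -> torch.Tensor:
    """Flatten each parameter tensor and chunk it into TOKEN_SIZE slices."""
    tokens = []
    for param in model.parameters():
        flat = param.detach().flatten()
        # Zero-pad so the flattened weights split evenly into tokens.
        pad = (-flat.numel()) % TOKEN_SIZE
        flat = torch.cat([flat, flat.new_zeros(pad)])
        tokens.append(flat.view(-1, TOKEN_SIZE))
    return torch.cat(tokens)  # shape: (num_tokens, TOKEN_SIZE)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
token_seq = tokenize_weights(model)
print(token_seq.shape)  # torch.Size([8, 32])
```

The point of this framing is that a larger model only lengthens the token sequence rather than changing the representation's dimensionality, which is what lets a single learned encoder handle models of varying sizes and architectures.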