Mixture of Experts Soften the Curse of Dimensionality in Operator Learning

13 Apr 2024 | Anastasis Kratsios, Takashi Furuya, Antonio Lara, Matti Lassas, Maarten de Hoop
This paper introduces mixtures of neural operators (MoNOs) to address the curse of dimensionality in operator learning. An MoNO distributes the complexity of approximating a non-linear operator across a network of expert neural operators (NOs), each of which satisfies parameter scaling restrictions. The main result is a distributed universal approximation theorem: any Lipschitz non-linear operator between \(L^2([0,1]^d)\) spaces can be approximated uniformly over the Sobolev unit ball to accuracy \(\varepsilon\) by an MoNO in which each expert NO has depth, width, and rank of \(\mathcal{O}(\varepsilon^{-1})\). This keeps the complexity of each individual NO manageable while reducing the total number of parameters needed for high-dimensional approximation. The paper also derives new quantitative expression rates for classical NOs approximating uniformly continuous non-linear operators on compact subsets of \(L^2([0,1]^d)\). The analysis is motivated by operator learning for inverse problems, where the target operator is typically uniformly continuous but has a sub-Hölderian modulus of continuity, so feasible approximation rates cannot be achieved without substantial restrictions on the domain of approximation. The MoNO model uses a tree structure to route each input to the most appropriate expert, ensuring efficient and accurate approximation.
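The guarantee above can be summarized schematically as follows (the notation here is a simplified paraphrase, not the paper's exact statement: \(\mathcal{G}\) denotes the target Lipschitz operator, \(B\) the Sobolev unit ball, \(\mathcal{N}_1,\dots,\mathcal{N}_K\) the expert NOs, and \(r\) the tree-based router assigning each input to one expert; the precise norms, constants, and Sobolev exponent are as specified in the paper):

\[
\sup_{f \in B} \big\| \mathcal{G}(f) - \mathcal{N}_{r(f)}(f) \big\|_{L^2([0,1]^d)} \le \varepsilon,
\qquad \operatorname{depth}(\mathcal{N}_k),\ \operatorname{width}(\mathcal{N}_k),\ \operatorname{rank}(\mathcal{N}_k) \in \mathcal{O}(\varepsilon^{-1}) \ \text{for every } k.
\]

The point of the distributed formulation is that the \(\mathcal{O}(\varepsilon^{-1})\) bound applies per expert, so no single NO has to carry the full, potentially exponential, parameter cost of uniform approximation on its own.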