Fisher Flow Matching for Generative Modeling over Discrete Data

28 May 2024 | Oscar Davis, Samuel Kessler, Mircea Petrache, İsmail İlkan Ceylan, Michael Bronstein, Avishay Joey Bose
FISHER-FLOW is a flow-matching model for discrete data. It treats categorical distributions as points on a statistical manifold equipped with the Fisher-Rao metric, which allows discrete data to be continuously reparameterized as points on the positive orthant of the d-hypersphere. Flows between a source and a target distribution are then defined in a principled way by transporting mass along geodesics of this positive orthant. Working on the sphere, with the metric it inherits from the ambient Euclidean space, allows more flexible and numerically stable learning of vector fields and leads to improved training dynamics and performance. FISHER-FLOW also benefits from Riemannian optimal transport, which leads to straighter flows and lower variance in training.
Theoretical analysis shows that the gradient flow induced by FISHER-FLOW is optimal in reducing the forward KL divergence when matching categorical distributions on the probability simplex. Empirically, FISHER-FLOW outperforms prior diffusion and flow-matching models on synthetic and real-world benchmarks, including DNA promoter and enhancer sequence design.
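The geometric idea in the summary can be illustrated with a minimal sketch. It assumes the standard square-root map from the probability simplex to the positive orthant of the unit sphere and great-circle interpolation between two points on it. The function names (`sphere_map`, `inverse_sphere_map`, `geodesic`, `geodesic_velocity`) are illustrative and are not taken from the authors' code; in flow matching, a network would typically be regressed onto a velocity of this kind, but no model or training loop is shown here.

```python
import math

def sphere_map(p):
    """Map a categorical distribution p (on the simplex) to the unit sphere.

    Since sum(p) == 1, the vector of square roots has unit Euclidean norm
    and non-negative entries, so it lies on the positive orthant.
    """
    return [math.sqrt(pi) for pi in p]

def inverse_sphere_map(x):
    """Map a point on the positive orthant of the sphere back to the simplex."""
    return [xi * xi for xi in x]

def _angle(x0, x1):
    """Angle between two unit vectors, clamped for numerical safety."""
    dot = sum(a * b for a, b in zip(x0, x1))
    return math.acos(max(-1.0, min(1.0, dot)))

def geodesic(x0, x1, t):
    """Point at time t in [0, 1] on the great-circle arc from x0 to x1."""
    theta = _angle(x0, x1)
    if theta < 1e-12:  # endpoints coincide, nothing to interpolate
        return list(x0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

def geodesic_velocity(x0, x1, t):
    """Time derivative of geodesic(x0, x1, t); tangent to the sphere there."""
    theta = _angle(x0, x1)
    if theta < 1e-12:
        return [0.0 for _ in x0]
    s = math.sin(theta)
    w0 = -theta * math.cos((1.0 - t) * theta) / s
    w1 = theta * math.cos(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

# Example: move from a uniform distribution to a one-hot target over 4 categories.
p0 = [0.25, 0.25, 0.25, 0.25]
p1 = [0.0, 0.0, 1.0, 0.0]
x0, x1 = sphere_map(p0), sphere_map(p1)
xt = geodesic(x0, x1, 0.5)
pt = inverse_sphere_map(xt)  # still a valid categorical distribution
```

Because both endpoints lie on the positive orthant, the angle between them is at most 90 degrees, so the interpolation weights stay non-negative and every intermediate point maps back to a valid distribution by squaring its coordinates.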