June 30, 1999 | Patrick R. Amestoy, Iain S. Duff, Jean-Yves L'Excellent and Jacko Koster
This paper presents a fully asynchronous multifrontal solver for large sparse linear systems on distributed memory computers. The solver, called MUMPS (MUltifrontal Massively Parallel Solver), handles a wide range of problems, including symmetric positive definite, general symmetric, and unsymmetric matrices, which may be rank deficient. It combines a multifrontal approach with dynamic distributed task scheduling, which accommodates numerical pivoting and allows computational tasks to migrate to lightly loaded processors. Large computational tasks are divided into subtasks to enhance parallelism, and asynchronous communication is used throughout the solution process to efficiently overlap communication with computation.
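The load-balancing idea behind migrating tasks to lightly loaded processors can be illustrated with a toy greedy mapping. This is a sketch only, not MUMPS's actual scheduler (which makes its decisions dynamically at run time based on load information exchanged between processors); the function name and the static cost model are invented for the example:

```python
# Illustrative only: a greedy least-loaded assignment sketch, not
# MUMPS's actual dynamic scheduler. Each task is mapped to the
# processor with the smallest accumulated work estimate.

def assign_tasks(task_costs, num_procs):
    """Map each task name to a processor index, greedily balancing load."""
    loads = [0.0] * num_procs          # estimated work per processor
    mapping = {}
    # Assign heavier tasks first for a tighter balance.
    for task in sorted(task_costs, key=task_costs.get, reverse=True):
        proc = min(range(num_procs), key=loads.__getitem__)
        mapping[task] = proc
        loads[proc] += task_costs[task]
    return mapping, loads

# Four tasks with estimated costs, mapped onto two processors.
costs = {"t1": 8.0, "t2": 5.0, "t3": 4.0, "t4": 3.0}
mapping, loads = assign_tasks(costs, 2)
```

A static mapping like this is what dynamic scheduling improves on: when actual costs deviate from estimates (for example because numerical pivoting changes the work at a node), a dynamic scheduler can reassign upcoming tasks instead of committing to the initial mapping.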
The solver is tested on matrices provided by industrial partners in the PARASOL project, with results reported on an SGI Origin 2000 and an IBM SP2. It is implemented in Fortran 90 and uses MPI for message passing, along with BLAS, LAPACK, BLACS, and ScaLAPACK subroutines. The solver accepts both assembled and elemental input matrix formats, can determine the rank and a null-space basis for rank-deficient matrices, and can return a Schur complement matrix.
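The two input formats can be sketched as follows. The helper below is hypothetical, not part of the MUMPS interface: it expands an elemental input, where each element is given as a list of global variable indices plus a small dense matrix, into the assembled coordinate (i, j, value) form, summing entries where elements overlap:

```python
import numpy as np

# Hypothetical sketch (not the MUMPS API): expand an elemental input
# into assembled coordinate form, summing overlapping entries.

def elemental_to_assembled(elements):
    """elements: list of (global_indices, dense_element_matrix) pairs."""
    entries = {}
    for idx, mat in elements:
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                entries[(i, j)] = entries.get((i, j), 0.0) + mat[a, b]
    rows, cols, vals = zip(*((i, j, v) for (i, j), v in sorted(entries.items())))
    return list(rows), list(cols), list(vals)

# Two overlapping 2x2 elements on variables (0, 1) and (1, 2);
# variable 1 is shared, so its diagonal entry is summed.
e1 = ([0, 1], np.array([[4.0, 1.0], [1.0, 4.0]]))
e2 = ([1, 2], np.array([[4.0, 1.0], [1.0, 4.0]]))
rows, cols, vals = elemental_to_assembled([e1, e2])
```

Accepting elemental input directly, as MUMPS does, avoids this explicit expansion: element matrices can be assembled straight into the frontal matrices during factorization.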
The solver uses a multifrontal approach, in which all elimination operations take place within dense submatrices called frontal matrices. The factorization of the sparse matrix is described by an assembly tree, with each node corresponding to the computation of a Schur complement. Dynamic scheduling strategies distribute tasks among the processors and handle numerical pivoting. Three sources of parallelism are exploited: type 1 parallelism comes from independent subtrees of the assembly tree; type 2 parallelism comes from a block partitioning of the rows of large frontal matrices across several processors; and type 3 parallelism comes from a 2D block cyclic distribution of the root node.
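The per-node operation can be sketched in dense linear algebra: eliminating the fully summed variables of a frontal matrix yields the Schur complement (contribution block) that is passed to the parent node. The minimal sketch below, with invented names, omits numerical pivoting and the assembly of child contributions for brevity:

```python
import numpy as np

# Minimal dense sketch of the per-node multifrontal operation:
# eliminate the k fully summed variables of a frontal matrix F and
# return the Schur complement passed to the parent node.
# (Pivoting and assembly of child contribution blocks are omitted.)

def eliminate_node(F, k):
    """Return the Schur complement after eliminating the leading k variables."""
    F11 = F[:k, :k]   # fully summed block (factorized at this node)
    F12 = F[:k, k:]
    F21 = F[k:, :k]
    F22 = F[k:, k:]
    # Schur complement F22 - F21 * F11^{-1} * F12, via a direct solve.
    return F22 - F21 @ np.linalg.solve(F11, F12)

# 3x3 frontal matrix with 2 fully summed variables; the remaining
# 1x1 contribution block is sent to the parent node.
F = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
S = eliminate_node(F, 2)   # 1x1 Schur complement
```

In type 2 parallelism the rows of F below the fully summed block are partitioned across several processors, each of which computes its share of the update to the Schur complement; at the root node (type 3 parallelism), the whole frontal matrix is instead factorized with a 2D block cyclic distribution.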
The solver is tested on a variety of problems from computational fluid dynamics, structural mechanics, and other application areas. The results show that it achieves good performance on both assembled and elemental input formats, and that nested dissection based orderings outperform minimum degree based orderings. Memory usage scales well with the number of processors, and memory requirements are further reduced by distributing the input matrix across the processors. Overall, the results demonstrate that the solver can efficiently solve large sparse linear systems on distributed memory computers.