July 27, 1998 | P. R. Amestoy, I. S. Duff and J.-Y. L'Excellent
This paper presents a parallel distributed-memory multifrontal approach for solving symmetric and unsymmetric sparse linear systems. The method relies on a parallel asynchronous algorithm with dynamic scheduling of the computational tasks so that numerical pivoting can be handled efficiently. The authors compare LDL^T and LU factorizations and demonstrate the efficiency of the approach on an IBM SP2. The test problems come from the Rutherford-Boeing collection and from PARASOL end users. The main implementation issues discussed are the mapping of the computation onto the processors, the sources of parallelism, and the differences between the LU and LDL^T factorizations. The paper also discusses dynamic scheduling for symmetric systems and the effect of exploiting additional parallelism on memory use and work balance. The authors describe the implementation of type 2 and type 3 parallelism and report the performance of the algorithm on the various test problems. The results show good speed-ups, indicating that the method is well suited to solving large sparse systems on distributed-memory machines. The paper concludes that the current version of the MUMPS code performs well and achieves speed-ups comparable to those of shared-memory variants, although scalability on small numbers of processors is affected by memory effects.
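As a brief reminder of the two factorizations being compared (standard definitions, not details drawn from the paper): an unsymmetric matrix A is factored as

    A = L U        (L unit lower triangular, U upper triangular)

while a symmetric matrix is factored as

    A = L D L^T    (D diagonal, or block diagonal if 2x2 pivots are used for stability)

so that in the symmetric case only one triangular factor need be stored and roughly half the floating-point work is required.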