18 March 2024 | Sebastian Kokott, Florian Merz, Yi Yao, Christian Carbogno, Mariana Rossi, Ville Havu, Markus Rampp, Matthias Scheffler, and Volker Blum
This paper presents a substantially optimized implementation of exact-exchange (EXX) evaluation for hybrid density functional approximations (DFAs) in the all-electron code FHI-aims, enabling efficient simulations of systems with more than 10,000 atoms. The optimization targets memory efficiency, performance, and scalability under both non-periodic and periodic boundary conditions. The key changes are new MPI parallelization layers and shared-memory arrays based on the MPI-3 standard, which improve workload distribution and reduce memory consumption. Other parts of the code are optimized as well, including the computation of the Hartree potential and the evaluation of forces and stresses. With these changes, hybrid DFAs can be applied to systems of up to 30,576 atoms, including hybrid organic-inorganic perovskites, organic crystals, and ice crystals. The implementation builds on the localized resolution-of-identity (RI) approach, which makes the evaluation of the EXX contribution efficient by exploiting the sparsity of the density matrix. The paper describes the algorithm and its improvements, including the use of Fock matrix blocks and auto-tuning mechanisms, and reports the performance and scaling of hybrid DFA simulations for a wide range of chemical systems. The results show nearly perfect linear scaling for large systems, with significant improvements in both strong and weak scaling behavior.
The paper also includes benchmark results for various systems, demonstrating the effectiveness of the optimized implementation for large-scale simulations.
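
For orientation, the quantity being evaluated can be written out in a generic RI form. The following is a sketch in notation chosen here, not taken from the summary above: \varphi_i are atom-centered basis functions, P_\mu are auxiliary basis functions, D_{kl} is the density matrix, v is the Coulomb (or screened Coulomb) kernel, and the periodic case with its lattice sums is omitted. In the localized variant of RI, the sum over \mu for a product \varphi_i \varphi_j is assumed to run only over auxiliary functions centered on the two atoms carrying \varphi_i and \varphi_j, which keeps the expansion coefficients sparse.

```latex
% Exact-exchange matrix from four-center Coulomb integrals (non-periodic case)
K_{ij} = \sum_{kl} (ik|lj)\, D_{kl},
\qquad
(ij|kl) = \iint \varphi_i(\mathbf{r})\,\varphi_j(\mathbf{r})\,
          v(\mathbf{r}-\mathbf{r}')\,
          \varphi_k(\mathbf{r}')\,\varphi_l(\mathbf{r}')\,
          d\mathbf{r}\, d\mathbf{r}'

% Resolution of identity: expand basis-function products in auxiliary functions
\varphi_i(\mathbf{r})\,\varphi_j(\mathbf{r}) \approx \sum_{\mu} C_{ij}^{\mu}\, P_{\mu}(\mathbf{r}),
\qquad
V_{\mu\nu} = \iint P_{\mu}(\mathbf{r})\, v(\mathbf{r}-\mathbf{r}')\, P_{\nu}(\mathbf{r}')\,
             d\mathbf{r}\, d\mathbf{r}'

% Resulting approximation to the four-center integrals and to the exchange matrix
(ij|kl) \approx \sum_{\mu\nu} C_{ij}^{\mu}\, V_{\mu\nu}\, C_{kl}^{\nu}
\quad\Longrightarrow\quad
K_{ij} \approx \sum_{kl} \sum_{\mu\nu} C_{ik}^{\mu}\, V_{\mu\nu}\, C_{lj}^{\nu}\, D_{kl}
```

Written this way, every term is weighted by a density-matrix element D_{kl}, so wherever D_{kl} is negligible the corresponding contribution can be skipped; this is the sparsity the summary refers to.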