Benchmarking Quantum Computer Simulation Software Packages: State Vector Simulators

6 Jul 2024 | Amit Jamadagni, Andreas M. Läuchli, and Cornelius Hempel
This Technical Review benchmarks quantum computing simulation packages with a focus on their high-performance computing (HPC) capabilities. The goal is to evaluate performance and system-size scaling on three paradigmatic quantum computing tasks: gate-based simulation of Heisenberg spin dynamics, random circuit sampling, and the quantum Fourier transform (QFT). The authors developed a containerized toolchain to benchmark a large set of simulation packages on a local HPC cluster under different parallelization capabilities. The results help identify the most suitable package for a given simulation task and lay the foundation for a systematic community effort to benchmark and validate upcoming versions of existing and newly developed packages.

The packages differ markedly in computational performance: even on identical hardware, results span more than two orders of magnitude at both small and large problem sizes. Hardware acceleration, such as multithreading and GPU usage, yields substantial performance improvements. Nevertheless, at problem sizes between roughly 25 qubits (CPUs) and 30 qubits (GPUs), all evaluated packages cross over into exponential scaling behavior, with significant differences in pre-factors.

The evaluation covers statevector-based simulators as well as packages built on the density-matrix formalism, tensor networks, and Clifford-algebra methods, including packages tailored to specific hardware architectures or application domains. All selected packages were integrated into a containerized toolchain workflow, ensuring performance evaluation on an equal footing along with extensibility, reproducibility, and ease of maintenance. The benchmarking procedure translates the high-level OpenQASM instruction set of a given quantum algorithm into the specific instruction set of the chosen software package.
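The crossover into exponential scaling follows directly from how a statevector simulator stores the quantum state: an N-qubit state holds 2^N complex amplitudes, so memory (and per-gate work) doubles with every added qubit. A minimal back-of-the-envelope sketch, assuming double-precision complex amplitudes (16 bytes each); the numbers are illustrative, not taken from the paper:

```python
# Memory footprint of a full statevector: 2**N amplitudes at 16 bytes
# each (double-precision complex). This is why dense simulators hit a
# wall around N ~ 30 on commodity hardware.

def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store one N-qubit statevector."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (25, 30, 35):
    gib = statevector_bytes(n) / 2**30
    print(f"N = {n:2d}: {gib:8.1f} GiB")
# N = 25 needs 0.5 GiB, N = 30 already 16 GiB, N = 35 a prohibitive 512 GiB.
```

Single-precision amplitudes halve these figures, which is one reason the benchmarked packages are compared at different precision settings.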
The translation step also allows auxiliary function calls to be inserted that capture the resource consumption of the computation. Wall-clock time served as the performance measure and was compared across packages and hardware configurations. Performance varies significantly, with some packages faring better than others in the large-N limit. Single-thread performance was evaluated for different precision settings; wall-clock times can differ by up to a factor of 1,000 at system sizes around N = 26 qubits. Performance was further evaluated across hardware architectures, including multithreaded and GPU-based computation, and was found to depend strongly on both the task and the system size.

The authors also cross-validated the results by comparing the precision settings of the different packages and checking the quality of the solutions: the packages agreed with the expected result of 0, validating the entire toolchain for the given task. Some packages, such as QCGPU, failed to produce results at certain system sizes without emitting any error message. Finally, the limitations of the benchmarked simulators were classified as design limited, time limited, and memory limited.
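The gate-based, wall-clock-timed workflow can be sketched in a few lines. This is a toy illustration, not the authors' toolchain: a pure-Python dense statevector with a single-qubit gate kernel, timed with `time.perf_counter()`; real packages replace the inner loop with vectorized CPU or GPU kernels:

```python
import time

# Hadamard gate as a 2x2 matrix.
H = [[2**-0.5, 2**-0.5], [2**-0.5, -(2**-0.5)]]

def apply_1q_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a dense statevector."""
    new = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if i & stride == 0:                # pair amplitude i with i|stride
            a, b = state[i], state[i | stride]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | stride] = gate[1][0] * a + gate[1][1] * b
    return new

n = 10
state = [0j] * (1 << n)
state[0] = 1 + 0j                          # start in |0...0>

t0 = time.perf_counter()
for q in range(n):                         # H on every qubit: uniform superposition
    state = apply_1q_gate(state, H, q)
elapsed = time.perf_counter() - t0

print(f"wall-clock time: {elapsed:.4f} s")
```

After applying H to all n qubits every amplitude equals 2^(-n/2), which doubles as a cheap cross-validation check in the spirit of the paper's comparison against a known expected result.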