Update performance text

vm6502q · Oct 25, 2022 · 8b041d2
1 parent 42ce3f6
commit 8b041d2
Showing 1 changed file with 4 additions and 4 deletions.
diff --git a/docs/performance.rst b/docs/performance.rst
@@ -37,9 +37,9 @@ Method
 
 This performance document is meant to be a simple, to-the-point, and preliminary digest of these results. These results were prepared with the generous financial support of the Unitary Fund. Our benchmark code is public, largely self-explanatory, and easily reproducible.
 
-100 timed trials of single and parallel gates were run for each qubit count between 4 and 27 qubits. Three tests were performed: the quantum Fourier transform, ("QFT"), random circuits constructed from a universal gate set, and an idealized approximation of Google's Sycamore chip benchmark, as per [Sycamore]_. Additionally, parallel single qubits and random input CCX gates to 20 layer depth were chosen to highlight use cases Qrack is particularly well-suited for. The benchmarking code is available at `https://github.com/vm6502q/simulator-benchmarks <https://github.com/vm6502q/simulator-benchmarks>`_. Default build and runtime options were used for all candidates. **Notably, this means Qrack ran at single floating point accuracy whereas QCGPU and Qiskit ran at double floating point accuracy.**
+100 timed trials of single and parallel gates were run for each qubit count between 4 and 27 qubits. 27 qubits was chosen as the ceiling because this is the largest width attainable on the benchmark GPU before OpenCL requires coordination between more than one maximum allocation segment. Qrack can coordinate between multiple maximum allocation segments. QCGPU does not, as of our last inspection of its open source code, though it might allow (potentially unreliable) allocation requests larger than a single maximum allocation segment. Qiskit does not use OpenCL, and technologies like CUDA might abstract the virtual memory space over multiple maximum allocation segments. Three tests were performed: the quantum Fourier transform, ("QFT"), random circuits constructed from a universal gate set, and an idealized approximation of Google's Sycamore chip benchmark, as per [Sycamore]_. Additionally, parallel single qubits and random input CCX gates to 20 layer depth were chosen to highlight use cases Qrack is particularly well-suited for. The benchmarking code is available at `https://github.com/vm6502q/simulator-benchmarks <https://github.com/vm6502q/simulator-benchmarks>`_. Default build and runtime options were used for all candidates. **Notably, this means Qrack ran at single floating point accuracy whereas QCGPU and Qiskit ran at double floating point accuracy.**
 
-The same Alienware 17 laptop device was used for all benchmarks, (BIOS version 1.16.1, Ubuntu 20.04 LTS, Linux kernel version 5.15.0-52-generic, Intel(R) Core(TM) i9-10980HK CPU @ 2.40GHz, NVIDIA GeForce RTX 2070 Super). Benchmarks for the QFT, random universal circuits, and idealized Sycamore circuits were collected on October 24, 2022. (All other charts from data collected earlier are included for qualitative argument.)
+The same Alienware 17 laptop device was used for all benchmarks, (BIOS version 1.16.1, Ubuntu 20.04 LTS, Linux kernel version 5.15.0-52-generic, Intel(R) Core(TM) i9-10980HK CPU @ 2.40GHz, NVIDIA GeForce RTX 2070 Super). Benchmarks for the QFT, random universal circuits, and idealized Sycamore circuits were collected on October 24, 2022. All other charts from data collected earlier are included for qualitative argument.
 
 Comparative benchmarks included QCGPU, the Qiskit-Aer GPU simulator, and Qrack's default typically optimal "stack" of a "QUnit" layer on top of "QStabilizerHybrid," on top of "QPager," on top of a new Pauli gate fusion layer, on top of "QHybrid." All of these candidates are GPU-based, though Qrack "hybridizes" with CPU based simulation as appropriate to improve performance.
 
@@ -58,7 +58,7 @@ The "quantum" (or "discrete") Fourier transform (QFT/DFT) is a realistic and imp
 
 .. image:: performance/qft.png
 
-Similarly, on random universal circuits, defined above and in the benchmark repository, Qrack leads at low qubit widths, (compared to anticipated hardware).
+Similarly, on random universal circuits, defined above and in the benchmark repository, Qrack leads decisively at low qubit widths, with a modest lead maintained up to the 27 qubit mark.
 
 .. image:: performance/random_universal.png
 
@@ -81,7 +81,7 @@ Discussion
 
 Qrack::QUnit succeeds as a novel and fundamentally improved quantum simulation algorithm, over the naive Schrödinger algorithm in special cases. Primarily, QUnit does this by representing its state vector in terms of decomposed subsystems, as well as buffering and commuting Pauli X and Y basis transformations and singly-controlled gates. On user and internal probability checks, QUnit will attempt to separate the representations of independent subsystems by Schmidt decomposition. Further, Qrack will avoid applying phase effects that make no difference to the expectation values of any Hermitian operators, (no difference to "physical observables"). For each bit whose representation is separated this way, we recover a factor of close to or exactly 1/2 the subsystem RAM and gate execution time.
 
-Qrack::QPager, recently, gives several major advantages with or without a Qrack::QUnit layer on top. It usually allows 2 greater maximum qubit width allocation on the same 4-segment GPU RAM store, and it performs surprisingly well for execution speed at high qubit widths. It can also utilize larger system general RAM heap stores than what is available just as GPU RAM.
+Qrack::QPager, recently, gives several major advantages with or without a Qrack::QUnit layer on top. It usually allows 2 greater maximum qubit width allocation on the same 4-segment GPU RAM store, and it performs surprisingly well for execution speed at high qubit widths. It can also utilize larger system general RAM heap stores than what is available just as GPU RAM, as with Intel HD graphics. QPager also allows homogeneous and heterogeneous multi-device simulation cases, including virtualization across clusters with VirtualCL.
 
 Qrack maintains a low-width edge over other GPU simulations by "hybridizing" CPU simulation with GPU simulation. Below system-responsive default thresholds, Qrack is simulating via CPU only, with a transparent transition to GPU simulation (and then "paged" GPU simulation) as qubit width is increased.
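The 27-qubit ceiling and the "2 greater maximum qubit width" on a 4-segment store, both mentioned in the diff above, can be illustrated with a little arithmetic. This is a minimal sketch under stated assumptions and is not Qrack's allocation logic. It assumes a full state vector of 2^n complex amplitudes, 8 bytes per amplitude at single precision and 16 bytes at double precision, and a hypothetical maximum single allocation of 2,000,000,000 bytes (an illustrative figure, not a value measured on the benchmark machine). The function names are made up for this example.

```python
def max_qubits_in_allocation(max_alloc_bytes: int, bytes_per_amplitude: int) -> int:
    """Largest n such that 2**n amplitudes fit in one allocation segment."""
    amplitudes = max_alloc_bytes // bytes_per_amplitude
    if amplitudes < 1:
        raise ValueError("allocation is too small to hold a single amplitude")
    # bit_length() - 1 is floor(log2(amplitudes)) for positive integers.
    return amplitudes.bit_length() - 1


def max_qubits_paged(max_alloc_bytes: int, bytes_per_amplitude: int, segments: int) -> int:
    """Width reachable when the state vector is split over several segments.

    Only a power-of-two number of equal pages is counted, so each
    doubling of the segment count adds one qubit.
    """
    if segments < 1:
        raise ValueError("need at least one segment")
    per_segment = max_qubits_in_allocation(max_alloc_bytes, bytes_per_amplitude)
    return per_segment + (segments.bit_length() - 1)


# Hypothetical maximum single allocation, chosen only for illustration.
ASSUMED_MAX_ALLOC_BYTES = 2_000_000_000
SINGLE_PRECISION_BYTES = 8   # two 32-bit floats per complex amplitude
DOUBLE_PRECISION_BYTES = 16  # two 64-bit floats per complex amplitude

single = max_qubits_in_allocation(ASSUMED_MAX_ALLOC_BYTES, SINGLE_PRECISION_BYTES)
double = max_qubits_in_allocation(ASSUMED_MAX_ALLOC_BYTES, DOUBLE_PRECISION_BYTES)
paged = max_qubits_paged(ASSUMED_MAX_ALLOC_BYTES, SINGLE_PRECISION_BYTES, 4)

print(single, double, paged)  # prints: 27 26 29
```

Under these assumptions, a single segment holds 27 qubits at single precision, and four segments add 2 more, which is in line with the text. A real simulator may need additional working buffers, so actual limits can be lower than this estimate.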