# RCS (Nearest-Neighbor) Benchmark

## Getting Started

You need [`pyqrack`](https://pypi.org/project/pyqrack/) to run this notebook.

In [1]:
# For example, if your Jupyter installation uses pip:
# import sys
# !{sys.executable} -m pip install pyqrack

In the Python package itself, there should be an executable called `qrack_cl_precompile`. It "pre-compiles" the OpenCL "just-in-time" ("JIT") device program for your system's accelerators. You might want to find and run this utility first, so that Qrack does not need to recompile the OpenCL device program every time you first load it into your environment.
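
Where exactly the utility ends up can vary by platform and pyqrack version. The sketch below assumes it is either on your `PATH` or somewhere inside the installed `pyqrack` package directory, and it looks in both places without importing pyqrack. `find_qrack_cl_precompile` is a hypothetical helper for illustration, not part of pyqrack.

```python
import importlib.util
import os
import shutil


def find_qrack_cl_precompile(name="qrack_cl_precompile"):
    """Return a path to the pre-compile utility, or None if it is not found.

    Assumption: the utility is on PATH or ships inside the pyqrack package.
    """
    # 1) Check the PATH first.
    path = shutil.which(name)
    if path is not None:
        return path

    # 2) Fall back to walking the installed pyqrack package (not imported).
    spec = importlib.util.find_spec("pyqrack")
    if spec is None or not spec.submodule_search_locations:
        return None
    for root_dir in spec.submodule_search_locations:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for filename in filenames:
                # Also match platform suffixes such as ".exe".
                if filename == name or filename.startswith(name + "."):
                    return os.path.join(dirpath, filename)
    return None


tool = find_qrack_cl_precompile()
print(tool if tool else "qrack_cl_precompile not found")
```

If it returns a path, you could run the tool once before loading Qrack, for example with `subprocess.run([tool], check=True)`.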

## Configuration

In [2]:
# Random trial qubit width
qubit_width = 36
# Random trial circuit layer depth
layer_depth = 6
# Step by which the SDRP level decreases between trials
sdrp_inc = 0.125
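
The benchmark starts at SDRP level 1 and lowers it by `sdrp_inc` on each trial; once the level reaches (numerically) zero, it removes the SDRP setting and attempts the ideal simulation. A minimal sketch that previews which levels a given increment implies, assuming no trial fails along the way (the benchmark stops early if one does). `sdrp_schedule` is a hypothetical helper that only mirrors that loop.

```python
def sdrp_schedule(sdrp_inc, eps=1e-6):
    """List the SDRP levels tried, mirroring the search loop in the benchmark.

    The final 0.0 stands for the "ideal" trial, where the SDRP environment
    variable is removed rather than set.
    """
    if sdrp_inc <= 0:
        raise ValueError("sdrp_inc must be positive")
    levels = []
    sdrp = 1.0
    while sdrp > eps:
        levels.append(sdrp)
        sdrp -= sdrp_inc
    levels.append(0.0)  # the ideal (unrounded) trial
    return levels


print(sdrp_schedule(0.125))
# [1.0, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0.0]
```

With `sdrp_inc = 0.125`, that is eight rounded trials plus one ideal trial.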

In [3]:
import math
import numpy as np
import os
import random
import time

from pyqrack import QrackSimulator, QrackCircuit

# For more specific details about all available Qrack environment variables,
# see the C++ repository README: https://github.com/unitaryfund/qrack

# The "NVIDIA GPU + Intel accelerator" settings are shown to illustrate how
# heterogeneous environments could be managed for Qrack, but you will likely
# see performance degradation compared to using just the NVIDIA GPU if the
# second accelerator is, for example, Intel HD integrated graphics.

# The "device ID" is the sequential index number printed for each accelerator
# in the "banner" whenever Qrack is loaded with GPU build options.
# This is the "device ID" for your primary or main (single) accelerator.
os.environ['QRACK_OCL_DEFAULT_DEVICE']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_OCL_DEFAULT_DEVICE']='1'

# If you have multiple accelerators, "QUnitMulti" will attempt to distribute
# completely separable subsystems, when they arise, to multiple separate devices.
# Use this variable to input a comma-separated list of devices for "QUnitMulti."
os.environ['QRACK_QUNITMULTI_DEVICES']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QUNITMULTI_DEVICES']='1,0'
# os.environ['QRACK_QUNITMULTI_DEVICES']='1'

# If you have multiple accelerators, "QPager" can distribute (entangled) simulations
# across multiple equal-sized "pages" of state vector amplitudes.
# Use this variable to input a comma-separated list of device-to-"page" mappings.
os.environ['QRACK_QPAGER_DEVICES']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QPAGER_DEVICES']='4.1,12.0'
# os.environ['QRACK_QPAGER_DEVICES']='1'

# Some accelerators, like Intel integrated graphics, actually use general system RAM.
# In this case, OpenCL can be told to allocate on general "host" instead of "device" RAM.
# Give one entry below for each entry in 'QRACK_QPAGER_DEVICES' above: "1" means "host," "0" means "device."
os.environ['QRACK_QPAGER_DEVICES_HOST_POINTER']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QPAGER_DEVICES_HOST_POINTER']='4.0,12.1'

# This is the maximum qubit count you want to fit in a GPU "maximum allocation segment."
# (Your GPU probably has 4 such segments. You might want to set this 1 less than the theoretical
# maximum, so that memory fragmentation doesn't prevent using more than 1 segment in total.)
os.environ['QRACK_MAX_PAGE_QB']='27'

# This is an overall allocation limit for your GPU(s), in megabytes.
# If you have multiple GPUs, you can list separate limits in device ID order,
# separated by a comma.
os.environ['QRACK_MAX_ALLOC_MB']='15872'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_MAX_ALLOC_MB']='23552,15872'

# This is the maximum total number of fully-entangled qubits you expect to achieve using QPager.
os.environ['QRACK_MAX_PAGING_QB']='30'

# This is the maximum total number of fully-entangled qubits you expect to fit in general RAM.
os.environ['QRACK_MAX_CPU_QB']='32'

# Above this threshold, "QTensorNetwork" restricts simulations to the "past light cone."
# At or below the threshold, much more work can be reused.
os.environ['QRACK_QTENSORNETWORK_THRESHOLD_QB']='-1'

# These below are approximation options. (By default, Qrack simulates in the "ideal.")

# This is a number between "0" ("ideal") and "1" ("round to exactly Clifford") for near-Clifford rounding.
# os.environ['QRACK_NONCLIFFORD_ROUNDING_THRESHOLD']='1'

# This is a number between "0" ("ideal") and "1" ("destroy all entanglement") for "SDRP,"
# "Schmidt decomposition rounding parameter". (https://arxiv.org/abs/2304.14969)
# os.environ['QRACK_QUNIT_SEPARABILITY_THRESHOLD']='0.6'

# This is a number between "0" ("ideal") and "1" ("combine all binary decision tree branches")
# that sets the allowable "epsilon" between "QBdt" branches to consider them equal.
# os.environ['QRACK_QBDT_SEPARABILITY_THRESHOLD']='0.0001'
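
The qubit limits and the megabyte budget above are linked by the size of a dense state vector, which holds 2^n complex amplitudes for n qubits. A back-of-the-envelope sketch, assuming 8 bytes per amplitude (single-precision complex; this depends on how your Qrack build was configured, and double precision would need 16) and treating "MB" as 2^20 bytes. Both helper names are hypothetical.

```python
def state_vector_mb(n_qubits, bytes_per_amp=8):
    """Size in MB (2**20 bytes) of a dense state vector over n_qubits."""
    return (2 ** n_qubits) * bytes_per_amp / (1024 ** 2)


def max_qubits_for_budget(budget_mb, bytes_per_amp=8):
    """Largest n whose dense state vector fits within budget_mb."""
    n = 0
    while state_vector_mb(n + 1, bytes_per_amp) <= budget_mb:
        n += 1
    return n


print(state_vector_mb(27))           # one 27-qubit page, in MB: 1024.0
print(max_qubits_for_budget(15872))  # qubits that fit in a 15872 MB budget: 30
print(state_vector_mb(32) / 1024)    # GB of host RAM for 32 qubits: 32.0
```

Under those assumptions, a 27-qubit page takes 1 GB, a 15872 MB budget holds at most a 30-qubit state vector (consistent with `QRACK_MAX_PAGING_QB='30'`), and 32 fully entangled qubits in general RAM need about 32 GB.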

## Run the Benchmark
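
The gate helpers in the benchmark cell pass each single-qubit gate as a flattened 2×2 matrix, which this sketch assumes is in row-major order `[m00, m01, m10, m11]`, and they build `iswap` and `iiswap` from `swap`, controlled-Z, and `S` gates. A quick NumPy check, independent of pyqrack, that these definitions are mutually consistent:

```python
import numpy as np

# Row-major 2x2 matrices, matching the flattened lists in the gate helpers
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
S = np.diag([1, 1j])
T = np.diag([1, np.sqrt(1j)])
SQRTX = np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]]) / 2
SQRTY = np.array([[1 + 1j, -(1 + 1j)], [1 + 1j, 1 + 1j]]) / 2

# Two-qubit gates in the |q1 q2> basis
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
ISWAP = np.array([[1, 0, 0, 0], [0, 0, 1j, 0], [0, 1j, 0, 0], [0, 0, 0, 1]])

# iswap(): swap, then CZ, then S on both qubits (later gates multiply on the left)
iswap_decomp = np.kron(S, S) @ CZ @ SWAP
# iiswap(): S-dagger on both qubits, then CZ, then swap
S_dag = S.conj().T
iiswap_decomp = SWAP @ CZ @ np.kron(S_dag, S_dag)

checks = {
    "sqrtx^2 == x": np.allclose(SQRTX @ SQRTX, X),
    "sqrty^2 == y": np.allclose(SQRTY @ SQRTY, Y),
    "t^2 == s": np.allclose(T @ T, S),
    "iswap decomposition": np.allclose(iswap_decomp, ISWAP),
    "iiswap is the inverse": np.allclose(iiswap_decomp @ iswap_decomp, np.eye(4)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Every check should print `True`.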

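The two-qubit layer in `generate_random_circuit` pairs each qubit in an odd row of the square grid with a neighbor in the row above or below, depending on a pattern index taken from `gateSequence`: the `2` bit of the index selects the row below (otherwise the row above), and the `1` bit shifts the partner one column to the right. The sketch below reproduces only that pairing arithmetic, without the random gate choice or the random swap of qubit order; `coupler_pairs` is a hypothetical helper for illustration.

```python
import math


def coupler_pairs(n, gate):
    """Qubit pairs coupled in one layer for pattern index gate (0 to 3).

    Mirrors the row/column arithmetic of the coupler loop in
    generate_random_circuit(), minus the random choices.
    """
    row_len = math.ceil(math.sqrt(n))
    pairs = []
    for row in range(1, row_len, 2):
        for col in range(row_len):
            temp_row = row + (1 if (gate & 2) else -1)
            temp_col = col + (1 if (gate & 1) else 0)
            if not (0 <= temp_row < row_len and 0 <= temp_col < row_len):
                continue  # partner falls off the grid
            b1 = row * row_len + col
            b2 = temp_row * row_len + temp_col
            if b1 >= n or b2 >= n:
                continue  # partner falls in the unused part of the grid
            pairs.append((b1, b2))
    return pairs


for gate in range(4):
    print(gate, coupler_pairs(9, gate))
# 0 [(3, 0), (4, 1), (5, 2)]
# 1 [(3, 1), (4, 2)]
# 2 [(3, 6), (4, 7), (5, 8)]
# 3 [(3, 7), (4, 8)]
```

For the 36-qubit (6×6) configuration used here, patterns 0 through 3 couple 18, 15, 12, and 10 pairs, respectively, and no qubit appears in more than one pair within a layer.
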
In [4]:
_sqrt1_2 = 1 / np.sqrt(2)
_sqrtI = np.sqrt(1j)

def u(circ, q, th, ph, lm):
    circ.mtrx([np.cos(th / 2), -np.exp(1j * lm) * np.sin(th / 2), np.exp(1j * ph) * np.sin(th / 2), np.exp(1j * (ph + lm)) * np.cos(th / 2)], q)

def h(circ, q):
    circ.mtrx([_sqrt1_2, _sqrt1_2, _sqrt1_2, -_sqrt1_2], q)

def x(circ, q):
    circ.mtrx([0, 1, 1, 0], q)

def sqrtx(circ, q):
    circ.mtrx([(1 + 1j) / 2, (1 - 1j) / 2, (1 - 1j) / 2, (1 + 1j) / 2], q)

def y(circ, q):
    circ.mtrx([0, -1j, 1j, 0], q)

def sqrty(circ, q):
    circ.mtrx([(1 + 1j) / 2, -(1 + 1j) / 2, (1 + 1j) / 2, (1 + 1j) / 2], q)

def z(circ, q):
    circ.mtrx([1, 0, 0, -1], q)

def s(circ, q):
    circ.mtrx([1, 0, 0, 1j], q)

def t(circ, q):
    circ.mtrx([1, 0, 0, _sqrtI], q)

def rz(circ, q):
    # Phase rotation by a uniformly random angle (equivalent to RZ up to a global phase)
    circ.mtrx([1, 0, 0, np.exp(1j * random.uniform(0, 2 * np.pi))], q)

def cx(circ, q1, q2):
    circ.ucmtrx([q1], [0, 1, 1, 0], q2, 1)

def cy(circ, q1, q2):
    circ.ucmtrx([q1], [0, -1j, 1j, 0], q2, 1)

def cz(circ, q1, q2):
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 1)

def acx(circ, q1, q2):
    circ.ucmtrx([q1], [0, 1, 1, 0], q2, 0)

def acy(circ, q1, q2):
    circ.ucmtrx([q1], [0, -1j, 1j, 0], q2, 0)

def acz(circ, q1, q2):
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 0)

def swap(circ, q1, q2):
    circ.swap(q1, q2)

def nswap(circ, q1, q2):
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 0)
    circ.swap(q1, q2)
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 0)

def pswap(circ, q1, q2):
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 0)
    circ.swap(q1, q2)

def mswap(circ, q1, q2):
    circ.swap(q1, q2)
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 0)

def iswap(circ, q1, q2):
    circ.swap(q1, q2)
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 1)
    circ.mtrx([1, 0, 0, 1j], q1)
    circ.mtrx([1, 0, 0, 1j], q2)

def iiswap(circ, q1, q2):
    circ.mtrx([1, 0, 0, -1j], q2)
    circ.mtrx([1, 0, 0, -1j], q1)
    circ.ucmtrx([q1], [1, 0, 0, -1], q2, 1)
    circ.swap(q1, q2)

# This is a circuit with nearest-neighbor couplers that aims for high Haar-randomness.
def generate_random_circuit(n, l):
    n_range = range(n)
    l_range = range(l)

    # This is the coupler order deemed "hard" for Sycamore simulation in 2019:
    gateSequence = [ 0, 3, 2, 1, 2, 1, 0, 3 ]

    # Qubits fill a square grid with side ceil(sqrt(width)), row by row;
    # if the width is not a perfect square, the last occupied row is only partially filled.
    row_len = math.ceil(math.sqrt(n))

    # Half (phased) swap, half singly-controlled
    two_bit_gates = swap, pswap, mswap, nswap, iswap, iiswap, cx, cy, cz, acx, acy, acz

    # Discrete:
    # one_bit_gates = h, x, y, (z, s, t)

    # Continuous (rz):
    one_bit_gates = h, (x, sqrtx), (y, sqrty), rz

    circ = QrackCircuit(False)

    for _ in l_range:
        # Single-qubit gates
        for i in n_range:
            # Random gate choice from a discrete set:
            g = random.choice(one_bit_gates)
            if isinstance(g, tuple):
                # The "gate choice" segments a Pauli axis, in binary choices
                for p in g:
                    # 50/50 chance of each (negative) power of 2 of axis period
                    if random.randint(0, 1) == 1:
                        p(circ, i)
            else:
                g(circ, i)
            
            # One Euler angle axis per gate:
            # h(circ, i)
            # rz(circ, i)

            # Three Euler angle axes per gate:
            # u(circ, i, random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi), random.uniform(0, 2 * math.pi))

        # 2-qubit couplers
        gate = gateSequence.pop(0)
        gateSequence.append(gate)
        for row in range(1, row_len, 2):
            for col in range(row_len):
                temp_row = row + (1 if (gate & 2) else -1)
                temp_col = col + (1 if (gate & 1) else 0)

                if (temp_row < 0) or (temp_col < 0) or (temp_row >= row_len) or (temp_col >= row_len):
                    # Row and/or column selected were out of range
                    continue

                b1 = row * row_len + col
                b2 = temp_row * row_len + temp_col

                if (b1 >= n) or (b2 >= n):
                    # Bits selected were out-of-range
                    continue

                # Swap the order of the two qubits, 50% of the time
                if random.randint(0, 1) == 1:
                    b1, b2 = b2, b1
                
                g = random.choice(two_bit_gates)
                g(circ, b1, b2)

    # You can see the circuit definition and load it from this file name:
    circ.out_to_file('marp_circuit.qc')

    return circ

def bench_qrack(n, l):
    circ = generate_random_circuit(n, l)
    
    sdrp = 1
    found_marp = False
    result = (0, 0, 1)

    while not found_marp:
        # This is a number between "0" ("ideal") and "1" ("destroy all entanglement") for "SDRP,"
        # "Schmidt decomposition rounding parameter". (https://arxiv.org/abs/2304.14969)
        if (sdrp > 1e-6):
            os.environ['QRACK_QUNIT_SEPARABILITY_THRESHOLD']=str(sdrp)
        else:
            del os.environ['QRACK_QUNIT_SEPARABILITY_THRESHOLD']
            sdrp = 0
            found_marp = True

        start = time.perf_counter()
        # SDRP is fixed at the point the simulator instance is created.
        sim = QrackSimulator(n)
        try:
            circ.run(sim)
            fidelity = sim.get_unitary_fidelity()
            sim.m_all()
            result = (time.perf_counter() - start, fidelity, sdrp)
            print(result[0], " seconds,", result[1], "out of 1.0 fidelity, SDRP level", result[2])
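        except Exception:
            # Stop the search once a trial fails (for example, when it exceeds the configured memory limits).
            found_marp = True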

        sdrp = sdrp - sdrp_inc

    return result

print(qubit_width, "qubits by", layer_depth, "circuit layers:")
start = time.perf_counter()
result = bench_qrack(qubit_width, layer_depth)
marp_time = time.perf_counter() - start

print("(MARP search took ", marp_time, " seconds overall.)")

36 qubits by 6 circuit layers:
Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_3080_Laptop_GPU.ir
0.12324832300146227  seconds, 2.2695399521257574e-09 out of 1.0 fidelity, SDRP level 1
5.456846979999682  seconds, 0.5919876204634614 out of 1.0 fidelity, SDRP level 0.875
5.497274675999506  seconds, 0.5919875851782169 out of 1.0 fidelity, SDRP level 0.75
5.563566701999662  seconds, 0.939132237615817 out of 1.0 fidelity, SDRP level 0.625
5.534209436998935  seconds, 0.9391307822229612 out of 1.0 fidelity, SDRP level 0.5
5.5322521699999925  seconds, 0.9391317898026963 out of 1.0 fidelity, SDRP level 0.375
5.568638699001895  seconds, 0.9391307822221338 out of 1.0 fidelity, SDRP level 0.25
5.5583107710008335  seconds, 0.939129886598268 out of 1.0 fidelity, SDRP level 0.125
Default platform: NVIDIA CUDA
Default device: #0, NVIDIA GeForce RTX 3080 Laptop GPU
OpenCL device #0: NVIDIA GeForce RTX 3080 Laptop GPU
Cannot instantiate a QPager with greater capacity th