# Circuit Knitting

## Getting Started

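You need [`pyqrack`](https://pypi.org/project/pyqrack/) to run this notebook. The benchmark cell also imports `qiskit`, the Qrack provider for Qiskit (`qiskit.providers.qrack`), and `circuit_knitting`.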

In [1]:
# For example, if your Jupyter installation uses pip:
# import sys
# !{sys.executable} -m pip install pyqrack

The Python package should include an executable called `qrack_cl_precompile`. It precompiles the OpenCL just-in-time (JIT) device program for your system's accelerators. Consider finding and running this utility first, so that Qrack does not have to recompile the OpenCL device program each time it is first loaded into your environment.
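
One way to look for the utility is to search the installed package directory. The following is a minimal sketch: it assumes the file name starts with `qrack_cl_precompile` and lives somewhere under the `pyqrack` package folder, which may not hold for every build. `find_precompile_tool` is a hypothetical helper, not part of `pyqrack`.

```python
import importlib.util
from pathlib import Path


def find_precompile_tool(package="pyqrack", name="qrack_cl_precompile"):
    """Return files under the installed package whose name starts with `name`."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []  # package not installed (or has no file location)
    package_root = Path(spec.origin).parent
    return sorted(p for p in package_root.rglob(name + "*") if p.is_file())


for path in find_precompile_tool():
    print(path)
```

If the list is empty, the executable may have been installed elsewhere, for example on your `PATH`.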

## Configuration

In [2]:
# Qubit width
width = 16

In [3]:
import math
import os
import random
import time

# For details about all available Qrack environment variables,
# see the C++ repository README: https://github.com/unitaryfund/qrack

# "NVIDIA GPU + Intel accelerator" settings are shown to illustrate how
# heterogeneous environments could be managed for Qrack, but you will likely
# see worse performance than with the NVIDIA GPU alone if the second device
# is, for example, Intel HD integrated graphics.

# "Device ID" is the sequential index number output for each accelerator,
# in the "banner" whenever Qrack is loaded with GPU build options.
# This is the "device ID" for your primary or main (single) accelerator.
os.environ['QRACK_OCL_DEFAULT_DEVICE']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_OCL_DEFAULT_DEVICE']='1'

# If you have multiple accelerators, "QUnitMulti" will attempt to distribute
# completely separable subsystems, when they arise, to multiple separate devices.
# Use this variable to input a comma-separated list of devices for "QUnitMulti."
os.environ['QRACK_QUNITMULTI_DEVICES']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QUNITMULTI_DEVICES']='1,0'
# os.environ['QRACK_QUNITMULTI_DEVICES']='1'

# If you have multiple accelerators, "QPager" can distribute (entangled) simulations
# across multiple equal-sized "pages" of state vector amplitudes.
# Use this variable to input a comma-separated list of device-to-"page" mappings.
os.environ['QRACK_QPAGER_DEVICES']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QPAGER_DEVICES']='4.1,12.0'
# os.environ['QRACK_QPAGER_DEVICES']='1'

# Some accelerators, like Intel integrated graphics, actually use general system RAM.
# In this case, OpenCL can be told to allocate on general "host" instead of "device" RAM.
# For each entry in 'QRACK_QPAGER_DEVICES' above, the variable below uses "1" for "host" and "0" for "device."
os.environ['QRACK_QPAGER_DEVICES_HOST_POINTER']='0'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_QPAGER_DEVICES_HOST_POINTER']='4.0,12.1'

# This is the maximum qubit count you want to fit in a GPU "maximum allocation segment."
# (Your GPU probably has 4 such segments. You might want this 1 less than theoretical max,
# so that "memory fragmentation" doesn't prevent using more than 1 segment in total.)
os.environ['QRACK_MAX_PAGE_QB']='26'

# This is an overall allocation limit for your GPU(s), in megabytes.
# If you have multiple GPUs, you can list separate limits in device ID order,
# separated by a comma.
os.environ['QRACK_MAX_ALLOC_MB']='7900'

# (NVIDIA GPU + Intel accelerator:)
# os.environ['QRACK_MAX_ALLOC_MB']='23552,15872'

# This is the maximum total number of fully-entangled qubits you expect to achieve using QPager.
os.environ['QRACK_MAX_PAGING_QB']='29'

# This is the maximum total number of fully-entangled qubits you expect to fit in general RAM.
os.environ['QRACK_MAX_CPU_QB']='31'

# Above this threshold, "QTensorNetwork" restricts simulations to "past light cone."
# At or below the threshold, much more work can be reused.
os.environ['QRACK_QTENSORNETWORK_THRESHOLD_QB']='29'

# The options below enable approximation. (By default, Qrack simulates the "ideal" case.)

# This is a number between "0" ("ideal") and "1" ("destroy all entanglement") for "SDRP,"
# "Schmidt decomposition rounding parameter". (https://arxiv.org/abs/2304.14969)
# os.environ['QRACK_QUNIT_SEPARABILITY_THRESHOLD']='0.2'

# This is a number between "0" ("ideal") and "1" ("combine all binary decision tree branches")
# that sets the allowable "epsilon" between "QBdt" branches to consider them equal.
# os.environ['QRACK_QBDT_SEPARABILITY_THRESHOLD']='0.0001'
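
To get a feel for the qubit limits set above, it helps to convert them to memory sizes: a state vector on `n` qubits holds `2**n` complex amplitudes. The sketch below assumes 8 bytes per amplitude (single-precision complex); that is an assumption about the Qrack build, so adjust `bytes_per_amplitude` if yours differs. `state_vector_bytes` is a hypothetical helper.

```python
def state_vector_bytes(qubits, bytes_per_amplitude=8):
    """Memory needed for a dense state vector on `qubits` qubits."""
    # ASSUMPTION: 8 bytes per amplitude (two 32-bit floats).
    return (2 ** qubits) * bytes_per_amplitude


settings = {
    "QRACK_MAX_PAGE_QB": 26,
    "QRACK_MAX_PAGING_QB": 29,
    "QRACK_MAX_CPU_QB": 31,
}

for name, qubits in settings.items():
    mib = state_vector_bytes(qubits) // 2**20
    print(f"{name}={qubits}: {mib} MiB")
# QRACK_MAX_PAGE_QB=26: 512 MiB
# QRACK_MAX_PAGING_QB=29: 4096 MiB
# QRACK_MAX_CPU_QB=31: 16384 MiB
```

Each additional qubit doubles the requirement, which is why the paging and CPU limits sit only a few qubits above the single-page limit.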

## Run the Benchmark

In [4]:
import numpy as np
from qiskit import QuantumCircuit
from qiskit.providers.qrack import QasmSimulator, Sampler
from qiskit.quantum_info import PauliList
from circuit_knitting.cutting import partition_problem
from circuit_knitting.cutting import generate_cutting_experiments
from circuit_knitting.cutting import reconstruct_expectation_values

from qiskit.circuit.library import EfficientSU2

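# Use a small circuit for the cutting demo (this overrides the earlier `width = 16`),
# split into two partitions: the first half of the qubits and the remainder.
width = 4
subset_a = width >> 1
subset_b = width - subset_a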

circ = EfficientSU2(width, entanglement="linear", reps=2).decompose()
circ.assign_parameters([0.4] * len(circ.parameters), inplace=True)

circ.draw()

observables = PauliList(["Z" * width])
partitioned_problem = partition_problem(
    circuit=circ, partition_labels=(("A" * subset_a) + ("B" * subset_b)), observables=observables
)
subcircuits = partitioned_problem.subcircuits
subobservables = partitioned_problem.subobservables
bases = partitioned_problem.bases
subexperiments, coefficients = generate_cutting_experiments(
    circuits=subcircuits, observables=subobservables, num_samples=np.inf
)
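
The partition labels assign the first `subset_a` qubits to `A` and the rest to `B`, so every entangling gate that joins an `A` qubit to a `B` qubit has to be cut. The plain-Python sketch below counts those gates and the resulting sampling overhead. It assumes that `EfficientSU2` with `entanglement="linear"` places one CNOT on each neighboring pair `(i, i + 1)` per repetition, and that each cut CNOT multiplies the sampling overhead by 9 (a quasiprobability weight of 3, squared). `count_cut_gates` and `sampling_overhead` are hypothetical helpers, not part of `circuit_knitting`.

```python
def count_cut_gates(partition_labels, reps):
    """Count nearest-neighbor two-qubit gates whose qubits carry different labels."""
    # ASSUMPTION: one CNOT per neighboring pair (i, i + 1) in each repetition.
    pairs = [(i, i + 1) for i in range(len(partition_labels) - 1)]
    crossing = sum(1 for a, b in pairs if partition_labels[a] != partition_labels[b])
    return crossing * reps


def sampling_overhead(num_cut_cnots, per_gate_overhead=9):
    """Multiplicative sampling overhead for a number of cut CNOT gates."""
    # ASSUMPTION: each cut CNOT contributes a factor of 3**2 = 9.
    return per_gate_overhead ** num_cut_cnots


width = 4
subset_a = width >> 1
labels = "A" * subset_a + "B" * (width - subset_a)
cuts = count_cut_gates(labels, reps=2)
print(labels, cuts, sampling_overhead(cuts))  # AABB 2 81
```

The overhead grows exponentially with the number of cut gates, so partitions that cross as few entangling gates as possible are preferable.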


In [5]:
# Set up a Qrack Sampler primitive for each circuit partition
samplers = {
    label: Sampler(run_options={"shots": 2**12}) for label in subexperiments.keys()
}

# Retrieve results from each partition's subexperiments
results = {
    label: sampler.run(subexperiments[label]).result()
    for label, sampler in samplers.items()
}

reconstructed_expvals = reconstruct_expectation_values(
    results,
    coefficients,
    subobservables,
)

print(f"Reconstructed expectation values: {[np.round(v, 8) for v in reconstructed_expvals]}")

Device #0, Loaded binary from: /home/iamu/.qrack/qrack_ocl_dev_NVIDIA_GeForce_RTX_2070_Super.ir
Reconstructed expectation values: [0.41007423]
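
A noiseless state-vector calculation of the same observable gives a reference value to compare against the reconstructed estimate. The sketch below uses only NumPy. It assumes the standard `EfficientSU2` layout (a layer of `RY` then `RZ` on every qubit, followed by a linear chain of CNOTs, repeated `reps` times, with a final rotation layer) and that every parameter equals 0.4, as in the cell above. All helper names are hypothetical. Agreement with the sampled result should be expected only up to shot noise.

```python
import numpy as np


def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(theta):
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])


def apply_1q(state, gate, qubit, n):
    """Apply a 2x2 gate to `qubit`; qubit k is bit k of the basis-state index."""
    psi = state.reshape([2] * n)
    axis = n - 1 - qubit  # the first tensor axis is the most significant bit
    psi = np.moveaxis(psi, axis, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    return np.moveaxis(psi, 0, axis).reshape(-1)


def apply_cx(state, control, target, n):
    """CNOT is a self-inverse permutation of the basis states."""
    idx = np.arange(2 ** n)
    flipped = np.where((idx >> control) & 1, idx ^ (1 << target), idx)
    return state[flipped]


def efficient_su2_state(n, reps, theta):
    # ASSUMPTION: RY layer, RZ layer, then CNOT(i, i + 1) for i = 0..n-2.
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for layer in range(reps + 1):
        for q in range(n):
            state = apply_1q(state, ry(theta), q, n)
        for q in range(n):
            state = apply_1q(state, rz(theta), q, n)
        if layer < reps:
            for q in range(n - 1):
                state = apply_cx(state, q, q + 1, n)
    return state


def z_all_expectation(state):
    """Expectation value of Z on every qubit: +1 for even parity, -1 for odd."""
    parity = np.array([bin(i).count("1") % 2 for i in range(len(state))])
    return float(np.sum(np.abs(state) ** 2 * (1 - 2 * parity)))


state = efficient_su2_state(n=4, reps=2, theta=0.4)
print(f"Exact <ZZZZ>: {z_all_expectation(state):.8f}")
```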
