### An example of Automatic Differentiation

In [1]:
using PauliPropagation

Note that from here on we declare many variables as `const`. In Julia, `const` on a global variable declares that the variable will not be reassigned. At minimum, its type is fixed (mutable objects such as arrays can still be modified in place). This matters when global variables are used inside functions: the compiler can then infer their types, which keeps those functions fast.
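A minimal plain-Julia sketch of why this matters, independent of PauliPropagation. The reflection utility `Base.return_types` shows what the compiler infers for a function reading a non-`const` global versus a `const` one. The function names are hypothetical and only serve the illustration.

```julia
# Non-constant global: its type may change at any time, so functions
# that read it cannot be specialized on its type.
nonconst_n = 32
# Constant global: the binding (and hence its type) is fixed.
const const_n = 32

uses_nonconst() = nonconst_n + 1
uses_const() = const_n + 1

# Inferred return types: Any for the non-const global, Int64 for the const one.
println(Base.return_types(uses_nonconst, ()))
println(Base.return_types(uses_const, ()))
```

A return type of `Any` means every operation on the value is resolved at runtime, which is what the `const` declarations in this notebook avoid.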

In [2]:
const nq = 32

const topology = bricklayertopology(nq);

We define a transverse-field Ising Hamiltonian and will compute the gradient of its expectation value with respect to the circuit parameters. Such gradients could be used within a variational energy minimization routine to approximate its ground state.

The Hamiltonian reads $H = \sum_{i}X_i + \sum_{\langle i, j\rangle}Z_iZ_j$, where $\langle i, j\rangle$ denotes pairs of neighboring qubits on the topology.
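To make the term structure concrete, here is a hypothetical plain-Julia sketch (not the PauliPropagation API) that lists each term of $H$ as a Pauli string with its coefficient. It assumes the neighbors are the open-chain pairs $(i, i+1)$, which is consistent with the 63 terms (32 $X$ terms plus 31 $ZZ$ terms) printed for $n_q = 32$ below.

```julia
# Hypothetical helper: terms of H = Σ_i X_i + Σ_<i,j> Z_i Z_j as (Pauli string, coefficient),
# assuming an open chain with neighbor pairs (i, i+1).
function tfim_terms(nq::Int)
    terms = Tuple{String,Float64}[]
    for i in 1:nq                      # transverse-field terms X_i
        s = fill('I', nq)
        s[i] = 'X'
        push!(terms, (String(s), 1.0))
    end
    for i in 1:nq-1                    # coupling terms Z_i Z_{i+1}
        s = fill('I', nq)
        s[i] = 'Z'
        s[i+1] = 'Z'
        push!(terms, (String(s), 1.0))
    end
    return terms
end

println(length(tfim_terms(32)))   # 63
println(tfim_terms(3))
```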

In [3]:
H = PauliSum(nq)

for qind in 1:nq
    add!(H, :X, qind, 1.0)
end

for pair in topology
    add!(H, [:Z, :Z], collect(pair), 1.0)
end

H

PauliSum(nqubits: 32, 63 Pauli terms:
 1.0 * IZZIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIXIIIIII...
 1.0 * IIIIIXIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIXIII...
 1.0 * IIIXIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIXIIIII...
 1.0 * IIIIIIIXIIIIIIIIIIII...
 1.0 * IXIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIZZIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIIX...
 1.0 * IIIIIIXIIIIIIIIIIIII...
 1.0 * IIIIIIIIZZIIIIIIIIII...
 1.0 * IIIIXIIIIIIIIIIIIIII...
 1.0 * IIIIIZZIIIIIIIIIIIII...
 1.0 * IIIIIIIZZIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIXI...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
  ⋮)

Define some generic quantum circuit

In [4]:
nl = 4

# define our circuit as `const` so that code using this global variable stays fast
const circuit = hardwareefficientcircuit(nq, nl; topology=topology)
nparams = countparameters(circuit)

508
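The parameter count can be sanity-checked by hand. The sketch below assumes (this is not confirmed here) that each layer of `hardwareefficientcircuit` contains three parametrized single-qubit rotations per qubit plus one parametrized two-qubit rotation per neighbor pair, and that the topology has 31 pairs, as implied by the 63-term Hamiltonian above. Under these assumptions the arithmetic reproduces the 508 reported.

```julia
nq, nl = 32, 4
nedges = nq - 1                  # assumption: 31 neighbor pairs in the bricklayer topology
per_layer = 3 * nq + nedges      # assumption: 3 single-qubit rotations per qubit + 1 two-qubit rotation per pair
nparams_estimate = nl * per_layer
println(nparams_estimate)        # 508
```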

Importantly, we need to set our truncations. Which truncations are appropriate depends on the package you use to compute gradients.

`ReverseDiff`, for example, is a sophisticated package for automatic _reverse-mode_ differentiation. It records a computational graph of the loss evaluation and then differentiates it using the chain rule. This is how large-scale neural networks are trained, and it is commonly referred to as gradient _backpropagation_. The catch is that, to the best of our knowledge, this graph is recorded only once, so only the truncation decisions made during that initial evaluation are respected. Truncations that we think work well here are `max_weight`, `max_freq`, and `max_sins`, because they do not depend on the particular parameters of the quantum circuit. By contrast, the set of paths kept by a parameter-dependent truncation such as `min_abs_coeff` will (again, to the best of our knowledge) not be updated as the gradients are computed.
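To illustrate what "record a graph, then apply the chain rule backwards" means, here is a toy reverse-mode tape in plain Julia. It is a minimal sketch for intuition only, not how `ReverseDiff` is implemented; the types `Tape` and `Var` and the function `tape_gradient` are hypothetical. One forward pass records every operation with its local derivatives, and a single backward sweep then yields the derivative with respect to every input at once.

```julia
# Toy tape: node values and, per node, a list of (parent index, local partial derivative).
struct Tape
    vals::Vector{Float64}
    parents::Vector{Vector{Tuple{Int,Float64}}}
end
Tape() = Tape(Float64[], Vector{Tuple{Int,Float64}}[])

struct Var            # a value living on a tape
    tape::Tape
    idx::Int
end

function push_node!(t::Tape, val, parents)
    push!(t.vals, val)
    push!(t.parents, parents)
    return Var(t, length(t.vals))
end

value(v::Var) = v.tape.vals[v.idx]

# Each operation records its result and its local derivatives w.r.t. its inputs.
Base.:+(a::Var, b::Var) = push_node!(a.tape, value(a) + value(b), [(a.idx, 1.0), (b.idx, 1.0)])
Base.:*(a::Var, b::Var) = push_node!(a.tape, value(a) * value(b), [(a.idx, value(b)), (b.idx, value(a))])
Base.cos(a::Var) = push_node!(a.tape, cos(value(a)), [(a.idx, -sin(value(a)))])
Base.sin(a::Var) = push_node!(a.tape, sin(value(a)), [(a.idx, cos(value(a)))])

# One forward pass records the tape; one backward sweep accumulates adjoints (chain rule).
function tape_gradient(f, θ::Vector{Float64})
    t = Tape()
    inputs = [push_node!(t, x, Tuple{Int,Float64}[]) for x in θ]
    out = f(inputs)
    adj = zeros(length(t.vals))
    adj[out.idx] = 1.0
    for i in out.idx:-1:1              # nodes were appended in order, so this is reverse topological order
        for (p, d) in t.parents[i]
            adj[p] += adj[i] * d
        end
    end
    return value(out), [adj[v.idx] for v in inputs]
end

f = θ -> cos(θ[1]) * sin(θ[2]) + cos(θ[3])
val, grad = tape_gradient(f, [0.3, 1.1, -0.7])
println(grad)
```

Note that any `if` inside `f` is evaluated only while the tape is being recorded; replaying the recorded tape at new inputs would not revisit that decision, which is the concern raised above for parameter-dependent truncations.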

Packages such as `ForwardDiff`, or manual numerical differentiation, on the other hand, re-evaluate the loss function itself every time, so all truncations are respected. Unfortunately, their cost grows roughly linearly with the number of parameters, which makes them slower for circuits with more than a few dozen parameters.
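As a minimal sketch of manual numerical differentiation, here is a central finite-difference gradient in plain Julia. Because it calls the loss twice per parameter, its cost scales with the number of parameters; for the 508 parameters of our circuit that would be 1016 loss evaluations. The toy loss `sum(sin, θ)` and the step size `h` are illustrative choices.

```julia
# Central finite differences: g_k ≈ (f(θ + h e_k) - f(θ - h e_k)) / 2h.
function fd_gradient(f, θ::AbstractVector{<:Real}; h=1e-6)
    g = similar(θ, Float64)
    θp = float.(θ)                    # working copy of the parameters
    for k in eachindex(θ)
        θp[k] = θ[k] + h
        fplus = f(θp)
        θp[k] = θ[k] - h
        fminus = f(θp)
        θp[k] = θ[k]                  # restore before moving to the next parameter
        g[k] = (fplus - fminus) / (2h)
    end
    return g
end

const ncalls = Ref(0)
toy_loss(θ) = (ncalls[] += 1; sum(sin, θ))   # known gradient: cos.(θ)

θ = [0.1, 0.2, 0.3, 0.4, 0.5]
g = fd_gradient(toy_loss, θ)
println(ncalls[])   # 10 = 2 * length(θ)
```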

So let's wrap the coefficients into `PauliFreqTracker`, which keeps track of how many times a path has split at a `PauliRotation`. We will use this to truncate our simulation, i.e., we will set a `max_freq` truncation. One could also truncate on `min_abs_coeff`, but (as far as we know) `ReverseDiff` would not update which paths are truncated during training, even as different paths come to have small coefficients.
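The following toy sketch (plain Julia, hypothetical functions, not PauliPropagation or ReverseDiff code) shows why a parameter-dependent truncation is a problem for a recorded graph. Each "path" contributes a coefficient `cos(θ_k)` and is dropped if it falls below `min_abs_coeff`. A recording made at `θ0` effectively freezes the set of kept paths, so replaying it at a new `θ1` gives a different answer than re-evaluating the truncated loss.

```julia
# Toy loss with a parameter-dependent truncation (analogous to `min_abs_coeff`).
function truncated_loss(θ; min_abs_coeff=0.1)
    s = 0.0
    for t in θ
        c = cos(t)
        abs(c) < min_abs_coeff && continue   # drop paths with small coefficients
        s += c
    end
    return s
end

# What a recording made at θ0 effectively replays: the kept set is decided once and frozen.
function record_then_replay(θ0; min_abs_coeff=0.1)
    kept = findall(t -> abs(cos(t)) >= min_abs_coeff, θ0)
    return θ -> sum((cos(θ[k]) for k in kept); init=0.0)
end

θ0 = [0.0, π/2]
θ1 = [π/2, 0.0]
replay = record_then_replay(θ0)
println(truncated_loss(θ1))   # 1.0: path 2 is kept at θ1
println(replay(θ1))           # ≈ 6.1e-17: the replay still keeps only path 1
```

A truncation that does not look at the parameters (like `max_freq` or `max_weight`) selects the same paths at every `θ`, so this mismatch cannot occur.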

In [5]:
const wrapped_H = wrapcoefficients(H, PauliFreqTracker)

PauliSum(nqubits: 32, 63 Pauli terms:
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIXIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIXIIII...
 PauliFreqTracker(1.0) * IZZIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIXIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIZZIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIZZIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIXIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIXIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIZZIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIXIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIZZIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIII

Define our truncations

In [6]:
const max_freq = 30
const max_weight = 5

5

Generate some generic parameters

In [7]:
using Random
Random.seed!(42)
thetas = randn(nparams);

One expectation evaluation

In [8]:
@time psum = propagate(circuit, wrapped_H, thetas; max_freq, max_weight);
overlapwithzero(psum)

  0.601659 seconds (480.14 k allocations: 52.951 MiB, 0.88% gc time, 37.35% compilation time)


1.0578323811939663
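`overlapwithzero` turns the propagated Pauli sum into the expectation value on the all-zero state. On $|0\rangle$, $I$ and $Z$ have expectation 1 and $X$, $Y$ have expectation 0, so for a product state $\langle 0\cdots0|P|0\cdots0\rangle$ is 1 for strings made only of $I$ and $Z$ and 0 otherwise. A toy sketch of that rule, using a plain `Dict` instead of PauliPropagation's `PauliSum` (the function name is hypothetical):

```julia
# Hypothetical toy version: Pauli string => coefficient.
# Only strings consisting of I and Z contribute, each with its full coefficient.
toy_overlapwithzero(psum::Dict{String,Float64}) =
    sum((c for (p, c) in psum if all(in(('I', 'Z')), p)); init=0.0)

toy_psum = Dict("ZZII" => 1.0, "IZZI" => 1.0, "XIII" => 1.0, "IIII" => 0.5, "YZII" => 2.0)
println(toy_overlapwithzero(toy_psum))   # 2.5
```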

Now we wrap the computation into a function that takes only `thetas` as its argument. This is why we declared many global variables as `const`: they are used inside this function.

The following loss function evaluates correctly, but it cannot be differentiated with `ReverseDiff`. `ReverseDiff` needs to pass its own tracked number type through the computation, but `wrapped_H` (built from `H`) is strictly typed with `Float64` coefficients and cannot hold those values.

In [9]:
function naivelossfunction(thetas)
    psum = propagate(circuit, wrapped_H, thetas; max_freq, max_weight);
    return overlapwithzero(psum)
end

naivelossfunction (generic function with 1 method)

In [10]:
@time naivelossfunction(thetas)

  0.385146 seconds (32.27 k allocations: 22.070 MiB, 2.08% compilation time)


1.0578323811939663

We now create a loss function that does indeed work. It requires that we build the Hamiltonian with the correct coefficient type, which here is the element type of `thetas`. This will make everything differentiable.
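The same typing issue can be reproduced without any AD package. Below, a minimal forward-mode dual number stands in (as an assumption, for illustration) for the tracked number types that AD packages pass through your code; all names are hypothetical. A loss that stores intermediate values in a fixed `Float64` container fails on such inputs, while a loss that builds its container with `eltype(θ)` works and propagates the derivative.

```julia
# Minimal forward-mode dual number: value plus derivative w.r.t. one chosen input.
struct Dual
    val::Float64
    der::Float64
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + a.der * b.val)
Base.cos(a::Dual) = Dual(cos(a.val), -sin(a.val) * a.der)
Base.zero(::Type{Dual}) = Dual(0.0, 0.0)
Base.one(::Type{Dual}) = Dual(1.0, 0.0)

# Like `naivelossfunction`: the coefficient container is always Float64.
function naive_toyloss(θ)
    coeffs = ones(length(θ))                  # Vector{Float64}
    for k in eachindex(θ)
        coeffs[k] = coeffs[k] * cos(θ[k])     # Float64 * Dual has no method -> MethodError
    end
    return sum(coeffs)
end

# Like `lossfunction`: the coefficients take the element type of θ.
function toyloss(θ::AbstractVector{T}) where {T}
    coeffs = fill(one(T), length(θ))          # Vector{T}
    s = zero(T)
    for k in eachindex(θ)
        s = s + coeffs[k] * cos(θ[k])
    end
    return s
end

θd = [Dual(0.3, 1.0), Dual(1.2, 0.0)]         # seed: differentiate w.r.t. the first parameter
println(toyloss(θd).der)                      # equals -sin(0.3)
```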

In [11]:
function lossfunction(thetas)
    coefftype = eltype(thetas)

    H = PauliSum(nq, coefftype)
    for qind in 1:nq
        add!(H, :X, qind, coefftype(1.0))
    end
    for pair in topology
        add!(H, [:Z, :Z], collect(pair), coefftype(1.0))
    end
    
    wrapped_H = wrapcoefficients(H, PauliFreqTracker)

    # we also need to use the in-place version `propagate!`, because `propagate` copies the Pauli sum by default
    wrapped_H = propagate!(circuit, wrapped_H, thetas; max_freq, max_weight);
    return overlapwithzero(wrapped_H)
end

lossfunction (generic function with 1 method)

Evaluating this loss function gives the same value as before:

In [12]:
@time lossfunction(thetas)

  0.393661 seconds (52.96 k allocations: 23.455 MiB, 0.73% gc time, 4.21% compilation time)


1.0578323811939663

Now import `ReverseDiff` and follow the example from its documentation:

In [13]:
using ReverseDiff: GradientTape, gradient!, compile

In [14]:
# This follows an example from the ReverseDiff.jl documentation

# a work buffer that will hold the gradient
grad_array = similar(thetas);

# pre-record a GradientTape for `lossfunction` using inputs with the same length and element type (Float64) as `thetas`
@time const simulation_tape = GradientTape(lossfunction, thetas)

# first evaluation compiles and is slower
@time gradient!(grad_array, simulation_tape, thetas)
# second evaluation
@time gradient!(grad_array, simulation_tape, thetas);

  5.340950 seconds (95.81 M allocations: 3.548 GiB, 37.27% gc time, 15.38% compilation time)
  1.827979 seconds (249.37 k allocations: 16.787 MiB, 8.74% compilation time)
  1.717056 seconds


In [15]:
# compile to make it even faster
@time const compiled_simulation_tape = compile(simulation_tape)

# a separate work buffer for the gradient
grad_array_compiled = similar(thetas);

# first evaluation compiles and is slower
@time gradient!(grad_array_compiled, compiled_simulation_tape, thetas)
# second evaluation
@time gradient!(grad_array_compiled, compiled_simulation_tape, thetas);

 20.316843 seconds (151.30 M allocations: 6.198 GiB, 62.40% gc time, 1.20% compilation time)
  1.338972 seconds (43.11 k allocations: 2.852 MiB, 3.39% compilation time)
  1.303978 seconds


`grad_array` (and `grad_array_compiled`) holds the gradient result. `gradient!` overwrites it in place, so the array does not need to be reallocated on every call.

Note that computing the gradient with respect to all 508 parameters is only a few times slower than a single loss evaluation (about 1.3 s versus 0.39 s with the compiled tape). That is the magic of reverse-mode differentiation!
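For a rough sense of scale, here is a back-of-the-envelope comparison using the timings measured above (about 0.39 s per loss evaluation and 1.30 s per compiled-tape gradient). It assumes central finite differences at two loss evaluations per parameter and ignores timing noise, so it is an estimate rather than a benchmark.

```julia
nparams = 508
t_loss = 0.39            # seconds per `lossfunction` evaluation (timed above)
t_grad = 1.30            # seconds per gradient with the compiled tape (timed above)

n_evals_fd = 2 * nparams              # central finite differences: 2 evaluations per parameter
t_fd_estimate = n_evals_fd * t_loss   # estimated seconds for one finite-difference gradient

println(n_evals_fd)        # 1016
println(t_fd_estimate)     # ≈ 396
println(t_grad / t_loss)   # ≈ 3.3
```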