# How to Define Custom Gates
`PauliPropagation.jl` is extensible and allows you to define your own gates. Depending on how much you can or want to code, you can define a gate that simply _works_ or one that is as fast as it gets. Here we will see what you need to define in each case.

In [1]:
using PauliPropagation

Let us start by defining a `SWAP` gate. It subtypes `StaticGate`, which denotes that it does not take any variable parameters at propagation time. It always acts the same.

In [2]:
struct CustomSWAPGate <: StaticGate
    qinds::Tuple{Int, Int}  # The two sites to be swapped
end

The action of a `SWAP` gate on a Pauli string is that it swaps the Paulis on two sites. We can now define a function `apply` which receives these 4 arguments in this order, as well as potential `kwargs`: `apply(gate::YourGate, pstr, theta, coefficient; kwargs...)`. We can ignore `kwargs` for now, but you can use them to pass arguments from the top level down to your function. Also `theta` is always passed, but for `StaticGate`s you should ignore it.

This is how you can define `SWAP`:

In [3]:
function PauliPropagation.apply(gate::CustomSWAPGate, pstr, theta, coefficient; kwargs...)
    # get the Pauli on the first site
    pauli1 = getpauli(pstr, gate.qinds[1])
    # get the Pauli on the second site
    pauli2 = getpauli(pstr, gate.qinds[2])
    
    # set the Pauli on the first site to the second Pauli
    pstr = setpauli(pstr, pauli2, gate.qinds[1])
    # set the Pauli on the second site to the first Pauli
    pstr = setpauli(pstr, pauli1, gate.qinds[2])

    return pstr, coefficient
end

This is it, really.
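To make the get/set pattern above concrete, here is a minimal plain-Julia sketch that does not use the package at all. The helpers `toy_getpauli`, `toy_setpauli` and `toy_swap` are hypothetical names, and the bit layout (two bits per site, with `I = 0`, `X = 1`, `Y = 2`, `Z = 3`) is an assumption made for illustration, not a statement about how the library stores Pauli strings.

```julia
# Toy stand-ins for `getpauli`/`setpauli` (hypothetical, assumed bit layout):
# site `i` occupies two bits, with I = 0, X = 1, Y = 2, Z = 3.
toy_getpauli(pstr::UInt64, site::Int) = (pstr >> (2 * (site - 1))) & UInt64(3)

function toy_setpauli(pstr::UInt64, pauli::Integer, site::Int)
    shift = 2 * (site - 1)
    cleared = pstr & ~(UInt64(3) << shift)      # zero out the two bits of `site`
    return cleared | (UInt64(pauli) << shift)   # write the new Pauli
end

# Same logic as the custom SWAP `apply`: read both Paulis, write them back crossed.
function toy_swap(pstr::UInt64, site1::Int, site2::Int)
    pauli1 = toy_getpauli(pstr, site1)
    pauli2 = toy_getpauli(pstr, site2)
    pstr = toy_setpauli(pstr, pauli2, site1)
    return toy_setpauli(pstr, pauli1, site2)
end

# X on site 1 and Z on site 3 of an otherwise identity string.
xiz = toy_setpauli(toy_setpauli(UInt64(0), 1, 1), 3, 3)
zix = toy_swap(xiz, 1, 3)

@assert toy_getpauli(zix, 1) == 3    # Z moved to site 1
@assert toy_getpauli(zix, 3) == 1    # X moved to site 3
@assert toy_swap(zix, 1, 3) == xiz   # SWAP is its own inverse
```

Note that the coefficient never enters: a SWAP only relabels sites, which is why the real `apply` returns `coefficient` unchanged.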

Now set up the simulation. 25 qubits in a 5 by 5 grid.

In [4]:
nx = 5
ny = 5
nq = nx * ny

topology = get2dtopology(nx, ny);

Build `nl` layers of a circuit consisting of `RX` and `RZZ` Pauli rotations.

In [5]:
nl = 3
base_circuit = tfitrottercircuit(nq, nl; topology=topology);
nparams = countparameters(base_circuit)

195

Define our observable as $Z_7 Z_{13}$.

In [6]:
pstr = PauliString(nq, [:Z, :Z], [7, 13])

PauliString(nqubits: 25, 1.0 * IIIIIIZIIIIIZIIIIIII...)

Draw the circuit parameters with a fixed random seed.

In [7]:
using Random
Random.seed!(42)
thetas = randn(nparams);

For this notebook, we will use a minimum coefficient threshold. The results are still almost exact.

In [8]:
min_abs_coeff = 1e-4

0.0001
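As a rough picture of what a minimum coefficient threshold does, the following plain-Julia sketch drops small terms from a toy dictionary. The string keys, the name `toy_threshold`, and the exact comparison (`>=`) are assumptions for illustration; the package may apply its truncation differently and at different points during propagation.

```julia
# Toy Pauli sum: readable string keys instead of the package's own types.
toy_psum = Dict("ZZ" => 0.7, "XY" => 3e-5, "YI" => -2e-4)
toy_threshold = 1e-4

# Keep only terms whose coefficient magnitude reaches the threshold.
filter!(pair -> abs(pair.second) >= toy_threshold, toy_psum)

@assert length(toy_psum) == 2
@assert !haskey(toy_psum, "XY")   # |3e-5| < 1e-4, so this term was dropped
```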

Now add a 1D line of SWAP gates to the base circuit, inserted at the end of the first layer of gates.

In [9]:
nparams_per_layer = Int(length(base_circuit)/nl)

65

In [10]:
ourSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(ourSWAP_circuit, nparams_per_layer, CustomSWAPGate((qind, qind+1)))
end
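A note on where these gates end up: `insert!(v, i, x)` places `x` at index `i` and shifts the element previously at `i`, and everything after it, one position to the right. Repeated insertion at a fixed index therefore puts the new elements directly before the element originally at that index, in reverse insertion order. The toy vectors below, with made-up string labels standing in for gates, illustrate this in plain Julia.

```julia
# Two "layers" of three placeholder gates each (labels are illustrative only).
gates = ["a1", "a2", "a3", "b1", "b2", "b3"]
per_layer = 3

# Same pattern as in the cell above: always insert at the fixed index `per_layer`.
for name in ["s1", "s2"]
    insert!(gates, per_layer, name)
end
@assert gates == ["a1", "a2", "s2", "s1", "a3", "b1", "b2", "b3"]

# One possible variant: insert at `per_layer + 1` while iterating in reverse,
# so the new elements follow the whole first layer in their original order.
gates_after = ["a1", "a2", "a3", "b1", "b2", "b3"]
for name in reverse(["s1", "s2"])
    insert!(gates_after, per_layer + 1, name)
end
@assert gates_after == ["a1", "a2", "a3", "s1", "s2", "b1", "b2", "b3"]
```

Since the comparison circuits later in this notebook are built with the same insertion pattern, the comparisons remain like for like.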

Run the circuit

In [11]:
@time ourSWAP_psum = propagate(ourSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  0.806805 seconds (595.44 k allocations: 101.505 MiB, 1.97% gc time, 67.80% compilation time)


PauliSum(nqubits: 25, 86652 Pauli terms:
 -0.00020048 * YZIZYYZZIZIYZIIIIIII...
 0.00019386 * XIIIZIXIIIZYXIIZXZII...
 0.0062958 * IIIIIZIIIIZIIIIIXZII...
 -0.00040887 * IIIIIYZZZIZXZYZZYIZI...
 -0.00040016 * YIIIZYXXZIIYIIIZXIII...
 0.0015383 * IIIIIZIIZIXXYXZIXZZI...
 0.00043597 * XZIIZIXIIIZXIIIZXIII...
 -0.0003435 * YZIZXIXIIIIZIIIIYIII...
 -0.00029089 * YZIIIXXZIIIXYZIZYIII...
 0.00047305 * IIIIIYIIZIXXXXZIXIZI...
 0.00012646 * YIIIIXIIZXYZXYZIIZZI...
 -0.00014033 * IZIIIIZZIIXYZIIZYIII...
 -0.0028668 * YIIIIXYIIIIXXZIIZIII...
 0.00027333 * ZIIIIYIYIIZYYIIIZZII...
 -0.00027386 * ZIIZYIYIIZZIIIIIIIII...
 -0.00018862 * XIIIZYYIIIXYZIIZYIII...
 0.00046973 * IIIIIYYIZIZYYYIZYZZI...
 0.00017632 * YIIIZXIIZYZZXYZIIZZI...
 0.00021627 * IIIIIXXZIIYYIIIIZIII...
 -0.00040872 * XIIIZYYZIIIXZIIZXIII...
  ⋮)

Compute the overlap with the zero state:

In [12]:
overlapwithzero(ourSWAP_psum)

0.2832103356627234
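Conceptually, the expectation value of a Pauli string in the all-zero state is 1 if the string contains only `I` and `Z`, and 0 otherwise, so the overlap is the sum of the coefficients of those strings. The sketch below shows this on a toy dictionary with string keys; `toy_overlapwithzero` and `only_I_and_Z` are hypothetical helpers and are not the package's implementation.

```julia
# Toy Pauli sum with readable keys; this is not the package's data layout.
toy_psum = Dict("ZIZ" => 0.5, "XIZ" => 0.25, "III" => -0.1, "IYI" => 0.3)

# <0|P|0> equals 1 when P contains only I and Z, and 0 otherwise.
only_I_and_Z(pstr) = all(c -> c == 'I' || c == 'Z', pstr)

# Sum the coefficients of all terms that survive the projection.
toy_overlapwithzero(psum) =
    sum(coeff for (pstr, coeff) in psum if only_I_and_Z(pstr); init=0.0)

@assert toy_overlapwithzero(toy_psum) ≈ 0.4   # 0.5 from "ZIZ" plus -0.1 from "III"
```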

This looks okay, but is it correct? One thing you may have noticed is that `SWAP` is a `Clifford` operation, i.e., one that takes one Pauli to exactly one other Pauli. We actually have that in our package so we can easily compare.

In [13]:
cliffSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(cliffSWAP_circuit, nparams_per_layer, CliffordGate(:SWAP, (qind, qind+1)))
end

In [14]:
@time cliffSWAP_psum = propagate(cliffSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.323095 seconds (63.59 k allocations: 67.480 MiB, 1.52% gc time, 22.11% compilation time)


Are the results the same?

In [15]:
overlapwithzero(cliffSWAP_psum)

0.2832103356627234

In [16]:
cliffSWAP_psum == ourSWAP_psum

true

Yes!

We can also benchmark the performance.

In [17]:
using BenchmarkTools

In [18]:
@btime propagate($ourSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  235.937 ms (692 allocations: 63.30 MiB)


In [19]:
@btime propagate($cliffSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  235.432 ms (692 allocations: 63.30 MiB)


No downside at all from defining our custom gate. How? This is because the `apply` function for this gate is *type stable*! Type stability is absolutely crucial in Julia, and codes live and die by it.

In [20]:
@code_warntype apply(CustomSWAPGate((7, 8)), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomSWAPGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::CustomSWAPGate, pstr, theta, coefficient; kwargs...) @ Main In[3]:1
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomSWAPGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Tuple{UInt64, Float64}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = Main.:(var"#apply#1")(%2, #self#, gate, pstr, theta, coefficient)::Tuple{UInt64, Float64}
└──      return %3



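All blue means that everything is great! In a color terminal, `@code_warntype` prints concretely inferred types such as `Tuple{UInt64, Float64}` in blue. If correctly implemented, `apply` will be type stable if it always returns the same, known number of Pauli string and coefficient pairs. Here it is just one pair because `SWAP` is a Clifford gate.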
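The link between "a known number of pairs" and type stability can be checked without the package. In the plain-Julia sketch below, `stable` and `unstable` are made-up toy functions; `Base.return_types` reports what the compiler infers for the given argument types.

```julia
# Always returns one (term, coefficient) pair: a single concrete tuple type.
stable(term::UInt64, coeff::Float64) = (term, coeff)

# Returns one pair or two pairs depending on a runtime value.
function unstable(term::UInt64, coeff::Float64)
    if iseven(term)
        return term, coeff
    end
    return term, coeff / 2, term + 1, coeff / 2
end

argtypes = (UInt64, Float64)
stable_type = only(Base.return_types(stable, argtypes))
unstable_type = only(Base.return_types(unstable, argtypes))

@assert stable_type == Tuple{UInt64, Float64}
@assert isconcretetype(stable_type)      # one fixed return type
@assert !isconcretetype(unstable_type)   # tuple length depends on the input value
```

A return value whose length depends on the input cannot have a single concrete tuple type, which is the situation a splitting gate runs into.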

On to an example of a gate that can _split_ a Pauli string into two: the `T` gate.

In [21]:
struct CustomTGate <: StaticGate
    qind::Int
end

A `T` gate is a non-Clifford gate that commutes with `I` and `Z`, splits `X` into `cos(π/4)X - sin(π/4)Y`, and `Y` into `cos(π/4)Y + sin(π/4)X`. 

Let's write the code for that.

In [22]:
function PauliPropagation.apply(gate::CustomTGate, pstr, theta, coefficient=1.0; kwargs...)
    # get the Pauli on the site `gate.qind`
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        return pstr, coefficient     
    end
    
    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = -1 * coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
        
    else # Y goes to Y, X
        new_pauli = 1  # X
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
    end
    
    return pstr, coefficient_prime, new_pstr, new_coefficient
    
end
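The transformation rules encoded in this `apply` method can be sanity-checked with explicit 2×2 matrices. The sketch below uses only the `LinearAlgebra` standard library and assumes the convention that propagation maps an observable `P` to `T' * P * T`; the helper name `heisenberg` is made up for this check.

```julia
using LinearAlgebra

# Single-qubit Pauli matrices and the T gate.
X = ComplexF64[0 1; 1 0]
Y = ComplexF64[0 -1im; 1im 0]
Z = ComplexF64[1 0; 0 -1]
T = Diagonal(ComplexF64[1, exp(im * π / 4)])

# Assumed convention: an observable P is mapped to T' * P * T.
heisenberg(P) = T' * P * T

@assert heisenberg(X) ≈ cos(π / 4) * X - sin(π / 4) * Y   # X splits into X and -Y
@assert heisenberg(Y) ≈ cos(π / 4) * Y + sin(π / 4) * X   # Y splits into Y and X
@assert heisenberg(Z) ≈ Z                                 # Z commutes with T
```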

Insert a layer of `CustomTGate`s at the end of the first layer of the base circuit.

In [23]:
ourT_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(ourT_circuit, nparams_per_layer, CustomTGate(qind))
end

And run:

In [24]:
@time ourT_psum = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  1.021635 seconds (6.01 M allocations: 319.650 MiB, 1.55% gc time, 10.01% compilation time)


PauliSum(nqubits: 25, 301012 Pauli terms:
 0.00047325 * YXZIIZIYIIIIXZIIIZII...
 -0.00012269 * YXIIIIYIIIIZXIIIIXZI...
 0.00047847 * IYXIIIZIIIIZYZIIIZII...
 0.00054682 * XXZZIZXYYIIZXZIIIZII...
 -0.00051534 * ZXIIIXXYIIIZXIIIIYZI...
 -0.00017251 * IIIIIIIIZIIXIXZIXIZI...
 -0.00026819 * YZZIIYXXXIIIZZIIIIII...
 -0.00064832 * IXIIIIIXXIIZYZIIIZII...
 -0.00011407 * XYIIIZIIIIIIYZIIIZII...
 -0.00028261 * IIIZIZXZXIIZXXZIIIZI...
 -0.00030335 * IXIZIYYYYIIIZZIIIIII...
 -0.00021259 * ZYIIIIXXIIIZYIIIIXZI...
 -0.00063572 * IXIIIIXZIIIZXIIIIXII...
 0.00071396 * IYIIIIXZIIIYYIIIIZII...
 0.00038524 * IXIIIIIYIIIZYIIIIXZI...
 0.00011024 * IIIIIIYIIIIXXIZIZXYZ...
 0.00018963 * IIIIIZXIZIIIXXZIIYXI...
 0.00012584 * IIIIIIXZZIIIZXZIIYXI...
 0.000482 * YYZIIYXYIIIIYZIIIYZI...
 -0.00029259 * IZZIIIXYXIIZXIIIIYZI...
  ⋮)

In [25]:
overlapwithzero(ourT_psum)

0.27720750460544175

But did it work? Again, we have an implementation of a `TGate` in our library. In case you are interested, we currently implement `T` gates as Pauli `Z` rotations at an angle of `π/4`. Let's compare to that.

In [26]:
frozenZ_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(frozenZ_circuit, nparams_per_layer, TGate(qind))
end
tofastgates!(frozenZ_circuit, nq);

If you call `PauliGate(:Z, qind, parameter)`, this will create a so-called `FrozenGate` wrapping the parametrized `PauliGate`, with a fixed `parameter` at the time of circuit construction.

Run it and compare.

In [27]:
@time frozenZ_psum = propagate(frozenZ_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.806765 seconds (10.32 k allocations: 154.798 MiB, 0.70% gc time, 1.16% compilation time)


In [28]:
overlapwithzero(frozenZ_psum)

0.27720750460544175

In [29]:
frozenZ_psum == ourT_psum

true

It works! But is it optimal?

In [30]:
using BenchmarkTools

In [31]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  870.700 ms (5917102 allocations: 313.49 MiB)


In [32]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  770.170 ms (676 allocations: 154.14 MiB)


No, because `apply` for the `CustomTGate` is not type-stable.

In [33]:
@code_warntype apply(CustomTGate(7), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomTGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::CustomTGate, pstr, theta, coefficient; kwargs...) @ Main In[22]:1
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomTGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Float64, UInt64, Float64}}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = Main.:(var"#apply#2")(%2, #self#, gate, pstr, theta, coefficient)::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Float64, UInt64, Float64}}
└──      return %3



It returns either a `Tuple{UInt64, Float64}` of length 2 or a `Tuple{UInt64, Float64, UInt64, Float64}` of length 4, so the inferred return type is a `Union` of the two. In a color terminal, `@code_warntype` highlights such a small `Union` in yellow, which means it might be okay (it is not that much slower after all), but be wary of red. When this is the case, you may want to define some lower-level function under `propagate` for optimal performance. This is how we would do it.

In [34]:
function PauliPropagation.applygatetoall!(gate::CustomTGate, theta, psum, second_psum, args...; kwargs...)

    for (operator, coeff) in psum
        applygatetoone!(gate, operator, coeff, theta, psum, second_psum; kwargs...)
    end

    return psum, second_psum  # don't swap psums around
end


function PauliPropagation.applygatetoone!(gate::CustomTGate, pstr, coefficient, theta, psum, second_psum, args...; kwargs...)
    
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        return
    end

    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        new_coefficient = -1 * coefficient * sin(π/4)
    else # Y goes to Y, X
        new_pauli = 1  # X
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        new_coefficient = coefficient * sin(π/4)
    end

    psum[pstr] = coefficient * cos(π/4)
    second_psum[new_pstr] = new_coefficient

    return
end

The first function re-definition is currently necessary but might change in the future. The second function is the interesting one. Here we manually update the coefficient in the propagating Pauli sum for the Pauli string that already exists, and we add the new Pauli string to the second Pauli sum, which will later be merged into the first.
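The two-container pattern described above can be sketched with ordinary dictionaries. Everything here is a toy: the names `toy_applytoall!`, `psum_toy` and `second_toy`, the integer labels for terms, and the final merge by addition are assumptions for illustration, not the package's internals.

```julia
# Toy "Pauli sums": integer term label => coefficient (1 plays X, 2 plays Y, 3 plays Z).
psum_toy = Dict{UInt64, Float64}(UInt64(1) => 1.0, UInt64(3) => 0.5)
second_toy = Dict{UInt64, Float64}()

function toy_applytoall!(psum, second_psum)
    for (term, coeff) in psum
        term == 1 || continue                # this toy rule only splits term 1
        # Overwrite the value of an existing key: no key is added during iteration.
        psum[term] = coeff * cos(π / 4)
        # The newly created term goes into the second container instead.
        second_psum[UInt64(2)] = -coeff * sin(π / 4)
    end
    return psum, second_psum
end

toy_applytoall!(psum_toy, second_toy)

# Assumed merge step: add the new terms into the first container afterwards.
mergewith!(+, psum_toy, second_toy)
empty!(second_toy)

@assert length(psum_toy) == 3
@assert psum_toy[UInt64(1)] ≈ cos(π / 4)
@assert psum_toy[UInt64(2)] ≈ -sin(π / 4)
@assert psum_toy[UInt64(3)] == 0.5
```

Writing new terms into a separate container avoids adding keys to a dictionary while iterating over it, and no variable-length tuple ever has to be returned.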

Let's see if the new `applygatetoone!` method worked.

In [35]:
@time ourT_psum2 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.818788 seconds (23.16 k allocations: 155.615 MiB, 0.73% gc time, 4.62% compilation time: 100% of which was recompilation)


In [36]:
overlapwithzero(ourT_psum2)

0.27720750460544175

In [37]:
ourT_psum == ourT_psum2

true

And check the performance.

In [38]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  781.611 ms (676 allocations: 154.14 MiB)


In [39]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  773.096 ms (676 allocations: 154.14 MiB)


Enjoy defining custom and high-performance gates! 