# Tutorial: Optimization

Hello, and welcome to our tutorial on optimization. Here, we will explore three of Tequila's built-in optimizers. We will chiefly cover the gradient descent (GD) optimizer, and we will also discuss the Phoenics and GPyOpt Bayesian optimizers, which can be accessed through Tequila.

# 1: The GD optimizer.

In [None]:
### start at the start: import statements!
import tequila as tq
import numpy as np
from tequila.optimizers.optimizer_gd import minimize as gd_min

## We start by selecting an objective to optimize. We will begin with a fairly simple 2-qubit expectation value. We will optimize our 2-qubit circuit with the simple but nontrivial Hamiltonian $[Y(0)+Qm(0)]\otimes X(1)$, where $Qm=\frac{1}{2} (I - Z)$ is the projector onto the 1 state.
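
As a sanity check on what any optimizer can hope to reach, the spectrum of this Hamiltonian can be computed directly with NumPy. The sketch below builds the matrices by hand rather than through Tequila; the helper name `hamiltonian_matrix` is made up for illustration. Both computational-basis projectors are tried, since the eigenvalues turn out to be the same for either one.

```python
import numpy as np

# single-qubit operators as dense matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def hamiltonian_matrix(projector):
    """Dense matrix of [Y(0) + projector(0)] (x) X(1); illustrative helper, not a Tequila function."""
    return np.kron(Y + projector, X)

proj_0 = 0.5 * (I2 + Z)  # |0><0|
proj_1 = 0.5 * (I2 - Z)  # |1><1|

# eigvalsh returns the eigenvalues of a Hermitian matrix in ascending order
spectrum_0 = np.linalg.eigvalsh(hamiltonian_matrix(proj_0))
spectrum_1 = np.linalg.eigvalsh(hamiltonian_matrix(proj_1))
print(spectrum_0)
print(spectrum_1)
```

The lowest eigenvalue, $-(1+\sqrt{5})/2 \approx -1.618$, is a lower bound on any energy the optimizers below can report. Whether the particular circuit we build can actually reach it is a separate question.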

In [None]:
### optimizing the circuit in terms of pi makes the result of the optimization easier to interpret.

a = tq.Variable(name="a")*tq.numpy.pi
b = tq.Variable(name="b")*tq.numpy.pi
c = tq.Variable(name="c")*tq.numpy.pi
d = tq.Variable(name='d')*tq.numpy.pi

U = tq.gates.H(target=[0])
U += tq.gates.H(target=1)
U += tq.gates.Ry(target=0, angle=a)
U += tq.gates.Rz(target=1, angle=b)
U += tq.gates.Z(target=1,control=0)
U += tq.gates.Rx(target=0, angle=c)
U += tq.gates.Rx(target=1,angle=d)
U += tq.gates.Z(target=1,control=0)


### once we have a circuit, we pick a hamiltonian to optimize over
H=(tq.paulis.Y(0)+tq.paulis.Qm(0))*tq.paulis.X(1)
O=tq.ExpectationValue(U=U,H=H)

### we use the .draw function to pretty-print circuits via backend printers.
tq.draw(U,backend='qiskit')
print(O)

## We are ready to optimize, now!
## Like all Tequila optimizers, the GD optimizer has a minimize function, and most of its arguments are the same. There is one important difference, however: the GD optimizer takes a learning rate, lr. This parameter controls the step size in all of the GD optimizer methods; it is a positive float that scales the step taken along the (negative) gradient.

## Several optimization methods are available to the GD optimizer, including basic SGD, SGD with momentum, and more advanced optimization strategies like Adam or RMSprop.
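
To make the role of the learning rate concrete, here is a minimal pure-Python sketch of plain gradient descent and of gradient descent with momentum on a one-parameter toy objective. It is not Tequila's implementation: the function names and the momentum coefficient `beta=0.9` are assumptions chosen for illustration, and the toy objective $\cos\theta$ stands in for the expectation value $\langle Z\rangle$ after an Ry rotation.

```python
import math

def energy(theta):
    """Toy objective: <Z> after Ry(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)

def gradient(theta):
    """Analytic derivative of the toy objective."""
    return -math.sin(theta)

def sgd(theta, lr, steps):
    """Plain gradient descent: step against the gradient, scaled by lr."""
    for _ in range(steps):
        theta -= lr * gradient(theta)
    return theta

def momentum(theta, lr, steps, beta=0.9):
    """Gradient descent with momentum: the velocity accumulates past steps."""
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * gradient(theta)
        theta += velocity
    return theta

start = 0.5
for lr in (0.01, 0.1):
    print('lr =', lr,
          '| sgd energy:', energy(sgd(start, lr, 80)),
          '| momentum energy:', energy(momentum(start, lr, 80)))
```

With the small learning rate, plain gradient descent is still far from the minimum at $\theta=\pi$ after 80 steps, while momentum, which reuses earlier steps, gets much closer.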


In [None]:
print('the following methods are available for Gradient Descent optimization:\n')
print(tq.optimizers.optimizer_gd.OptimizerGD.available_methods())

## We will now optimize our chosen expectation value, choosing starting angles equivalent to $\frac{1}{4}\pi$ for all four variables, and optimizing via the ['Adam'](https://towardsdatascience.com/_adam-latest-trends-in-deep-learning-optimization-6be9a291375c) method.
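
For readers who have not met Adam before, the update rule can be sketched in a few lines of pure Python. This is the textbook form of the algorithm with its commonly quoted default coefficients, applied to a one-parameter toy gradient; the function name `adam_minimize` is hypothetical, and Tequila's own implementation and defaults may differ.

```python
import math

def adam_minimize(grad, theta, lr=0.1, steps=80, beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam on a single parameter; returns the list of visited values."""
    m = 0.0  # running average of the gradient
    v = 0.0  # running average of the squared gradient
    history = [theta]
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(theta)
    return history

# toy objective cos(theta), started at pi/4 like the angles in this tutorial
history = adam_minimize(lambda th: -math.sin(th), theta=0.25 * math.pi)
print('initial energy:', math.cos(history[0]))
print('final energy:  ', math.cos(history[-1]))
print('first step size:', abs(history[1] - history[0]))
```

Because of the bias correction, the very first step has length lr no matter how large or small the gradient is, which is one reason the learning rate is the main knob to tune.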

In [None]:
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.1

### For even more fun, try using sampling with the samples keyword, 
### or pick your favorite backend with the 'backend' keyword!

result=gd_min(O,lr=lr,
              method='adam',
              maxiter=80,
              initial_values=init,
              silent=True)

## The plots below show the trajectory of both the value of the objective and the values of the angles as a function of the iteration number.

In [None]:
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

### We see that, minus a few hiccups, all the angles converge to optimum values.
#### Exercise: is this truly the best performance possible, or are we stuck in a local minimum?

## Let's repeat what we did above, but with a few of the other methods! Here's RMSprop:

In [None]:
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.01
result=gd_min(O,lr=lr,
              method='rmsprop',
              maxiter=80,
              initial_values=init,
              silent=True)
print('RMSprop optimization results:')
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

## ... And here's Momentum:

In [None]:
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
lr=0.1
result=gd_min(O,lr=lr,
              method='momentum',
              maxiter=80,
              initial_values=init,
              silent=True)

print('momentum optimization results:')
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

## Note that when using the [RMSprop](https://towardsdatascience.com/understanding-rmsprop-faster-neural-network-learning-62e116fcf29a) method, we reduced the learning rate from 0.1 to 0.01. Different methods may be more or less sensitive to choices of initial learning rate. Try going back to the previous examples, and choosing different learning rates, or different initial parameters, to gain a feel for how sensitive different methods are.
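
One way to see why RMSprop can call for a smaller learning rate is to look at the size of its first step. The sketch below uses a common formulation of RMSprop with an assumed decay rate `rho=0.9`; it is not taken from Tequila, whose defaults may differ, and the helper names are made up for illustration.

```python
import math

def sgd_step(g, lr):
    """Size of a plain gradient-descent step for gradient g."""
    return lr * g

def rmsprop_first_step(g, lr, rho=0.9, eps=1e-8):
    """Size of the first RMSprop step for gradient g (running average starts at 0)."""
    v = (1 - rho) * g * g  # running average of g^2 after one update
    return lr * g / (math.sqrt(v) + eps)

for g in (1e-3, 1.0, 1e3):
    print('gradient:', g,
          '| sgd step:', sgd_step(g, lr=0.1),
          '| rmsprop step:', rmsprop_first_step(g, lr=0.1))
```

The plain step is proportional to the gradient, whereas the RMSprop step is roughly lr / sqrt(1 - rho) regardless of the gradient's size. With these assumed settings, lr=0.1 gives a first step of about 0.32 in every variable, which is large for angles measured in units of $\pi$.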

# 1.1: The GD optimizer, with the Quantum Natural Gradient.

### The Quantum Natural Gradient, or QNG, is a method of calculating gradients for quantum systems, inspired by the natural gradient sometimes employed in classical machine learning. The usual gradient is taken with respect to a Euclidean geometry of parameter space, but this is not the only geometry, nor even the optimal one, for the space of quantum states. The QNG is, in essence, a method of taking gradients with respect to (an approximation to) the Fubini-Study metric. For information on how (and why) the QNG is used, see [Stokes et al.](https://arxiv.org/abs/1909.02108).

### Using the QNG in Tequila is as simple as passing in the keyword gradient='qng' to optimizers which support it, such as the GD optimizer. We will use it to optimize a more complicated circuit below, and then compare the results to optimizing the same circuit with the regular gradient.
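
The idea behind the natural gradient step, $\theta \leftarrow \theta - \eta\, g^{-1} \nabla E$ with $g$ the Fubini-Study metric, can be illustrated on a single qubit in pure Python. The sketch below is an assumption-laden toy, not how Tequila evaluates the metric: it uses the state Ry(theta)|0>, finite differences for all derivatives, and hypothetical helper names.

```python
import math

def state(theta):
    """Amplitudes of Ry(theta)|0>; both are real for this gate."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def energy(theta):
    """<Z> in the state above, which equals cos(theta)."""
    a, b = state(theta)
    return a * a - b * b

def derivative(f, theta, h=1e-5):
    """Central finite-difference derivative of a scalar function."""
    return (f(theta + h) - f(theta - h)) / (2 * h)

def fubini_study_metric(theta, h=1e-5):
    """g = <dpsi|dpsi> - |<dpsi|psi>|^2, for a one-parameter real state."""
    psi = state(theta)
    plus, minus = state(theta + h), state(theta - h)
    dpsi = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    norm = sum(d * d for d in dpsi)
    overlap = sum(d * p for d, p in zip(dpsi, psi))
    return norm - overlap ** 2

def descend(theta, lr, steps, natural):
    """Gradient descent, optionally rescaling the gradient by the inverse metric."""
    for _ in range(steps):
        grad = derivative(energy, theta)
        if natural:
            grad /= fubini_study_metric(theta)
        theta -= lr * grad
    return theta

print('metric at theta=0.7:', fubini_study_metric(0.7))
print('plain gradient: ', energy(descend(0.5, 0.01, 80, natural=False)))
print('natural gradient:', energy(descend(0.5, 0.01, 80, natural=True)))
```

In this one-parameter toy the metric is the constant 1/4, so the natural gradient amounts to a four times larger learning rate. In circuits with several parameters the metric is a matrix that changes with the parameters, so the rescaling differs by direction and by position.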

In [None]:
### this time, don't scale by pi

H = tq.paulis.Y(0)*tq.paulis.X(1)*tq.paulis.Y(2)
U = tq.gates.Ry(tq.numpy.pi/2,0) +tq.gates.Ry(tq.numpy.pi/3,1)+tq.gates.Ry(tq.numpy.pi/4,2)
U += tq.gates.Rz('a',0)+tq.gates.Rz('b',1)
U += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
U += tq.gates.Ry('c',1) +tq.gates.Rx('d',2)
U += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
E = tq.ExpectationValue(H=H, U=U)

print('drawing a more complicated circuit. Hope you like it!')
tq.draw(U)
### note: the stop_count keyword, which stops optimization if no improvement occurs after a given number of epochs, is not passed in the call below.
### let's use a random initial starting point:
init={k:np.random.uniform(-2,2) for k in ['a','b','c','d']}

lr=0.01
result = tq.minimize(objective=E,
                     gradient='qng',
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=init, silent=True)


In [None]:
result.history.plot('energies')
result.history.plot('angles')
print('best energy with qng: ',result.energy)
print('optimal angles with qng: ',result.angles)

## To gain appreciation for why one might use the QNG, let's optimize the same circuit with the same learning rate and the same method, but without QNG.

In [None]:
lr=0.01
result = tq.minimize(objective=E,
                     gradient=None,
                     method='sgd', maxiter=200,lr=lr,
                     initial_values=init, silent=True)
print('plotting what happens without QNG')
result.history.plot('energies')
result.history.plot('angles')
print('best energy without qng: ',result.energy)
print('optimal angles without qng: ',result.angles)

## Though the starting point was random (and so I, your humble tutorial writer, do not know what your graphs look like), you will most likely see that the QNG run achieved a greater degree of improvement, and that the trajectories followed by the angles there differed from those in the plain gradient descent optimization. Feel free to play around with other methods, learning rates, or circuits in the space below!


In [None]:
### have fun!

# 2. Bayesian Optimization

## [Bayesian optimization](https://arxiv.org/abs/1807.02811) is a method of global optimization, often used to tune hyperparameters in classical machine learning. It has also seen use in the optimization of [quantum circuits](https://arxiv.org/pdf/1812.08862.pdf). Tequila currently supports two different Bayesian optimization packages: [Phoenics](https://github.com/aspuru-guzik-group/phoenics) and [GPyOpt](https://github.com/SheffieldML/GPyOpt), optimizers originally developed for optimizing expensive experimental procedures in chemistry. Follow the links to the respective GitHub pages, and install the optimizers before continuing this tutorial.
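
The general loop behind Bayesian optimization (fit a surrogate model to the points evaluated so far, then pick the next point with an acquisition function) can be sketched with NumPy. This is a generic toy, not the algorithm used inside GPyOpt or Phoenics: the Gaussian-process surrogate, the kernel length scale, the lower-confidence-bound acquisition, and the one-dimensional stand-in objective are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Mean and standard deviation of a zero-mean GP at the query points."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(K, K_s)  # K^-1 K_s
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(K_s * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def objective(x):
    """Toy stand-in for an expensive expectation value."""
    return np.cos(np.pi * x)

grid = np.linspace(-1.0, 1.0, 201)  # candidate parameter values
x_obs = np.array([-0.3, 0.4])       # two initial evaluations
y_obs = objective(x_obs)

for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, grid)
    # lower confidence bound: prefer low predicted values and high uncertainty
    x_next = grid[np.argmin(mean - 2.0 * std)]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print('best x:', x_obs[np.argmin(y_obs)], '| best value:', y_obs.min())
```

The uncertainty term is what sends the search to regions that have not been tried yet, which is why the trajectories of a Bayesian optimizer look much more erratic than those of gradient descent.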

# 2.1: GPyOpt

## GPyOpt can be used like any of our other optimizers. Like the GD and SciPy optimizers, it takes a 'method' keyword. Three methods are supported: 'lbfgs', 'DIRECT', and 'CMA'. See the GPyOpt GitHub page for more info.

In [None]:
from tequila.optimizers.optimizer_gpyopt import minimize as gpy_min

## We will use GPyOpt to optimize the same circuits as seen above.

In [None]:
### optimizing the circuit in terms of pi makes the result of the optimization easier to interpret.

a = tq.Variable(name="a")*tq.numpy.pi
b = tq.Variable(name="b")*tq.numpy.pi
c = tq.Variable(name="c")*tq.numpy.pi
d = tq.Variable(name='d')*tq.numpy.pi

U = tq.gates.H(target=[0])
U += tq.gates.H(target=1)
U += tq.gates.Ry(target=0, angle=a)
U += tq.gates.Rz(target=1, angle=b)
U += tq.gates.Z(target=1,control=0)
U += tq.gates.Rx(target=0, angle=c)
U += tq.gates.Rx(target=1,angle=d)
U += tq.gates.Z(target=1,control=0)


### once we have a circuit, we pick a hamiltonian to optimize over
H=(tq.paulis.Y(0)+tq.paulis.Qm(0))*tq.paulis.X(1)
O=tq.ExpectationValue(U=U,H=H)

### we use the .draw function to pretty-print circuits via backend printers.
tq.draw(U,backend='qiskit')
print(O)

In [None]:
### let's use the lbfgs method.
init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
### note: no lr is passed here! there are fewer tunable keywords for this optimizer.
result=gpy_min(O,
              method='lbfgs',
              maxiter=80,
              initial_values=init)

print('GPyOpt optimization results:')
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

## Perhaps you are looking at the plots above in horror. But do take note: Bayesian optimization is a global, exploratory optimization method, designed to explore large portions of parameter space while still seeking out optimality. Look at the optimal energy again, and you will see that the best performance of this optimization method matched that of all the gradient descent methods. Next, we will apply GPyOpt to the QNG example circuit above and see how Bayesian optimization compares to QNG and SGD.

In [None]:
### this time, don't scale by pi

H = tq.paulis.Y(0)*tq.paulis.X(1)*tq.paulis.Y(2)
U = tq.gates.Ry(tq.numpy.pi/2,0) +tq.gates.Ry(tq.numpy.pi/3,1)+tq.gates.Ry(tq.numpy.pi/4,2)
U += tq.gates.Rz('a',0)+tq.gates.Rz('b',1)
U += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
U += tq.gates.Ry('c',1) +tq.gates.Rx('d',2)
U += tq.gates.CNOT(control=0,target=1)+tq.gates.CNOT(control=1,target=2)
E = tq.ExpectationValue(H=H, U=U)

print('Hey, remember me?')
tq.draw(U)
### let's use a random initial starting point again:
init={k:np.random.uniform(-2,2) for k in ['a','b','c','d']}

result = gpy_min(objective=E,maxiter=25,method='lbfgs',
                     initial_values=init)

result.history.plot('energies')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

## In a very, very small number of steps, GPyOpt is able to match the performance of SGD with the QNG, and discovers the hidden truth: the optimal circuit, here, is one where all angles are zero (modulo $2\pi$).
## Feel free to play around more with other circuits in the space below!

# 2.2 Phoenics

## Finally, we turn to Phoenics. This algorithm, originally developed within the Aspuru-Guzik group (Hey, just like Tequila!), can be accessed in the usual fashion. Its performance on the two-qubit optimization circuit is shown below. Note that the number of datapoints exceeds the provided maxiter; maxiter here controls the number of parameter __batches__ suggested by Phoenics. Phoenics suggests a number of parameter sets to try out, per batch, that scales with the number of parameters (in a nonlinear fashion), so you may want to set maxiter lower if you are only playing around.

In [None]:
from tequila.optimizers.optimizer_phoenics import minimize as p_min

In [None]:
### optimizing the circuit in terms of pi makes the result of the optimization easier to interpret.

a = tq.Variable(name="a")*tq.numpy.pi
b = tq.Variable(name="b")*tq.numpy.pi
c = tq.Variable(name="c")*tq.numpy.pi
d = tq.Variable(name='d')*tq.numpy.pi

U = tq.gates.H(target=[0])
U += tq.gates.H(target=1)
U += tq.gates.Ry(target=0, angle=a)
U += tq.gates.Rz(target=1, angle=b)
U += tq.gates.Z(target=1,control=0)
U += tq.gates.Rx(target=0, angle=c)
U += tq.gates.Rx(target=1,angle=d)
U += tq.gates.Z(target=1,control=0)

H=(tq.paulis.Y(0)+tq.paulis.Qm(0))*tq.paulis.X(1)
O=tq.ExpectationValue(U=U,H=H)

init={'a':0.25,'b':0.25,'c':0.25,'d':0.25}
### geez! even fewer keywords! 
### to see what you can pass down to phoenics, see the tequila documentation for that module.
result=p_min(O,
            maxiter=5,
            initial_values=init,
            silent=True)

print('Phoenics optimization results on 2 qubit circuit:')
result.history.plot('energies')
result.history.plot('angles')
print('best energy: ',result.energy)
print('optimal angles: ',result.angles)

## This concludes our tutorial. Hope you had fun! Happy optimizing!