##### Copyright 2020 The TensorFlow Quantum Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Layerwise learning for quantum neural networks

Author : Andrea Skolik

Contributors : Masoud Mohseni

Created : 2019

Last updated : 2020-Feb-27

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/quantum/blob/research/layerwise_learning/layerwise_learning.ipynb)

In this notebook, we use the technique introduced in [1] to efficiently train a quantum neural network (QNN) without making any initial guesses about the structure that is necessary to solve a given learning task. To do this, we successively add layers to the QNN during training, which not only makes training faster but also yields a better signal-to-noise ratio than training the full circuit when run on real hardware.

It is well known that randomly initialized parametrized quantum circuits suffer from exponentially decaying gradients as circuits grow in size [2]. 
One strategy to avoid this is to find clever initialization schemes for deep circuits. Another approach, which we take here, focuses instead on the structure of the circuit and shows how a deep parametrized circuit can be constructed during training. By training individual partitions of the circuit as it grows, we avoid the randomization effect that causes barren plateaus. This matters most on noisy intermediate-scale quantum (NISQ) devices, as these suffer most from the unfavorable signal-to-noise ratio when running variational algorithms. As the gradients produced by circuits grow smaller, we need more and more measurements from a quantum device to estimate them accurately. When using layerwise learning (LL), gradients stay larger during training, and we therefore need fewer measurements to get a sufficient training signal for the optimizer. Additionally, we decrease the overall number of parameter updates, so that LL provides an efficient strategy to run variational algorithms on NISQ devices.
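To make the link between gradient size and measurement cost concrete, here is a rough back-of-the-envelope sketch. It assumes that each measurement shot has a standard deviation of at most 1 (as for a Pauli observable) and that we want the statistical error of the averaged estimate to be no larger than the gradient itself. The function name `shots_needed` and its parameters are hypothetical and only serve this illustration.

```python
import math

def shots_needed(gradient_magnitude, target_snr=1.0, single_shot_std=1.0):
    """Rough number of shots needed to resolve a gradient of a given size.

    Assumption: the standard error of the mean over N shots is
    single_shot_std / sqrt(N), and we require it to be at most
    gradient_magnitude / target_snr.
    """
    return math.ceil((target_snr * single_shot_std / gradient_magnitude) ** 2)

# The shot count grows quadratically as the gradient shrinks.
for g in (0.5, 0.25, 0.125):
    print(f"gradient {g}: about {shots_needed(g)} shots")
```

Under these assumptions, halving the gradient magnitude quadruples the number of shots, which is why keeping gradients large during training directly reduces the load on the device.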

LL works in two phases as shown in the figure below:

![Two phases of layerwise learning](https://github.com/tensorflow/quantum/blob/research/layerwise_learning/images/layers.png)

In the first phase, we start training with a small number of layers and train those for a fixed number of epochs. After that, we add another set of layers and freeze the parameters of the previous step's layers. We repeat this process until the desired depth is reached. In phase two, we perform additional optimization sweeps over larger subsets of the layers using the final circuit configuration from phase one. The parameters from this circuit give us a good starting point to optimize quarters, halves, or even the full circuit without initializing on a barren plateau.
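The bookkeeping behind the two phases can be sketched in plain Python. This is only an illustration of which layer indices would be frozen or trained at each step, not the training code itself; the helper names `phase_one_schedule` and `phase_two_partitions` are hypothetical, and the sketch assumes contiguous partitions of equal size (up to rounding) in phase two.

```python
def phase_one_schedule(n_layer_steps, n_layers_to_add):
    """Yield (frozen, trainable) layer indices for each growth step."""
    for step in range(n_layer_steps):
        first_new = step * n_layers_to_add
        frozen = list(range(first_new))  # layers added in earlier steps
        trainable = list(range(first_new, first_new + n_layers_to_add))
        yield frozen, trainable

def phase_two_partitions(n_layers, n_partitions):
    """Split the layers of the finished circuit into contiguous blocks."""
    size = -(-n_layers // n_partitions)  # ceiling division
    return [list(range(start, min(start + size, n_layers)))
            for start in range(0, n_layers, size)]

for frozen, trainable in phase_one_schedule(n_layer_steps=3, n_layers_to_add=2):
    print("frozen:", frozen, "trainable:", trainable)

print("halves:", phase_two_partitions(n_layers=6, n_partitions=2))
```

In phase two, each of the returned blocks would be trained in turn while the remaining layers are held fixed.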

This kind of learning scheme can be used for various types of learning tasks and input data, so long as the QNN structure allows iteratively building the circuits. In this notebook we look at a simple example of classifying MNIST digits with randomly generated layers.



[1] Layerwise learning for quantum neural networks, A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, in preparation.

[2] Barren plateaus in quantum neural network training landscapes, J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9 (2018).

In [0]:
!pip install --upgrade cirq==0.7.0

In [0]:
!pip install --upgrade tensorflow==2.1.0

In [0]:
!pip install tfq-nightly

In [0]:
import collections
import itertools
import random

import cirq
import sympy
import numpy as np
import tensorflow_quantum as tfq
import tensorflow as tf
import matplotlib.pyplot as plt

First, we need to create the layers we want to use in our circuit. Each layer applies a randomly chosen parametrized rotation about the X, Y, or Z axis to every qubit, followed by a ladder of CZ gates that connects neighboring qubits. This is the same structure as used in [2].

In [0]:
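def create_layer(qubits, layer_id):
    """Builds one layer: a random rotation on each qubit, then a CZ ladder."""
    # one trainable symbol per qubit, named after the layer it belongs to
    symbols = [sympy.Symbol(layer_id + '-' + str(i)) for i in range(len(qubits))]
    gate_set = [cirq.Rx, cirq.Ry, cirq.Rz]
    gates = [random.choice(gate_set)(symbols[i])(q) for i, q in enumerate(qubits)]

    # entangle neighboring qubits
    for control, target in zip(qubits, qubits[1:]):
        gates.append(cirq.CZ(control, target))

    return gates, symbols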

We also need to prepare the training data. For simplicity, we borrow the training data and data input scheme from the MNIST classification example in the TFQ docs [TODO: add link to TFQ notebook]. Namely, we downsample and flatten the images and threshold the pixel values, so that each image becomes a vector with binary entries. These bitstrings are then fed to the circuit by applying an X gate to every qubit that corresponds to a one in the image vector.

In [0]:
def reduce_image(x):
    x = tf.reshape(x, [1, 28, 28, 1])
    x = tf.image.resize(x, [4, 4])
    x = tf.reshape(x, [4, 4])
    x = x / 255
    return x.numpy()

def remove_contradicting(xs, ys):
    mapping = collections.defaultdict(set)
    for x, y in zip(xs, ys):
        mapping[str(x)].add(y)

    return zip(*((x, y) for x, y in zip(xs, ys) if len(mapping[str(x)]) == 1))

def convert_to_circuit(image):
    values = np.ndarray.flatten(image)
    qubits = cirq.GridQubit.rect(1, len(values))
    circuit = cirq.Circuit()

    for i, value in enumerate(values):
        if value > 0.5:
            circuit.append(cirq.X(qubits[i]))

    return circuit

def convert_label(y):
    if y == 3:
        return 1.0
    else:
        return -1.0


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print("Number of original training examples:", len(x_train))
print("Number of original test examples:", len(x_test))

x_train, y_train = zip(*((x, y) for x, y in zip(x_train, y_train) if y in [3, 6]))
x_test, y_test = zip(*((x, y) for x, y in zip(x_test, y_test) if y in [3, 6]))

x_train = [reduce_image(x) for x in x_train]
x_test = [reduce_image(x) for x in x_test]

x_train, y_train = remove_contradicting(x_train, y_train)
x_test, y_test = remove_contradicting(x_test, y_test)

print("Number of filtered training examples:", len(x_train))
print("Number of filtered test examples:", len(x_test))

x_train = [convert_to_circuit(x) for x in x_train]
x_test = [convert_to_circuit(x) for x in x_test]

y_train = [convert_label(y) for y in y_train]
y_test = [convert_label(y) for y in y_test]

In [0]:
# increase for more accurate results
NUM_EXAMPLES = 128
x_train = x_train[:NUM_EXAMPLES]
y_train = y_train[:NUM_EXAMPLES]

x_train = tfq.convert_to_tensor(x_train)
x_test = tfq.convert_to_tensor(x_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

Now we set up our training loop. We specify the number of qubits in the circuit, how many layer addition steps to perform, and how many layers to add in each step. The latter is a hyperparameter of our model that can be tuned for the learning task at hand. There is a trade-off: the trained partitions should be kept as small as possible, but not so small that they can no longer make significant progress on the learning task. You can play with the hyperparameters below to see this effect.
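As a quick aid for choosing these hyperparameters, the following sketch tallies how many parameters are added per step and how many the circuit holds in total. It assumes one rotation parameter per qubit and layer, as in `create_layer` above; the helper name `parameter_counts` is hypothetical.

```python
def parameter_counts(n_qubits, n_layer_steps, n_layers_to_add):
    """Return (step, new parameters, total parameters) for each growth step.

    Assumption: every layer carries exactly one parameter per qubit.
    """
    new_per_step = n_qubits * n_layers_to_add
    return [(step, new_per_step, new_per_step * (step + 1))
            for step in range(n_layer_steps)]

# Same values as in the training cell of this notebook.
for step, new, total in parameter_counts(n_qubits=6, n_layer_steps=5,
                                         n_layers_to_add=2):
    print(f"step {step}: {new} new parameters, {total} in total")
```

Increasing `n_layers_to_add` makes each trained partition more expressive, at the price of more parameters per step.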

In [0]:
n_qubits = 6
n_layer_steps = 5
n_layers_to_add = 2
data_qubits = cirq.GridQubit.rect(1, n_qubits)
readout = cirq.GridQubit(0, n_qubits-1)

symbols = []
layers = []
weights = []

training_history = []

for layer_id in range(n_layer_steps):
    print("\nLayer:", layer_id)
    circuit = cirq.Circuit()
    for i in range(n_layers_to_add):
        layer, layer_symbols = create_layer(data_qubits, f'layer_{layer_id}_{i}')
        layers.append(layer)
        symbols.append(layer_symbols)

    circuit += layers

    # prepare the readout qubit
    circuit.append(cirq.X(readout))
    circuit.append(cirq.H(readout))
    circuit.append(cirq.X(readout))
    readout_op = cirq.Z(readout)

    # setup the Keras model
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(), dtype=tf.dtypes.string))
    model.add(
        tfq.layers.PQC(
            model_circuit=circuit,
            operators=readout_op,
            differentiator=tfq.differentiators.ParameterShift(),
            initializer=tf.keras.initializers.Zeros))

    model.summary()

    model.compile(loss=tf.keras.losses.squared_hinge,
                  optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))

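    # Keep the trained weights and set the parameters of the new layers to 0.
    # PQC orders its weights by sorted symbol name, so the symbols of the new
    # layers (which have the highest layer index) come last.
    model.set_weights([np.pad(weights, (0, n_qubits*n_layers_to_add))])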

    model.fit(x_train,
              y_train,
              batch_size=128,
              epochs=20,
              verbose=2,
              validation_data=(x_test, y_test))

    qnn_results = model.evaluate(x_test, y_test)
    training_history.append(qnn_results)

    weights = model.get_weights()[0]

In [0]:
plt.plot(training_history)

As already pointed out in the MNIST example notebook, a classical neural network is hard to beat on a simple learning task like this, especially with a basic data encoding scheme like the one used above. In general, layerwise learning can be used in any configuration that allows successively stacking and training layers, and it is independent of the data encoding scheme used, so feel free to experiment with more elaborate data sets as well!