# Process Overview

Start with a Keras model and convert it into a TFLite model. The TFLite model is then run through the xformer compiler to produce an XMOS-optimised TFLite file.

We can then use the relevant interpreter for each model to verify that, given the same input, both models produce the same output.

<img src="./conversion_process.jpg" alt="Diagram of conversion Process" style="width: 500px; height: auto; margin: 1rem auto 2rem;" />

In [None]:
! pip install xmos_ai_tools

In [None]:
! pip install tensorflow

In [None]:
import tensorflow as tf
import numpy as np
from xmos_ai_tools import xformer
from xmos_ai_tools.xinterpreters import xcore_tflm_host_interpreter

# Make a Model to convert
Use Keras to make a model of arbitrary size and shape. This example uses a single `AveragePooling2D` layer.

In [None]:
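pool_size = (2, 2)
input_shape = (3, 3, 4)
model = tf.keras.Sequential(
    [
        # Declare the input shape with an Input layer rather than a layer keyword argument
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.AveragePooling2D(pool_size=pool_size),
    ]
)
# No training is done in this example, so compile() takes no arguments
model.compile()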
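To see what this model computes, here is a NumPy sketch of 2x2 average pooling. It assumes the Keras defaults for `AveragePooling2D`: strides equal to `pool_size`, `'valid'` padding and channels-last layout. The helper name `avg_pool2d_valid` is hypothetical. With a 3x3x4 input, `'valid'` padding drops the incomplete windows, so only one 2x2 window fits and the output is 1x1x4.

```python
import numpy as np


def avg_pool2d_valid(x, pool_size=(2, 2)):
    """Reference 2D average pooling for one (height, width, channels) sample.

    Assumes strides == pool_size and 'valid' padding (the Keras defaults).
    """
    ph, pw = pool_size
    h, w, c = x.shape
    # 'valid' padding: windows that would run past the edge are dropped
    out_h, out_w = h // ph, w // pw
    out = np.empty((out_h, out_w, c), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw, :]
            # Average over the spatial window, separately for each channel
            out[i, j, :] = window.mean(axis=(0, 1))
    return out


x = np.arange(3 * 3 * 4, dtype=np.float64).reshape(3, 3, 4)
y = avg_pool2d_valid(x)
print(y.shape)  # (1, 1, 4)
```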

## Convert the Keras model into a TFLite model
The xcore converter cannot optimise a Keras model directly for xcore devices, so the model must first be converted into a TFLite file (a FlatBuffer).

In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)

### Representative Dataset

TensorFlow can quantise the converted model to int8 if you pass it a representative dataset. The converter uses this dataset to calibrate the value ranges of activations. It can be a small subset (around 100-500 samples) of the training or validation data.

The function below generates random data. See [the TensorFlow documentation](https://www.tensorflow.org/lite/performance/post_training_quantization) for how to do this in practice.

In [None]:
# As an example, use a random dataset matching the model's input shape
def representative_dataset():
    batch_size = 8
    for _ in range(100):
        data = np.random.uniform(-0.1, 0.001, (batch_size, *input_shape))
        yield [data.astype(np.float32)]
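The random generator above only exercises the converter. In practice the samples would come from real data. The sketch below assumes a float NumPy array `x_train` shaped `(N, 3, 3, 4)`, which is a hypothetical stand-in for real training data. It draws a fixed random subset and yields one sample per step. `make_representative_dataset` is a hypothetical helper name, not part of TensorFlow.

```python
import numpy as np


def make_representative_dataset(x_train, num_samples=200, seed=0):
    """Build a generator function usable as converter.representative_dataset.

    x_train is assumed to be a float array shaped (N, *input_shape).
    """
    rng = np.random.default_rng(seed)
    count = min(num_samples, len(x_train))
    # Pick a fixed random subset of the data, without repeats
    indices = rng.choice(len(x_train), size=count, replace=False)

    def representative_dataset():
        for i in indices:
            # One sample per step, keeping a batch dimension of 1
            yield [x_train[i : i + 1].astype(np.float32)]

    return representative_dataset


# Hypothetical stand-in for real training data, shaped like the model input
x_train = np.random.default_rng(1).uniform(-1.0, 1.0, (1000, 3, 3, 4))
rep_ds = make_representative_dataset(x_train, num_samples=100)
```

With real data, you would then set `converter.representative_dataset = rep_ds` in place of the random generator.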

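* **tf.lite.Optimize.DEFAULT:** The default optimisation strategy, which quantises model weights. Providing a representative dataset enables further optimisation by quantising biases and activations as well. The converter tries to reduce size and latency while minimising the loss in accuracy.

* **target_spec.supported_ops:** Restricts the operators the converter may use. `tf.lite.OpsSet.TFLITE_BUILTINS_INT8` allows only int8 TFLite built-in ops, so conversion fails if an operation cannot be quantised. [TensorFlow docs](https://www.tensorflow.org/lite/guide/ops_select)

* **inference_input_type / inference_output_type:** Setting these to `tf.int8` makes the model's input and output tensors int8. Callers then supply and receive quantised int8 data instead of float32.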
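For reference, full-integer quantisation in TFLite represents each real value $r$ by an int8 value $q$. It does this using a scale $S$ and a zero point $Z$, set per tensor (or per channel for some weights):

$$
r = S\,(q - Z), \qquad q = \min\!\big(127,\ \max\!\big(-128,\ \operatorname{round}(r / S) + Z\big)\big)
$$

The representative dataset is what allows the converter to choose $S$ and $Z$ for the activations.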

In [None]:
# Set up the converter to convert our float model into an int8 quantised model
# See https://www.tensorflow.org/lite/performance/post_training_quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()

# Save the model.
tflite_model_path = "avgpooling2d.tflite"
with open(tflite_model_path, "wb") as f:
    f.write(tflite_model)
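Because a TFLite model is a FlatBuffer, a cheap sanity check on the saved bytes is possible. A FlatBuffer begins with a 4-byte root offset followed by a 4-byte file identifier, and the TFLite schema declares that identifier as `"TFL3"`. The helper name `looks_like_tflite` is hypothetical, and this check only inspects the header. It does not validate the model.

```python
def looks_like_tflite(model_bytes: bytes) -> bool:
    """Return True if the bytes carry the TFLite FlatBuffer file identifier.

    Bytes 0-3 hold the root table offset; bytes 4-7 hold the identifier.
    """
    return len(model_bytes) >= 8 and model_bytes[4:8] == b"TFL3"
```

In the notebook this could be applied to `tflite_model` directly, or to the bytes read back from `tflite_model_path`.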

# Optimise model for XCore
Use `xformer.convert(input_path, output_path, params)` to make an xcore-optimised version of the model. The third argument takes optional compiler parameters. Passing `None` uses the defaults.

In [None]:
xcore_optimised_path = "xcore_model.tflite"
xformer.convert(tflite_model_path, xcore_optimised_path, None)

# Check it worked
To check that it worked, we can use the interpreters to run both models and confirm that they produce the same output.

For standard TensorFlow TFLite models, use `tf.lite.Interpreter`. For xcore-optimised models, `xmos_ai_tools.xinterpreters.xcore_tflm_host_interpreter` must be used.

In [None]:
tf_interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
tf_interpreter.allocate_tensors()

tf_input_details = tf_interpreter.get_input_details()
tf_output_details = tf_interpreter.get_output_details()

tf_input_shape = tf_input_details[0]["shape"]
# Fill with a constant (126) so the xcore interpreter can be given identical input
tf_input_data = np.full(tf_input_shape, 126, dtype=np.int8)

tf_interpreter.set_tensor(tf_input_details[0]["index"], tf_input_data)

tf_interpreter.invoke()
tf_output_data = tf_interpreter.get_tensor(tf_output_details[0]["index"])
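The constant 126 is already an int8 value, so it skips quantisation entirely. To feed real float data instead, the data has to be quantised with the input tensor's own parameters. `tf.lite.Interpreter.get_input_details()` exposes these as a `(scale, zero_point)` pair under the `"quantization"` key.

Below is a minimal sketch that assumes the tensor is int8-quantised, with a scale greater than 0. `quantize_input` is a hypothetical helper. Whether the xcore interpreter's details carry the same key is an assumption not checked here.

```python
import numpy as np


def quantize_input(float_data, input_detail):
    """Quantise float data to int8 using one entry of get_input_details().

    Assumes input_detail["quantization"] is a (scale, zero_point) pair
    with scale > 0, i.e. the tensor is actually quantised.
    """
    scale, zero_point = input_detail["quantization"]
    # q = round(r / S) + Z; note np.round rounds exact halves to even
    q = np.round(np.asarray(float_data, dtype=np.float64) / scale) + zero_point
    # Saturate to the int8 range before casting
    return np.clip(q, -128, 127).astype(np.int8)
```

For example, `quantize_input(float_batch, tf_input_details[0])` would produce an array suitable for `tf_interpreter.set_tensor`, where `float_batch` is a hypothetical float32 array shaped like `tf_input_shape`.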

In [None]:
xcore_interpreter = xcore_tflm_host_interpreter()
xcore_interpreter.set_model(model_path=xcore_optimised_path)

xcore_input_details = xcore_interpreter.get_input_details()
xcore_output_details = xcore_interpreter.get_output_details()

xcore_input_shape = xcore_input_details[0]["shape"]
# Use the same constant input (126) as the TFLite interpreter
xcore_input_data = np.full(xcore_input_shape, 126, dtype=np.int8)

xcore_interpreter.set_tensor(xcore_input_details[0]["index"], xcore_input_data)

xcore_interpreter.invoke()
xcore_output_data = xcore_interpreter.get_tensor(xcore_output_details[0]["index"])
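Both interpreter cells repeat the same set, invoke and get sequence. The notebook calls the same four methods on both interpreters, so a small helper could remove the duplication. This is a sketch: `run_single_input` is a hypothetical name, and it handles only models with one input and one output. `tf.lite.Interpreter` still needs `allocate_tensors()` before the helper is called.

```python
import numpy as np


def run_single_input(interpreter, input_data):
    """Set the first input tensor, run the model, and return the first output.

    Uses only get_input_details, get_output_details, set_tensor, invoke and
    get_tensor, which the notebook calls on both interpreters.
    """
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    interpreter.set_tensor(input_index, np.asarray(input_data))
    interpreter.invoke()
    return interpreter.get_tensor(output_index)
```

With it, the comparison reduces to calling `run_single_input(tf_interpreter, data)` and `run_single_input(xcore_interpreter, data)` on the same `data` array.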

In [None]:
print("Do both models produce the same output?")
print("yes" if np.array_equal(xcore_output_data[0], tf_output_data[0]) else "no")
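`np.array_equal` demands a bit-exact match. If an optimised kernel rounded slightly differently, the outputs might differ by a unit or so. That is an assumption, not something this notebook establishes. In that case, a tolerance-based check is more informative.

One pitfall is worth avoiding: subtracting int8 arrays directly wraps around (for example, `127 - (-128)` overflows). The sketch below therefore casts to int32 first. Both helper names are hypothetical.

```python
import numpy as np


def max_abs_diff_int8(a, b):
    """Largest element-wise absolute difference between two int8 arrays.

    Casting to int32 first avoids int8 wrap-around during subtraction.
    """
    return int(np.max(np.abs(a.astype(np.int32) - b.astype(np.int32))))


def outputs_match(a, b, tolerance=0):
    """True if the arrays have the same shape and differ by at most `tolerance`."""
    return a.shape == b.shape and max_abs_diff_int8(a, b) <= tolerance
```

For example, `outputs_match(xcore_output_data, tf_output_data, tolerance=1)` reports a match if every element is within one quantisation step. `max_abs_diff_int8` shows how far apart the outputs actually are.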