<a href="https://colab.research.google.com/github/xychong/edgeaimonitoring/blob/main/Post_Training_Quantization_II.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this post-training quantization (PTQ) run, we keep float input and output tensors. Hence, the compiler leaves a quantize operation at the beginning of the graph and a dequantize operation at the end.

Both quantization and dequantization run on the CPU instead of the Edge TPU, so the data format conversion adds some latency.

Ref: https://coral.ai/docs/edgetpu/models-intro/#compiling
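
To make the conversion step concrete, here is a minimal sketch of the affine int8 mapping that a quantize/dequantize pair performs. The helper names and the `scale` and `zero_point` values are illustrative assumptions, not values read from this model, and plain Python is used so the cell runs without TensorFlow.

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes: q = round(x / scale) + zero_point, clamped."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to floats: x ~= (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical parameters for an input range of roughly [-1, 1].
scale, zero_point = 1.0 / 127, 0
inputs = [-1.0, -0.25, 0.0, 0.3333, 1.0]

codes = quantize(inputs, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(inputs, restored))
print(codes)
print("max round-trip error:", max_error)
```

For values inside the representable range, the round-trip error is bounded by half of `scale`. That bound is the precision given up at the float boundary of the graph.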

In [None]:
! pip install tensorflow==2.6.0
! pip install keras==2.6.0

Collecting tensorflow==2.6.0
  Downloading tensorflow-2.6.0-cp37-cp37m-manylinux2010_x86_64.whl (458.3 MB)
     |████████████████████████████████| 458.3 MB 10 kB/s 
Collecting clang~=5.0
  Downloading clang-5.0.tar.gz (30 kB)
Collecting wrapt~=1.12.1
  Downloading wrapt-1.12.1.tar.gz (27 kB)
Collecting flatbuffers~=1.12.0
  Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting typing-extensions~=3.7.4
  Downloading typing_extensions-3.7.4.3-py3-none-any.whl (22 kB)
Building wheels for collected packages: clang, wrapt
  Building wheel for clang (setup.py) ... done
  Created wheel for clang: filename=clang-5.0-py3-none-any.whl size=30692 sha256=9aead50a4d5fb3204b23c02ec5d38a818f53c9d4a38381d744cd188f489acb6a
  Stored in directory: /root/.cache/pip/wheels/98/91/04/971b4c587cf47ae952b108949b46926f426c02832d120a082a
  Building wheel for wrapt (setup.py) ... done
  Created wheel for wrapt: filename=wrapt-1.12.1-cp37-cp37m-linux_x86_64.whl size=68724 

In [None]:
import tensorflow as tf
#assert float(tf.__version__[:3]) >= 2.3
import os
import numpy as np
import matplotlib.pyplot as plt
import tempfile
from tensorflow.keras.metrics import Accuracy
from google.colab import files

In [None]:
print(tf.__version__)

2.6.0


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Load Data and Model

In [None]:
train_data = np.load("/content/drive/MyDrive/FYP Data/Train and Test/train_data.npy", allow_pickle = True)
test_data = np.load("/content/drive/MyDrive/FYP Data/Train and Test/test_data.npy", allow_pickle = True)
train_label = np.load("/content/drive/MyDrive/FYP Data/Train and Test/train_label.npy", allow_pickle = True)
test_label = np.load("/content/drive/MyDrive/FYP Data/Train and Test/test_label.npy", allow_pickle = True)

In [None]:
#model = tf.keras.models.load_model('/content/drive/MyDrive/FYP/mobilenetv2.h5')
#model = tf.keras.models.load_model('/content/drive/MyDrive/FYP/mobilenetv2_NEW.h5')
#model = tf.keras.models.load_model('/content/drive/MyDrive/FYP/mobilenetv2_NEW1.h5')
#model = tf.keras.models.load_model('/content/drive/MyDrive/FYP/mobilenetv2_NEW2.h5')
model = tf.keras.models.load_model('/content/drive/MyDrive/FYP/mobilenetv2_NEW3.h5')
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 112, 112, 32) 128         Conv1[0][0]                      
__________________________________________________________________________________________________
Conv1_relu (ReLU)               (None, 112, 112, 32) 0           bn_Conv1[0][0]                   
______________________________________________________________________________________________

In [None]:
print('Number of trainable weights = {}'.format(len(model.trainable_weights)))

Number of trainable weights = 4


### Conversion to TFLite

In [None]:
test_data.dtype

dtype('float32')

In [None]:
# Generate a representative dataset with 100 samples
# Quantize variable data (e.g. model input/output and intermediates between layers)
# Allows quantization process to measure dynamic range of activations and inputs
# To find an accurate 8-bit representation of each weight and activation value
def representative_data_gen():
  for i in range(100):
    data_list = test_data[5*i] # step size of 5 to ensure we get 20 samples from each sound class
    rep_data = np.expand_dims(data_list, axis=0)
    yield [rep_data]
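
The comments above say the representative dataset lets the converter measure the dynamic range of activations. As a rough illustration of that idea, the sketch below derives a `scale` and `zero_point` from the min/max seen across calibration samples. It is a simplified min/max scheme with hypothetical names and data; the TFLite converter's actual calibration may differ.

```python
def calibrate_int8(samples, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the min/max seen across all samples."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    # The range must contain 0.0 so that zero maps to an exact integer code.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero data
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


# Hypothetical activations recorded from three calibration samples.
calibration_batches = [[0.0, 0.8, 2.1], [0.1, 3.9, 1.2], [0.0, 5.1, 0.4]]
scale, zero_point = calibrate_int8(calibration_batches)
print(scale, zero_point)
```

A single outlier in the calibration data widens the range and coarsens `scale` for every other value, which is one reason to draw samples from every class.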

In [None]:
# Create a converter from the Keras model (nothing is quantized until convert() is called)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enable quantization of fixed parameters (e.g. weights)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Set the representative dataset for quantization
converter.representative_dataset = representative_data_gen
# If any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# For full integer quantization, though supported types defaults to int8 only, we explicitly declare it for clarity.
converter.target_spec.supported_types = [tf.int8]
# Left commented out so the input and output tensors stay float32; uncomment to use uint8 I/O (added in r2.3)
#converter.inference_input_type = tf.uint8
#converter.inference_output_type = tf.uint8
quant_tflite_model = converter.convert()

INFO:tensorflow:Assets written to: /tmp/tmp52z2e257/assets




In [None]:
# 'wb': file is opened for writing in binary mode
with open('mobilenet_v2_sound_classification_float_ptq.tflite', 'wb') as f:
  f.write(quant_tflite_model)

### Compare Accuracy

In [None]:
# Obtain the predicted labels and ground truth labels of the test set
logits = model(test_data)
prediction = np.argmax(logits, axis=1)
truth = test_label

# Computes the frequency with which prediction matches ground truth
acc = Accuracy()
acc.update_state(truth, prediction)

print("Raw model accuracy: {:.5%}".format(acc.result().numpy()))

Raw model accuracy: 94.40000%


In [None]:
# Define following functions to obtain predictions from tflite model

def set_input_tensor(interpreter, input_data):
  input_details = interpreter.get_input_details()[0] # the model has a single input tensor
  # Add a batch dimension and cast to the dtype the interpreter expects
  input_data = np.expand_dims(input_data, axis=0).astype(input_details["dtype"])
  # Copy the batch into the input tensor, addressed by its index in the interpreter
  interpreter.set_tensor(input_details['index'], input_data)

def classify_audio(interpreter, input):
  set_input_tensor(interpreter, input)
  interpreter.invoke() # run inference; tensors must be allocated and the input filled before calling this
  output_details = interpreter.get_output_details()[0] # for one output data
  output = interpreter.get_tensor(output_details['index']) # obtains output tensor in numpy array
  top_1 = np.argmax(output) # obtain most probable output
  return top_1

In [None]:
interpreter = tf.lite.Interpreter('mobilenet_v2_sound_classification_float_ptq.tflite')
interpreter.allocate_tensors() # tflite pre-plans tensor allocations to optimize inference

In [None]:
# Obtain tflite prediction outputs

tflite_pred = []
ground_truth = test_label

for i in range(len(test_data)):
  prediction = classify_audio(interpreter, test_data[i])
  tflite_pred.append(prediction)

print(tflite_pred)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 0, 3, 4, 3, 2, 3, 3, 3, 3, 0, 3, 3, 4, 3, 3, 3, 3, 3, 0, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 

In [None]:
# Computes the frequency with which prediction matches ground truth
tflite_acc = Accuracy()
tflite_acc(ground_truth, tflite_pred)

<tf.Tensor: shape=(), dtype=float32, numpy=0.912>

In [None]:
print("Quant TF Lite accuracy: {:.5%}".format(tflite_acc.result()))

Quant TF Lite accuracy: 91.20000%


We observe only a slight decrease in accuracy, from 94.40% for the un-quantized TF model to 91.20% for the quantized TFLite model. This is because we are still using float input and output tensors.
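
A single accuracy figure hides which classes lose the most from quantization. The sketch below computes the accuracy drop and counts errors per true class. The label lists are hypothetical stand-ins; in this notebook they would be `test_label`, the Keras predictions, and `tflite_pred`.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of positions where the prediction equals the ground truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def errors_per_class(truth, predicted):
    """Count misclassified samples, keyed by their true class."""
    return Counter(t for t, p in zip(truth, predicted) if t != p)


# Hypothetical labels for five classes, two samples each.
truth      = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
float_pred = [0, 0, 1, 1, 2, 2, 3, 3, 4, 0]
quant_pred = [0, 1, 1, 1, 2, 3, 3, 3, 4, 0]

drop = accuracy(truth, float_pred) - accuracy(truth, quant_pred)
print("accuracy drop: {:.1%}".format(drop))
print(dict(errors_per_class(truth, quant_pred)))
```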

### Compare size of float TFLite model and quantized TFLite model

The quantized model is slightly smaller (2.5991 MiB) than when we use uint8 input and output tensors (2.5995 MiB), because this variant has 2 operations (quantize and dequantize) that cannot be mapped to the Edge TPU.

In [None]:
# Create float TFLite model.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_tflite_model = float_converter.convert()

# Measure sizes of models.
_, float_file = tempfile.mkstemp('.tflite')
_, quant_file = tempfile.mkstemp('.tflite')

with open(quant_file, 'wb') as f:
  f.write(quant_tflite_model)

with open(float_file, 'wb') as f:
  f.write(float_tflite_model)

print("Float model in Mb:", os.path.getsize(float_file) / float(2**20))
print("Quantized model in Mb:", os.path.getsize(quant_file) / float(2**20))

INFO:tensorflow:Assets written to: /tmp/tmplzr3fvex/assets



Float model in Mb: 8.527118682861328
Quantized model in Mb: 2.5990829467773438
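
The two sizes are easier to compare as a ratio. The sketch below wraps the same bytes / 2**20 division in a helper and computes how many times smaller one file is than another. The helper names are assumptions, and the files are zero-filled stand-ins with hypothetical sizes, not the real models.

```python
import os
import tempfile


def size_mib(path):
    """File size in MiB (bytes / 2**20), matching the division used above."""
    return os.path.getsize(path) / float(2**20)


def compression_ratio(float_path, quant_path):
    """How many times smaller the quantized file is than the float file."""
    return size_mib(float_path) / size_mib(quant_path)


# Stand-in files with hypothetical sizes (4 MiB vs 1 MiB).
with tempfile.TemporaryDirectory() as tmp:
    float_path = os.path.join(tmp, "float.tflite")
    quant_path = os.path.join(tmp, "quant.tflite")
    with open(float_path, "wb") as f:
        f.write(b"\0" * (4 * 2**20))
    with open(quant_path, "wb") as f:
        f.write(b"\0" * (1 * 2**20))
    ratio = compression_ratio(float_path, quant_path)

print("compression ratio: {:.2f}x".format(ratio))
```

With the sizes reported above (about 8.527 and 2.599), the ratio is roughly 3.3x. That is somewhat below the 4x that a pure float32-to-int8 change in weight storage would give.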


### Compiling Model for Edge TPU

In [None]:
! curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

! echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list

! sudo apt-get update

! sudo apt-get install edgetpu-compiler	

OK
deb https://packages.cloud.google.com/apt coral-edgetpu-stable main
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Get:4 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]
Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:6 https://packages.cloud.google.com/apt coral-edgetpu-stable InRelease [6,722 B]
Get:7 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ Packages [73.9 kB]
Get:8 http://archive.ubuntu.com/ubuntu bionic-

In [None]:
# Compile model
! edgetpu_compiler mobilenet_v2_sound_classification_float_ptq.tflite

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 1212 ms.

Input model: mobilenet_v2_sound_classification_float_ptq.tflite
Input size: 2.60MiB
Output model: mobilenet_v2_sound_classification_float_ptq_edgetpu.tflite
Output size: 2.78MiB
On-chip memory used for caching model parameters: 2.71MiB
On-chip memory remaining for caching model parameters: 4.98MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 72
Operation log: mobilenet_v2_sound_classification_float_ptq_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 70
Number of operations tha

In [None]:
! edgetpu_compiler -s mobilenet_v2_sound_classification_float_ptq.tflite

Edge TPU Compiler version 16.0.384591198
Started a compilation timeout timer of 180 seconds.

Model compiled successfully in 1173 ms.

Input model: mobilenet_v2_sound_classification_float_ptq.tflite
Input size: 2.60MiB
Output model: mobilenet_v2_sound_classification_float_ptq_edgetpu.tflite
Output size: 2.78MiB
On-chip memory used for caching model parameters: 2.71MiB
On-chip memory remaining for caching model parameters: 4.98MiB
Off-chip memory used for streaming uncached model parameters: 0.00B
Number of Edge TPU subgraphs: 1
Total number of operations: 72
Operation log: mobilenet_v2_sound_classification_float_ptq_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 70
Number of operations tha
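
The compiler summary can be turned into a coverage figure. The sketch below parses two lines copied from the output above; the function name is an assumption, and the regular expressions only match this exact wording.

```python
import re


def edgetpu_op_coverage(log_text):
    """Return (ops_on_tpu, total_ops, fraction) parsed from compiler output."""
    total = int(re.search(r"Total number of operations: (\d+)", log_text).group(1))
    on_tpu = int(
        re.search(
            r"Number of operations that will run on Edge TPU: (\d+)", log_text
        ).group(1)
    )
    return on_tpu, total, on_tpu / total


# Two lines copied from the compiler output above.
summary = """Total number of operations: 72
Number of operations that will run on Edge TPU: 70"""

on_tpu, total, fraction = edgetpu_op_coverage(summary)
print("{} of {} ops on Edge TPU ({:.1%})".format(on_tpu, total, fraction))
```

The 2 remaining operations are consistent with the quantize and dequantize pair described at the top of this notebook, though the truncated log does not confirm which operations they are.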

### Download model compiled for Edge TPU

In [None]:
files.download('mobilenet_v2_sound_classification_float_ptq_edgetpu.tflite')
