[Open In Colab](https://colab.research.google.com/github/toby-htx/ONNX-Sharing-Session/blob/main/Demo2_HardwareAccess.ipynb)

# **Demo 2: Hardware Access (ONNX Runtime)**

**TensorFlow -> ONNX**

In this demo, we convert a model written in TensorFlow to the ONNX format and run it with ONNX Runtime. Specifically, we convert a TensorFlow BiLSTM model that takes Google's Word2Vec embeddings as input. This model was used for the Fine Grained Sentiment Analysis workstream.

Before running, change the **Runtime** type to use a **GPU hardware accelerator**, then select '**Run All**'.


##Section 1: TensorFlow Model##


1) First, **load the Word2Vec embeddings**. This takes around **20 minutes** because the pretrained vectors are a large download.

In [None]:
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300") 



2) Import the dataset and preprocess it.

In [1]:
!git clone https://github.com/toby-htx/Onnx-Sharing-Session.git

Cloning into 'Onnx-Sharing-Session'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 8 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (8/8), done.
/bin/bash: line 0: cd: Data: No such file or directory


In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, precision_recall_curve
from sklearn import preprocessing

import re
import nltk

import tensorflow as tf
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential, Model
from keras.callbacks import EarlyStopping
from keras.layers import Dense, Bidirectional, LayerNormalization, LSTM, Dropout, BatchNormalization, Input, GlobalMaxPool1D, GlobalAveragePooling1D, concatenate
# Embedding is exported from keras.layers; the keras.layers.embeddings path is version-specific
from keras.layers import Embedding

def process_text(document):
     
    # Remove extra white space from text
    document = re.sub(r'\s+', ' ', document, flags=re.I)
         
    # Remove all the special characters from text
    document = re.sub(r'\W', ' ', str(document))
 
    return document

df = pd.read_csv('./Onnx-Sharing-Session/Data/Isear(Fear&Joy).csv')
df = df[['Emotion','Statement']]
df['preprocessedStatement'] = df.Statement.apply(process_text)
display(df.head())

le = preprocessing.LabelEncoder()
# Encode labels in column 'Emotion'. 
df['Emotion'] = le.fit_transform(df['Emotion']) 
y = df.pop('Emotion')
y_new = tf.keras.utils.to_categorical(y, num_classes=2)

| | Emotion | Statement | preprocessedStatement |
|---|---|---|---|
| 0 | fear | When I was left alone at home one night by my ... | When I was left alone at home one night by my ... |
| 1 | fear | When I was a child I was afraid of big dogs. O... | When I was a child I was afraid of big dogs O... |
| 2 | fear | When I forgot the lines of the play during an ... | When I forgot the lines of the play during an ... |
| 3 | joy | The day I learnt that I had been admitted to t... | The day I learnt that I had been admitted to t... |
| 4 | joy | When I was a student at the Institute doing my... | When I was a student at the Institute doing my... |
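
The label encoding at the end of the cell above decides which class the integers 0 and 1 stand for in the later classification reports. `LabelEncoder` assigns integers to the class names in sorted order, so the mapping can be sketched in plain Python as below. This is a simplified stand-in for `LabelEncoder` and `to_categorical`, not their implementation, and the helper names `encode_labels` and `one_hot` are hypothetical.

```python
def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order of the label names."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


def one_hot(ids, num_classes):
    """Turn integer class ids into one-hot rows (a categorical target)."""
    return [[1.0 if j == i else 0.0 for j in range(num_classes)] for i in ids]


# The five labels shown in the output above.
classes, ids = encode_labels(["fear", "fear", "fear", "joy", "joy"])
targets = one_hot(ids, num_classes=len(classes))

print(classes)                   # position in this list = encoded value
print(ids)
print(targets[0], targets[-1])   # one-hot rows for a 'fear' and a 'joy' sample
```

Under this mapping, class 0 is `fear` and class 1 is `joy`.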


3) As with the PyTorch model, we have to build the **vocabulary** and the **embedding matrix**, this time the **TensorFlow** way.

In [None]:
max_length = df.preprocessedStatement.apply(lambda x: len(x.split())).max()

t = Tokenizer()
t.fit_on_texts(df['preprocessedStatement'] )
vocab_size = len(t.word_index) + 1
encoded_text = t.texts_to_sequences(df['preprocessedStatement'] )
X = pad_sequences(encoded_text, maxlen=max_length, padding='post')
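
To make the tokenising and padding step concrete, here is a minimal plain-Python sketch of the same idea: words are indexed from 1 in order of decreasing frequency, index 0 is reserved for padding, and shorter sequences are padded at the end. It is a simplified stand-in, not Keras' implementation (Keras' `Tokenizer` also strips punctuation by default), and the helper names are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Assign indices 1..V to words, most frequent first; 0 is kept for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() sorts by count and keeps first-seen order among ties.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(texts, word_index):
    """Replace each known word by its integer index."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


def pad_post(sequences, maxlen):
    """Right-pad each sequence with zeros up to maxlen (no truncation is needed
    here because maxlen is the length of the longest sequence)."""
    return [seq + [0] * (maxlen - len(seq)) for seq in sequences]


texts = ["I was afraid of big dogs", "I was happy"]
word_index = build_word_index(texts)
vocab_size = len(word_index) + 1          # +1 for the padding index 0
encoded = encode(texts, word_index)
maxlen = max(len(seq) for seq in encoded)
padded = pad_post(encoded, maxlen)

print(word_index)
print(padded)
```

The `+ 1` in `vocab_size` mirrors the cell above: the embedding layer needs one row per word plus one for the padding index.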

In [None]:
# Row i holds the Word2Vec vector of the word with index i; row 0 is the padding row.
embedding_matrix = np.zeros((vocab_size, 300))
for word, i in t.word_index.items():
    try:
        embedding_matrix[i] = w2v[word]
    except KeyError:
        # Word is not in the Word2Vec vocabulary: leave its row as zeros
        pass
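
The rule this step relies on is that a word missing from Word2Vec keeps an all-zero row and never inherits another word's vector. A small sketch with a toy dictionary standing in for `w2v` illustrates it; the vectors, the word index, and the helper `build_embedding_matrix` are made up for illustration.

```python
def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]  # row 0 = padding
    missing = []
    for word, i in word_index.items():
        vector = vectors.get(word)
        if vector is None:
            missing.append(word)       # out-of-vocabulary: keep the zero row
        else:
            matrix[i] = list(vector)
    return matrix, missing


# Toy 3-dimensional "pretrained" vectors (hypothetical values).
toy_vectors = {"afraid": [0.1, 0.2, 0.3], "dogs": [0.4, 0.5, 0.6]}
toy_index = {"afraid": 1, "xyzzy": 2, "dogs": 3}

matrix, missing = build_embedding_matrix(toy_index, toy_vectors, dim=3)
print(missing)      # words that had no pretrained vector
print(matrix[2])    # their rows remain zeros
```

Collecting the missing words is optional, but it gives a quick view of how much of the vocabulary the pretrained embeddings cover.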

4) Split the data into training, validation, and test sets.


In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y_new, test_size=0.05, stratify=y)

x_val = x_train[:100]
y_val = y_train[:100]
x_train = x_train[100:]
y_train = y_train[100:]

5) Prepare the BiLSTM model architecture. 
The model architecture was inspired by that used in *Z. Hameed and B. Garcia-Zapirain, "Sentiment classification using a single-layered BiLSTM model", IEEE Access, vol. 8, pp. 73992-74001, 2020.*


In [None]:
input_layer = Input(shape=(max_length,))
x = Embedding(vocab_size, 300, weights=[embedding_matrix], trainable=False)(input_layer)
x = Bidirectional(LSTM(32, return_sequences=True))(x)
x_a = GlobalMaxPool1D()(x)
x_b = GlobalAveragePooling1D()(x)
x = concatenate([x_a,x_b])
x = Dense(64, activation="relu")(x)
x = Dense(2, activation='softmax')(x)
model_w2v_tf = Model(inputs=input_layer, outputs=x)
model_w2v_tf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_w2v_tf.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 122)]        0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 122, 300)     1337400     input_1[0][0]                    
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None, 122, 64)      85248       embedding[0][0]                  
__________________________________________________________________________________________________
global_max_pooling1d (GlobalMax (None, 64)           0           bidirectional[0][0]              
______________________________________________________________________________________________
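
The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for an embedding table and for an LSTM layer with one bias vector per gate; the function names are hypothetical. Only the two counts visible in the summary above are checked.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim


def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def bilstm_params(input_dim, units):
    """A bidirectional wrapper holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)


# Vocabulary size implied by the embedding row of the summary (1337400 params, 300 dims).
vocab_size = 1337400 // 300

print(vocab_size)
print(embedding_params(vocab_size, 300))   # compare with the 'embedding' row
print(bilstm_params(300, 32))              # compare with the 'bidirectional' row
```

The embedding parameters are frozen (`trainable=False`), so only the BiLSTM and the dense layers are updated during training.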

6) Train the model. We train for only 1 epoch to keep the demo fast, so the model's performance will not be ideal.

In [None]:
model_w2v_tf.fit(x_train, y_train, epochs = 1, validation_data=(x_val, y_val))



<keras.callbacks.History at 0x7fbfc053d550>

7) Measure how long the TensorFlow model takes to run inference on the test set, then check its classification performance.

In [None]:
import time

start_time = time.time()

y_pred = model_w2v_tf.predict(x_test)

print("Time taken by TensorFlow model: ", time.time() - start_time)

Time taken by TensorFlow model:  0.8981485366821289
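
A single measurement around the first `predict` call typically also includes one-off start-up costs, so it can overstate steady-state latency. A hedged sketch of a slightly more robust harness is shown below: it runs an untimed warm-up call and then reports the median of several timed runs. `time_call` is a hypothetical helper, and `fake_predict` is a stand-in for a real call such as `model_w2v_tf.predict`.

```python
import statistics
import time


def time_call(fn, *args, warmup=1, repeats=5):
    """Return the median wall-clock time of fn(*args) over `repeats` timed runs."""
    for _ in range(warmup):
        fn(*args)                      # untimed: absorbs one-off setup cost
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


calls = []


def fake_predict(batch):
    """Stand-in for a model's predict function; records each call."""
    calls.append(len(batch))
    return [0] * len(batch)


median_seconds = time_call(fake_predict, list(range(109)), warmup=1, repeats=5)
print(f"median over 5 runs: {median_seconds:.6f} s")
```

In the notebook, the same helper could be called as `time_call(model_w2v_tf.predict, x_test)`.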


In [None]:
y_pred_clean = np.argmax(y_pred, 1)
y_test_clean = np.argmax(y_test, 1)

In [None]:
print(classification_report(y_test_clean, y_pred_clean))

              precision    recall  f1-score   support

           0       0.95      0.78      0.86        54
           1       0.82      0.96      0.88        55

    accuracy                           0.87       109
   macro avg       0.88      0.87      0.87       109
weighted avg       0.88      0.87      0.87       109
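
To show how the figures in the report relate to each other, the sketch below recomputes them from a confusion matrix. The counts are an assumption: they were inferred as the counts consistent with the rounded figures and supports above, not taken from the notebook. `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j.

    Assumes every class is predicted at least once (no zero denominators).
    """
    n = len(confusion)
    report = {}
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp     # wrongly labelled as c
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        report[c] = (round(precision, 2), round(recall, 2), round(f1, 2))
    return report


# Inferred counts (assumption): rows are true classes 0 and 1, with supports 54 and 55.
confusion = [[42, 12],
             [2, 53]]

report = per_class_metrics(confusion)
total = sum(sum(row) for row in confusion)
accuracy = round(sum(confusion[c][c] for c in range(2)) / total, 2)

print(report)
print(accuracy)
```

Read this way, most errors are class 0 samples predicted as class 1, which is why class 0 has high precision but lower recall.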



In [None]:
model_w2v_tf.save('model_w2v_tf')



INFO:tensorflow:Assets written to: model_w2v_tf/assets




8) Install the packages required to use **ONNX** and **ONNX Runtime**. Note that a specific TensorFlow version is needed. This is a disadvantage of using ONNX: **you need to make sure the versions of the ONNX tooling and the DL framework are compatible**.


In [None]:
!pip install tensorflow==2.5.0

Collecting tensorflow==2.5.0
  Downloading tensorflow-2.5.0-cp37-cp37m-manylinux2010_x86_64.whl (454.3 MB)
Collecting tensorflow-estimator<2.6.0,>=2.5.0rc0
  Downloading tensorflow_estimator-2.5.0-py2.py3-none-any.whl (462 kB)
Collecting keras-nightly~=2.5.0.dev
  Downloading keras_nightly-2.5.0.dev2021032900-py2.py3-none-any.whl (1.2 MB)
Collecting grpcio~=1.34.0
  Downloading grpcio-1.34.1-cp37-cp37m-manylinux2014_x86_64.whl (4.0 MB)
Installing collected packages: grpcio, tensorflow-estimator, keras-nightly, tensorflow
  Attempting uninstall: grpcio
    Found existing installation: grpcio 1.41.0
    Uninstalling grpcio-1.41.0:
      Successfully uninstalled grpcio-1.41.0
  Attempting uninstall: tensorflow-estimator
    Found existing inst

In [None]:
!pip install -U tf2onnx onnxruntime

Collecting tf2onnx
  Downloading tf2onnx-1.9.2-py3-none-any.whl (430 kB)
Collecting onnxruntime
  Downloading onnxruntime-1.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.8 MB)
Collecting onnx>=1.4.1
  Downloading onnx-1.10.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (12.3 MB)
Installing collected packages: onnx, tf2onnx, onnxruntime
Successfully installed onnx-1.10.1 onnxruntime-1.9.0 tf2onnx-1.9.2
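
Because version compatibility matters here, it can help to record which versions actually ended up installed. A minimal sketch using the standard library's `importlib.metadata` follows. It assumes Python 3.8 or later (on Python 3.7, as in the logs above, the `importlib_metadata` backport offers the same functions), and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


found = installed_versions(["tensorflow", "tf2onnx", "onnx", "onnxruntime"])
for name, version in found.items():
    print(f"{name}: {version or 'not installed'}")
```

Saving this output next to the exported model makes it easier to reproduce the conversion later.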


9) Convert the **TensorFlow** model into an **ONNX** model.

In [None]:
!python -m tf2onnx.convert --saved-model 'model_w2v_tf' --opset 9 --output model.onnx 

2021-10-22 13:22:11.300393: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-22 13:22:13.891276: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-10-22 13:22:13.902155: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-22 13:22:13.902993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:00:04.0 name: Tesla K80 computeCapability: 3.7
coreClock: 0.8235GHz coreCount: 13 deviceMemorySize: 11.17GiB deviceMemoryBandwidth: 223.96GiB/s
2021-10-22 13:22:13.903037: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-22 13:22:13.927375: I tensorflow/stream_executor/platform/defa

##Section 2: ONNX Model##


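10) Load the ONNX model and compare its **inference speed** and **performance** with the TensorFlow model's. You should see that the **ONNX model is faster** while having the **same performance as the TensorFlow model**. ONNX Runtime applies graph optimisations when it loads the model and runs it on the execution providers available in the installed build. Note that the `onnxruntime` package installed in step 8 is the CPU build, so running on the Colab GPU requires the `onnxruntime-gpu` package and its `CUDAExecutionProvider`. Part of the gap also comes from the TensorFlow timing in step 7 covering the first `predict` call, which includes one-off graph-building overhead.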

In [None]:
import onnx
onnx_model = onnx.load("model.onnx")

In [None]:
import onnxruntime as rt
import numpy as np
import time

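model_path = 'model.onnx'
start_time = time.time()
# Note: the timed region includes creating the session (loading and optimising the model).
session = rt.InferenceSession(model_path)
input_name = session.get_inputs()[0].name
label_name = session.get_outputs()[0].name
# The Keras input layer defaults to float32, so cast the padded token ids to match.
onnx_predictions = session.run([label_name], {input_name: x_test.astype(np.float32)})[0]
print("Time taken by ONNX model: ", time.time() - start_time)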

Time taken by ONNX model:  0.18747282028198242
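
Which hardware ONNX Runtime uses depends on the execution providers offered by the installed build. The sketch below picks providers in priority order. It relies only on `rt.get_available_providers()` and the `providers` argument of `InferenceSession`; `choose_providers` is a hypothetical helper, and the import is guarded so the helper can be read without ONNX Runtime installed.

```python
def choose_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that this build offers, in priority order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {list(preferred)} found in {list(available)}")
    return chosen


try:
    import onnxruntime as rt
except ImportError:          # lets the helper run without onnxruntime installed
    rt = None

if rt is not None:
    providers = choose_providers(rt.get_available_providers())
    print("Providers this build can use:", providers)
    # With the model file from step 9 present:
    # session = rt.InferenceSession("model.onnx", providers=providers)
    # print(session.get_providers())   # providers the session actually uses
```

If the printed list contains only `CPUExecutionProvider`, the session runs on the CPU even when the Colab runtime has a GPU attached.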


In [None]:
onnx_pred_clean = np.argmax(onnx_predictions, 1)
y_test_clean = np.argmax(y_test, 1)

In [None]:
print(classification_report(y_test_clean, onnx_pred_clean))

              precision    recall  f1-score   support

           0       0.95      0.78      0.86        54
           1       0.82      0.96      0.88        55

    accuracy                           0.87       109
   macro avg       0.88      0.87      0.87       109
weighted avg       0.88      0.87      0.87       109



##Extra: Convert the ONNX model to a PyTorch model##


In [None]:
!pip install onnx2pytorch

Collecting onnx2pytorch
  Downloading onnx2pytorch-0.4.0-py3-none-any.whl (44 kB)
Installing collected packages: onnx2pytorch
Successfully installed onnx2pytorch-0.4.0


In [None]:
import onnx
from onnx2pytorch import ConvertModel

onnx_model = onnx.load('model.onnx')
pytorch_model = ConvertModel(onnx_model)

  torch.from_numpy(numpy_helper.to_array(tensor)),


NotImplementedError: ignored