<a href="https://colab.research.google.com/github/vasudevgupta7/gsoc-wav2vec2/blob/deploy/notebooks/wav2vec2_onnx.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Wav2Vec2 ONNX

In this notebook, we export the TF Wav2Vec2 model to ONNX and compare the latency of the ONNX-exported model with that of the TF model on CPU.

In [1]:
!pip3 install -qU tf2onnx onnxruntime
!pip3 install -q git+https://github.com/vasudevgupta7/gsoc-wav2vec2@main



## Exporting TF model to ONNX

Exporting to ONNX is straightforward: call `tf2onnx.convert.from_keras(...)` with the Keras model and an input signature.

In [2]:
import tensorflow as tf
from wav2vec2 import Wav2Vec2ForCTC

model_id = "vasudevgupta/gsoc-wav2vec2-960h"
model = Wav2Vec2ForCTC.from_pretrained(model_id)

Downloading model weights from https://huggingface.co/vasudevgupta/gsoc-wav2vec2-960h ... Done
Total number of loaded variables: 213


In [3]:
AUDIO_MAXLEN = 50000
ONNX_PATH = "onnx-wav2vec2.onnx"

In [4]:
import tf2onnx

input_signature = (tf.TensorSpec((None, AUDIO_MAXLEN), tf.float32, name="speech"),)
_ = tf2onnx.convert.from_keras(model, input_signature=input_signature, output_path=ONNX_PATH)

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`


In [5]:
ls

onnx-wav2vec2.onnx  sample_data/  vasudevgupta/


## Inference using ONNX exported model

To run inference with the ONNX-exported model, we first download a speech sample and then pre-process it.

In [6]:
!wget https://github.com/vasudevgupta7/gsoc-wav2vec2/raw/main/data/sample.wav

--2021-08-13 01:14:54--  https://github.com/vasudevgupta7/gsoc-wav2vec2/raw/main/data/sample.wav
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/vasudevgupta7/gsoc-wav2vec2/main/data/sample.wav [following]
--2021-08-13 01:14:54--  https://raw.githubusercontent.com/vasudevgupta7/gsoc-wav2vec2/main/data/sample.wav
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 93638 (91K) [audio/wav]
Saving to: ‘sample.wav’


2021-08-13 01:14:54 (6.66 MB/s) - ‘sample.wav’ saved [93638/93638]



In [7]:
from wav2vec2 import Wav2Vec2Processor

processor = Wav2Vec2Processor(is_tokenizer=False)

`Wav2Vec2Processor(is_tokenizer=False)` normalizes the speech along the time axis. The same pre-processing was applied during training.
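To make the normalization step concrete, here is a minimal NumPy sketch. It assumes that "normalize" means scaling to zero mean and unit variance along the time axis; the function name `normalize_speech` and the `eps` value are illustrative assumptions, not taken from the `wav2vec2` package.

```python
import numpy as np

def normalize_speech(speech, eps=1e-5):
    """Scale a waveform to zero mean and unit variance along the time axis."""
    speech = np.asarray(speech, dtype=np.float32)
    mean = speech.mean(axis=-1, keepdims=True)
    var = speech.var(axis=-1, keepdims=True)
    # eps is an assumed constant that avoids division by zero on silent audio
    return (speech - mean) / np.sqrt(var + eps)

# Synthetic stand-in for a waveform (not the notebook's sample.wav)
rng = np.random.default_rng(0)
waveform = rng.normal(loc=0.3, scale=0.05, size=16000)
normalized = normalize_speech(waveform)
print("mean:", normalized.mean(), "std:", normalized.std())
```

After normalization the mean should be close to 0 and the standard deviation close to 1, regardless of the recording's original volume or offset.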

In [9]:
import soundfile as sf

FILENAME = "sample.wav"

speech, _ = sf.read(FILENAME)
speech = tf.constant(speech, dtype=tf.float32)
speech = processor(speech)[None]

padding = tf.zeros((speech.shape[0], AUDIO_MAXLEN - speech.shape[1]))
speech = tf.concat([speech, padding], axis=-1)
speech.shape

TensorShape([1, 50000])
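The padding cell above assumes the clip is shorter than `AUDIO_MAXLEN`; for a longer clip the padding length would be negative. A possible way to handle both cases is sketched below in NumPy. The helper `pad_or_truncate` is hypothetical and not part of the `wav2vec2` package.

```python
import numpy as np

AUDIO_MAXLEN = 50000  # same value as in the notebook

def pad_or_truncate(speech, max_len=AUDIO_MAXLEN):
    """Return a (batch, max_len) array: right-pad with zeros or cut off the tail."""
    speech = np.asarray(speech, dtype=np.float32)
    if speech.ndim == 1:
        speech = speech[None]  # add a batch dimension
    length = speech.shape[-1]
    if length >= max_len:
        return speech[:, :max_len]
    # pad only the time axis, on the right, with zeros
    return np.pad(speech, ((0, 0), (0, max_len - length)))

# Synthetic clips: one shorter and one longer than AUDIO_MAXLEN
short_clip = pad_or_truncate(np.ones(46000))
long_clip = pad_or_truncate(np.ones(60000))
print(short_clip.shape, long_clip.shape)
```

Truncation discards audio, so for long recordings it may be preferable to split the waveform into chunks instead; that choice depends on the application.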

Next, we create an ONNX Runtime inference session and use it to make predictions.

In [10]:
import onnxruntime as rt
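# Request the CPU provider explicitly, since the comparison in this notebook is on CPU
session = rt.InferenceSession(ONNX_PATH, providers=["CPUExecutionProvider"])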

In [11]:
@tf.function(jit_compile=True)
def jitted_forward(speech):
    return model(speech)

In [12]:
import numpy as np

onnx_outputs = session.run(None, {"speech": speech.numpy()})[0]
tf_outputs = jitted_forward(speech)

assert np.allclose(onnx_outputs, tf_outputs.numpy(), atol=1e-2)
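The `np.allclose` check above only passes or fails. As a rough sketch, the helper below reports how far apart two sets of logits are and how often their per-frame argmax agrees, which is what matters for greedy decoding. The name `compare_logits` and the synthetic array shapes are illustrative assumptions.

```python
import numpy as np

def compare_logits(a, b):
    """Return (max absolute difference, fraction of frames with the same argmax)."""
    a, b = np.asarray(a), np.asarray(b)
    max_abs_diff = float(np.max(np.abs(a - b)))
    argmax_agreement = float(np.mean(a.argmax(axis=-1) == b.argmax(axis=-1)))
    return max_abs_diff, argmax_agreement

# Synthetic logits shaped (batch, frames, vocab); the sizes are arbitrary
rng = np.random.default_rng(0)
reference = rng.normal(size=(1, 156, 32)).astype(np.float32)
noise = rng.uniform(-1e-3, 1e-3, size=reference.shape).astype(np.float32)
max_diff, agreement = compare_logits(reference, reference + noise)
print("max abs diff:", max_diff, "argmax agreement:", agreement)
```

In the notebook, the same helper could be called as `compare_logits(onnx_outputs, tf_outputs.numpy())`.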

In [13]:
tokenizer = Wav2Vec2Processor(is_tokenizer=True)
prediction = np.argmax(onnx_outputs, axis=-1)
prediction = tokenizer.decode(prediction.squeeze().tolist())

Downloading `vocab.json` from https://github.com/vasudevgupta7/gsoc-wav2vec2/raw/main/data/vocab.json ... DONE


`Wav2Vec2Processor(is_tokenizer=True)` decodes the model outputs into a string.
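For intuition, greedy CTC decoding of the argmax ids can be sketched as follows: merge consecutive repeats, drop the blank token, and map the remaining ids to characters. The toy vocabulary, the blank id `0`, and the `|` word delimiter are assumptions made for illustration; the actual `decode` method of `Wav2Vec2Processor` may differ.

```python
from itertools import groupby

def ctc_greedy_decode(token_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, remove blanks, and join characters into text."""
    collapsed = [token for token, _ in groupby(token_ids)]  # merge repeats
    chars = [id_to_char[t] for t in collapsed if t != blank_id]
    return "".join(chars).replace(word_delimiter, " ").strip()

# Hypothetical toy vocabulary, not the model's vocab.json
toy_vocab = {0: "<pad>", 1: "|", 2: "S", 3: "H", 4: "E", 5: "A", 6: "D", 7: "L"}

frame_ids = [0, 2, 2, 0, 3, 3, 4, 0, 1, 1, 3, 0, 5, 5, 6, 6, 0]
print(ctc_greedy_decode(frame_ids, toy_vocab))  # SHE HAD

# A blank between two identical ids keeps both characters
print(ctc_greedy_decode([5, 7, 7, 0, 7], toy_vocab))  # ALL
```

The blank token is what lets CTC emit doubled letters: without the blank between the two `L` frames, they would be merged into one.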

In [14]:
from IPython.display import Audio
print("prediction:", prediction)
Audio(filename=FILENAME)

prediction: SHE HAD YOUR DUCK SOUP AND GREASY WASHWATER ALL YEAR


## Comparing latency of TF model & ONNX exported model

We now compare the latency of the JIT-compiled TF model, the eager-mode TF model, and the ONNX-exported model on the same padded input.

In [15]:
import time
from contextlib import contextmanager

@contextmanager
def timeit(prefix="Time taken:"):
  # perf_counter is a monotonic, high-resolution clock suited to measuring intervals
  start = time.perf_counter()
  yield
  time_taken = time.perf_counter() - start
  print(prefix, time_taken, "seconds")

In [17]:
with timeit(prefix="JIT Compiled Wav2vec2 time taken:"):
  jitted_forward(speech)

with timeit(prefix="Eager mode time taken:"):
  model(speech)

with timeit(prefix="ONNX-Wav2Vec2 time taken:"):
  session.run(None, {"speech": speech.numpy()})

JIT Compiled Wav2vec2 time taken: 2.8493058681488037 seconds
Eager mode time taken: 1.2120401859283447 seconds
ONNX-Wav2Vec2 time taken: 0.8415746688842773 seconds
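The figures above come from a single call each, so they can be affected by one-off costs and run-to-run noise. A slightly more robust comparison would warm up each function and time several runs. The sketch below uses only the standard library; the helper `benchmark` and its default `warmup` and `runs` values are illustrative choices.

```python
import statistics
import time

def benchmark(fn, warmup=2, runs=10):
    """Call fn a few times untimed, then time `runs` calls and summarize them."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "min": min(timings),
    }

# Stand-in workload; in the notebook this would be one of the model calls
calls = []
stats = benchmark(lambda: calls.append(sum(range(1000))), warmup=2, runs=5)
print(stats)
```

In the notebook, the helper could be applied to each variant, for example `benchmark(lambda: session.run(None, {"speech": speech.numpy()}))`, `benchmark(lambda: jitted_forward(speech))`, and `benchmark(lambda: model(speech))`.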
