<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 8.0 Deploy a Custom ASR Model
In this notebook you'll combine what you've learned to deploy our fine-tuned Nigerian English ASR model.

**[8.1 Riva ServiceMaker and Your `.riva` Custom Model](#8.1-Riva-ServiceMaker-and-Your-.riva-Custom-Model)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[8.1.1 Identify the File Paths Required](#8.1.1-Identify-the-File-Paths-Required)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[8.1.2 `riva-build`](#8.1.2-riva-build)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[8.1.3 `riva-deploy`](#8.1.3-riva-deploy)<br>
**[8.2 Start the Riva Server](#8.2-Start-the-Riva-Server)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[8.2.1 Exercise: Update `config.sh`](#8.2.1-Exercise:-Update-config.sh)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[8.2.2 Run `riva_start.sh`](#8.2.2-Run-riva_start.sh)<br>
**[8.3 Run Inference](#8.3-Run-Inference)<br>**
**[8.4 Run Inference with Word Boosting](#8.4-Run-Inference-with-Word-Boosting)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[8.4.1 Exercise: Word Boost the ASR Inference](#8.4.1-Exercise:-Word-Boost-the-ASR-Inference)<br>
**[8.5 Stop the Riva Server](#8.5-Stop-the-Riva-Server)<br>**

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials Installed**<br>Be sure you have added your NGC credentials using the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb).
1. **Riva Quick Start resources folder and ASR models have been downloaded from NGC**<br>This notebook is dependent on the ["Build and Deploy an ASR Pipeline with Riva" notebook](005_Build_and_Deploy_ASR_Pipeline.ipynb). Run the following cell to see if anything is missing.  You should have no output if all the models have been downloaded already.

In [1]:
import os
if not os.path.exists("riva_quickstart_v2.11.0"):
    print("Quick Start missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")
if not os.path.exists("asr-models/default-models/speechtotext_en_us_conformer_vdeployable_v4.0"):
    print("Acoustic Model missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")
if not os.path.exists("asr-models/default-models/inverse_normalization_en_us_vdeployable_v2.0"):
    print("Inverse Normalization Model missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")
if not os.path.exists("asr-models/default-models/punctuationcapitalization_en_us_bert_base_vdeployable_v3.0"):
    print("Punctuation Model missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")
if not os.path.exists("asr-models/default-models/rmir/p_and_c.rmir"):
    print("Punctuation RMIR missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")

Punctuation RMIR missing!! Go back and run the Build and Deploy ASR Pipeline notebook.
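
The same dependency check can be written as a loop that collects every missing path in one pass. This is only a sketch: the helper name `find_missing` and the `REQUIRED_PATHS` dictionary are hypothetical, while the paths themselves are the ones checked in the cell above.

```python
import os

# Paths checked in the cell above, relative to the notebook's working directory.
REQUIRED_PATHS = {
    "Quick Start": "riva_quickstart_v2.11.0",
    "Acoustic Model": "asr-models/default-models/speechtotext_en_us_conformer_vdeployable_v4.0",
    "Inverse Normalization Model": "asr-models/default-models/inverse_normalization_en_us_vdeployable_v2.0",
    "Punctuation Model": "asr-models/default-models/punctuationcapitalization_en_us_bert_base_vdeployable_v3.0",
    "Punctuation RMIR": "asr-models/default-models/rmir/p_and_c.rmir",
}

def find_missing(required, base_dir="."):
    """Return the labels whose paths do not exist under base_dir (hypothetical helper)."""
    return [
        label
        for label, rel_path in required.items()
        if not os.path.exists(os.path.join(base_dir, rel_path))
    ]

for label in find_missing(REQUIRED_PATHS):
    print(f"{label} missing!! Go back and run the Build and Deploy ASR Pipeline notebook.")
```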


---
# 8.1 Riva ServiceMaker and Your `.riva` Custom Model

In an earlier notebook, you learned to deploy an out-of-the-box `.riva` model in Riva using Riva ServiceMaker.  The process is the same for a custom model, but we begin with the `.riva` model that was fine-tuned in, and exported from, NeMo instead. As before, we'll use `riva-build` and `riva-deploy` to prepare the model.
<img src=images/riva/servicemaker.png width=1000>

In [2]:
# Set the Riva Quick Start directory
RIVA_DIR = "/dli/task/riva_quickstart_v2.11.0"

# make sure scripts are executable
!cd $RIVA_DIR && chmod +x *.sh

# make sure custom .riva model is present
CUSTOM_MODEL_DIR = '/dli/task/asr-models/custom-models'
TRAINED_MODEL_DIR = '/dli/task/asr-models/custom-models/trained_en-ng'
!cp -n $TRAINED_MODEL_DIR/Conformer-CTC-BPE-43-epochs.riva $CUSTOM_MODEL_DIR/

## 8.1.1 Identify the File Paths Required
As before, set up the file paths.  Note that the only difference between this setup and the one we had for the out-of-the-box example is the acoustic model.  Previously, we deployed `Conformer-CTC-L-en-US-ASR-set-4p0.riva`, but here we deploy the fine-tuned model `Conformer-CTC-BPE-43-epochs.riva`.  The model linked here was fine-tuned for 43 epochs on the en-NG dataset.

In [3]:
import os

# ServiceMaker Docker container
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker"

# All model paths relative to Riva ServiceMaker Docker container include the _SM suffix

# Model base directory w.r.t. both the host and the ServiceMaker container
ASR_MODEL_DIR = os.path.abspath("asr-models")
ASR_MODEL_DIR_SM = "/servicemaker-dev" # Path where we mount the downloaded ASR models in the ServiceMaker container
CUSTOM_MODEL_DIR = os.path.join(ASR_MODEL_DIR, "custom-models")
DEFAULT_MODEL_DIR = os.path.join(ASR_MODEL_DIR, "default-models")

# Path to the CUSTOMIZED Acoustic Model inside the ServiceMaker container
AM_SM = os.path.join(ASR_MODEL_DIR_SM, "custom-models", "trained_en-ng", "Conformer-CTC-BPE-43-epochs.riva")

# Relative path to LM model artifacts
LM_DIR = os.path.join("default-models", "speechtotext_en_us_lm_vdeployable_v4.1")
DECODING_LM_BINARY_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "riva_asr_train_datasets_3gram.binary")
DECODING_VOCAB_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "flashlight_decoder_vocab.txt")

# Relative path to WFST artifacts for inverse text normalization
ITN_DIR = os.path.join("default-models", "inverse_normalization_en_us_vdeployable_v2.0")
WFST_TOKENIZER_MODEL_SM  = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "tokenize_and_classify.far")
WFST_VERBALIZER_MODEL_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "verbalize.far")

# Relative paths where the generated .rmir file will be stored
!mkdir -p $ASR_MODEL_DIR/custom-models/rmir
ASR_RMIR_DIR_SM = os.path.join(ASR_MODEL_DIR_SM, "custom-models", "rmir")
ASR_RMIR_SM = os.path.join(ASR_RMIR_DIR_SM, "en_ng_asr_lm_itn_offline.rmir")

# Key that model is encrypted with
KEY = "tlt_encode"
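
The `_SM` paths above follow from the volume mount used later (`-v $ASR_MODEL_DIR:$ASR_MODEL_DIR_SM`): anything under the host model directory appears under `/servicemaker-dev` inside the container. The sketch below shows that mapping. The helper name `to_container_path` is hypothetical, and the host root assumes the course layout (`/dli/task/asr-models`).

```python
import posixpath

def to_container_path(host_path, host_root, container_root):
    """Translate a host path under host_root to its path under container_root.

    Hypothetical helper: it only mirrors what the Docker volume mount does.
    """
    rel = posixpath.relpath(host_path, host_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{host_path} is not inside {host_root}")
    return posixpath.normpath(posixpath.join(container_root, rel))

host_root = "/dli/task/asr-models"     # ASR_MODEL_DIR on the host (assumed course layout)
container_root = "/servicemaker-dev"   # ASR_MODEL_DIR_SM inside the container

print(to_container_path(
    "/dli/task/asr-models/custom-models/rmir/en_ng_asr_lm_itn_offline.rmir",
    host_root,
    container_root,
))
```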

In [4]:
# Get the ServiceMaker Docker - this should take about 3 minutes if not previously pulled
! docker pull $RIVA_SM_CONTAINER

2.11.0-servicemaker: Pulling from nvidia/riva/riva-speech
Digest: sha256:7831bcd8deb4e18f6af937730833c93ee10e706add3b9da0572f56c94d292074
Status: Image is up to date for nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker
nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker


## 8.1.2 `riva-build`
We can build the RMIR file for ASR the same way we did in the out-of-the-box example. The command is identical except for the acoustic model and the name of the exported RMIR file: `en_ng_asr_lm_itn_offline.rmir`.  _Note that for the course, these RMIR files have been preloaded for the sake of time._

In [5]:
%%time
# Syntax: 
# riva-build <task-name> \
#     output-dir-for-rmir/model.rmir:key \
#     dir-for-riva/acoustic_model.riva:key \
#     --decoding_language_model_binary=lm_model.binary
! docker run --rm --gpus 1 -v $ASR_MODEL_DIR:$ASR_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
    riva-build speech_recognition \
        $ASR_RMIR_SM:$KEY \
        $AM_SM:$KEY \
        --decoding_language_model_binary=$DECODING_LM_BINARY_SM \
        --decoding_vocab=$DECODING_VOCAB_SM \
        --wfst_tokenizer_model=$WFST_TOKENIZER_MODEL_SM \
        --wfst_verbalizer_model=$WFST_VERBALIZER_MODEL_SM \
        --name=conformer-ctc-en-NG-asr-lm-itn-offline \
        --featurizer.use_utterance_norm_params=False \
        --featurizer.precalc_norm_time_steps=0 \
        --featurizer.precalc_norm_params=False \
        --ms_per_timestep=40 \
        --endpointing.start_history=200 \
        --endpointing.residue_blanks_at_start=-2 \
        --nn.fp16_needs_obey_precision_pass \
        --chunk_size=4.8 \
        --left_padding_size=1.6 \
        --right_padding_size=1.6 \
        --max_batch_size=16 \
        --featurizer.max_batch_size=512 \
        --featurizer.max_execution_batch_size=512 \
        --decoder_type=flashlight \
        --flashlight_decoder.asr_model_delay=-1 \
        --flashlight_decoder.lm_weight=0.8 \
        --flashlight_decoder.word_insertion_score=1.0 \
        --flashlight_decoder.beam_size=32 \
        --flashlight_decoder.beam_threshold=20. \
        --flashlight_decoder.num_tokenization=1 \
        --language_code=en-US \
        --offline 


=== Riva Speech Skills ===

NVIDIA Release  (build 59018721)
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://gi
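
To get a feel for the chunking flags passed to `riva-build`, the arithmetic below converts them into acoustic model timesteps. It is a rough sketch that assumes each output timestep covers `ms_per_timestep` milliseconds of audio and that the model sees the chunk plus the padding on both sides. The constant and function names are hypothetical; the values are copied from the command above.

```python
# Values copied from the riva-build flags above, expressed in milliseconds
# so that the arithmetic stays in integers.
CHUNK_MS = 4800        # --chunk_size=4.8 (seconds)
LEFT_PAD_MS = 1600     # --left_padding_size=1.6
RIGHT_PAD_MS = 1600    # --right_padding_size=1.6
MS_PER_TIMESTEP = 40   # --ms_per_timestep=40

def timesteps(duration_ms, ms_per_timestep=MS_PER_TIMESTEP):
    """Number of whole timesteps that fit in duration_ms (hypothetical helper)."""
    return duration_ms // ms_per_timestep

# Assumption: the model input is the chunk plus left and right context.
window_ms = LEFT_PAD_MS + CHUNK_MS + RIGHT_PAD_MS

print("timesteps per chunk: ", timesteps(CHUNK_MS))
print("window length (ms):  ", window_ms)
print("timesteps per window:", timesteps(window_ms))
```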

Since American English (on which our baseline acoustic model was trained) and Nigerian English (on which we've fine-tuned the acoustic model) are the same language, we can use the same punctuation and capitalization model as before. Let's copy it from the default to the custom model directory.

In [6]:
# Copy p_and_c.rmir to the appropriate directory
!cp -n $DEFAULT_MODEL_DIR/rmir/p_and_c.rmir $CUSTOM_MODEL_DIR/rmir/p_and_c.rmir

cp: cannot stat '/dli/task/asr-models/default-models/rmir/p_and_c.rmir': No such file or directory


## 8.1.3 `riva-deploy`

The deployment tool takes one or more RMIR files and a target model repository directory as input. The call is the same as in the out-of-the-box example, except that the name of the acoustic model's `.rmir` file has changed.  As before, it takes some time to run, so the files have been preloaded for convenience.  You can run this step if you wish by changing the following cell from "raw" to "code" and executing it, but this is not required, since the files were preloaded for the course in `MODEL_LOC`.

In [7]:
%%time
# The models have been preloaded for this course. Without preloading,
# this step takes about 15 minutes. To rerun it and overwrite the preloaded files, add the -f option.

# Syntax: 
# riva-deploy -f \
#     dir-for-rmir/asr_model.rmir:key \
#     dir-for-rmir/p_and_cmodel.rmir:key \
#     output-dir-for-repository
! docker run --rm --gpus 1 -v $CUSTOM_MODEL_DIR:/data $RIVA_SM_CONTAINER -- \
    riva-deploy \
        /data/rmir/en_ng_asr_lm_itn_offline.rmir:$KEY \
        /data/rmir/p_and_c.rmir:$KEY \
        /data/models/


=== Riva Speech Skills ===

NVIDIA Release  (build 59018721)
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://gi

In [8]:
# Check your work - the new model files should be in the MODEL_LOC/models directory now
!ls $CUSTOM_MODEL_DIR/models

conformer-ctc-en-NG-asr-lm-itn-offline
conformer-ctc-en-NG-asr-lm-itn-offline-ctc-decoder-cpu-streaming-offline
conformer-ctc-en-NG-asr-lm-itn-offline-endpointing-streaming-offline
conformer-ctc-en-NG-asr-lm-itn-offline-feature-extractor-streaming-offline
p_and_c_pipeline
riva-trt-conformer-ctc-en-NG-asr-lm-itn-offline-am-streaming-offline
riva-trt-p_and_c_pipeline-nn-bert-base-uncased
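
The same check can be done programmatically. The sketch below compares a models directory against the names in the listing above. The helper name `missing_models` is hypothetical, and the expected names are simply copied from that output.

```python
import os

# Directory names copied from the listing above.
EXPECTED_MODELS = [
    "conformer-ctc-en-NG-asr-lm-itn-offline",
    "conformer-ctc-en-NG-asr-lm-itn-offline-ctc-decoder-cpu-streaming-offline",
    "conformer-ctc-en-NG-asr-lm-itn-offline-endpointing-streaming-offline",
    "conformer-ctc-en-NG-asr-lm-itn-offline-feature-extractor-streaming-offline",
    "p_and_c_pipeline",
    "riva-trt-conformer-ctc-en-NG-asr-lm-itn-offline-am-streaming-offline",
    "riva-trt-p_and_c_pipeline-nn-bert-base-uncased",
]

def missing_models(models_dir, expected=EXPECTED_MODELS):
    """Return the expected model directories that are absent from models_dir."""
    present = set(os.listdir(models_dir)) if os.path.isdir(models_dir) else set()
    return [name for name in expected if name not in present]

# An empty list means every expected model directory is present.
print(missing_models("/dli/task/asr-models/custom-models/models"))
```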


---
# 8.2 Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. We've already downloaded the [Riva Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource from NGC. <br>
Set the path to the directory here:

In [9]:
# Set the Riva Quick Start directory
RIVA_DIR = "/dli/task/riva_quickstart_v2.11.0"

## 8.2.1 Exercise: Update `config.sh`
The `config.sh` file should be modified to run ASR services and point to the correct `MODEL_LOC` directory.  We set this up previously, but `MODEL_LOC` pointed to the out-of-the-box project folder (`/dli/task/model_files/OOTB_model_loc`).  Now it needs to point to our custom project folder (`/dli/task/asr-models/custom-models`).  

Modify [config.sh](riva_quickstart_v2.11.0/config.sh) to point to the correct `MODEL_LOC` folder for our current project.  If you get stuck, check the [solution](solutions/ex8.2.1_config.sh).

Do a quick check below to make sure the file is correct.  

In [None]:
# quick fix!
! cp solutions/ex8.2.1_config.sh $RIVA_DIR/config.sh

In [10]:
# Check your work.  Compare with the solution.  Exact matches provide no output.
! diff solutions/ex8.2.1_config.sh $RIVA_DIR/config.sh
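
If you prefer to compare the two files from Python, the standard library's `difflib` gives the same kind of answer as `diff`: no output for an exact match. The sketch below works on strings so it can run anywhere. The one-line config contents are hypothetical stand-ins, not the real contents of `config.sh`.

```python
import difflib

def config_diff(solution_text, current_text):
    """Return unified-diff lines between two config texts (empty list if identical)."""
    return list(difflib.unified_diff(
        solution_text.splitlines(),
        current_text.splitlines(),
        fromfile="solution",
        tofile="current",
        lineterm="",
    ))

# Hypothetical one-line stand-ins for the two files.
solution = 'MODEL_LOC="/dli/task/asr-models/custom-models"\n'
current = 'MODEL_LOC="/dli/task/model_files/OOTB_model_loc"\n'

for line in config_diff(solution, current):
    print(line)
```

To compare the real files, read each one with `open(path).read()` and pass the two strings to `config_diff`.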

## 8.2.2 Run `riva_start.sh`

In [11]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x *.sh

In [12]:
# Start the server.  This should take about 30 seconds.
! cd $RIVA_DIR && bash riva_start.sh config.sh

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
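
The "retrying in 10 seconds" messages above reflect a simple polling pattern: check, wait, check again. The sketch below illustrates that pattern in Python. It is not how `riva_start.sh` is implemented, and the names `wait_until_ready` and `is_ready` are hypothetical. The readiness check is passed in as a function so the loop can be tried without a server.

```python
import time

def wait_until_ready(is_ready, retries=30, delay_s=10.0, sleep=time.sleep):
    """Call is_ready() until it returns True; return the number of attempts used.

    Raises TimeoutError if the check never succeeds within `retries` attempts.
    """
    for attempt in range(1, retries + 1):
        if is_ready():
            return attempt
        if attempt < retries:
            sleep(delay_s)
    raise TimeoutError(f"not ready after {retries} attempts")

# Stand-in check that succeeds on the third call.
responses = iter([False, False, True])
attempts = wait_until_ready(lambda: next(responses), delay_s=0)
print(f"ready after {attempts} attempts")
```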


---
# 8.3 Run Inference
After the Riva server is up and running with your models, you can send inference requests to the server. We've already installed the Riva Python API bindings for the client.  Next we just need to connect to the Riva server and query it.


Once we import the necessary modules, we define and call a function to run inference on an audio file from the validation set of the Nigerian English Speech Dataset. 

In [13]:
import riva.client

def run_asr_inference(audio_file, server='localhost:50051', print_full_response=False):
    with open(audio_file, 'rb') as fh:
        data = fh.read()
    
    auth = riva.client.Auth(uri=server)
    client = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    
    response = client.offline_recognize(data, config)
    if print_full_response: 
        print(response)
    else:
        print("ASR transcript:")
        print(response.results[0].alternatives[0].transcript)
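
Note that `run_asr_inference` prints only `response.results[0]`, which raises an `IndexError` when the response has no results and ignores any later results a response may contain. A more defensive way to read the response might look like the sketch below. The helper name `full_transcript` is hypothetical, and the example uses simple stand-in objects rather than a real Riva response.

```python
from types import SimpleNamespace

def full_transcript(response):
    """Join the top alternative of every result, skipping results with no alternatives."""
    parts = []
    for result in response.results:
        if result.alternatives:
            parts.append(result.alternatives[0].transcript.strip())
    return " ".join(parts)

# Stand-in objects that mimic only the attributes used above.
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="It isn't snowing in Warsaw. ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="It is minus eleven and cloudy. ")]),
])

print(full_transcript(fake_response))
```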

Let's play the audio file before we run inference for comparison.

In [14]:
import io
import IPython.display as ipd

audio_file = "/dli/task/data/en_ng_female/ngf_13397_00016698686.wav"

# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)

In [15]:
# Run inference on the audio file
print("Expected Transcript:\nThe Norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky.")
run_asr_inference(audio_file)

Expected Transcript:
The Norsemen considered the rainbow as a bridge over which the gods passed from earth to their home in the sky.
ASR transcript:
The Norsemen considered the rainbow as aa bridge over which the gods passed from earth to their home in the sky. 


Very nice! Try some other audio files and see how they do.  You can find the expected transcripts in the [en_ng_male index](data/en_ng_male/line_index_male.tsv) and [en_ng_female index](data/en_ng_female/line_index_female.tsv) `.tsv` files.

In [16]:
audio_file = "/dli/task/data/en_ng_male/ngm_14310_01582379950.wav"
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)

In [17]:
print("Expected Transcript:\nIt isn't snowing in Warsaw. It is minus eleven and cloudy.")
run_asr_inference(audio_file)

Expected Transcript:
It isn't snowing in Warsaw. It is minus eleven and cloudy.
ASR transcript:
Itt isn't snowing in Warsaw. it is minus eleven and cloudy. 


In [18]:
audio_file = "/dli/task/data/en_ng_female/ngf_02121_00850439910.wav"
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)

In [19]:
print("Expected Transcript:\nKossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde.")
run_asr_inference(audio_file)

Expected Transcript:
Kossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde.
ASR transcript:
Kasa can be the general term for both fresh milk mira dam and yoghurt, known as pendik in Fulfulde. 


Some special words did not come through correctly in the transcript.  We can improve the transcript with word boosting!
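
To put a number on "did not come through correctly", you can compute a word error rate (WER) between the expected and ASR transcripts. The sketch below is a plain-Python version with simplified normalization (lowercasing and dropping punctuation). The function names are hypothetical, and this is not necessarily the scoring that NeMo or Riva tooling uses.

```python
import re

def words(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer(expected, actual):
    """Word error rate: word edits divided by the number of reference words."""
    ref, hyp = words(expected), words(actual)
    if not ref:
        raise ValueError("expected transcript has no words")
    return word_edit_distance(ref, hyp) / len(ref)

expected = "Kossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde."
actual = "Kasa can be the general term for both fresh milk mira dam and yoghurt, known as pendik in Fulfulde."
print(f"WER: {wer(expected, actual):.2%}")
```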

---
# 8.4 Run Inference with Word Boosting

Start by creating a boosting function as we did in the [word boosting notebook](006_Word_Boosting.ipynb).

In [20]:
def run_asr_inference_with_word_boosting(audio_file, boost_dict, server='localhost:50051', print_full_response=False):
    with open(audio_file, 'rb') as fh:
        data = fh.read()

    auth = riva.client.Auth(uri=server)
    client = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    
    for word, score in boost_dict.items():
        riva.client.add_word_boosting_to_config(config, [word], score)
    
    response = client.offline_recognize(data, config)
    if print_full_response: 
        print(response)
    else:
        print("ASR transcript:")
        print(response.results[0].alternatives[0].transcript)
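
The function above calls `riva.client.add_word_boosting_to_config` once per word, passing a one-element list each time. Since the call takes a list of words and a single score, words that share a score could be grouped first. The sketch below shows only the grouping step in plain Python. The helper name `group_words_by_score` and the example scores are hypothetical.

```python
from collections import defaultdict

def group_words_by_score(boost_dict):
    """Map each boost score to the list of words that use it."""
    groups = defaultdict(list)
    for word, score in boost_dict.items():
        groups[score].append(word)
    return dict(groups)

# Hypothetical example scores.
example_boosts = {"norsemen": 10.0, "warsaw": 10.0, "fulfulde": 20.0}

for score, boosted_words in group_words_by_score(example_boosts).items():
    # With a live config, each group would be applied with one call:
    # riva.client.add_word_boosting_to_config(config, boosted_words, score)
    print(score, boosted_words)
```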

## 8.4.1 Exercise: Word Boost the ASR Inference

In the previous section, we saw that the transcript was not correct:

```text
Expected Transcript:
Kossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde.
ASR transcript:
Kasa can be the general term for both fresh milk mira dam and yoghurt, known as pendik in Fulfulde. 
```

In this exercise, you'll use word boosting to correct the transcript.

In [21]:
# Play the audio file once more
audio_file = "/dli/task/data/en_ng_female/ngf_02121_00850439910.wav"
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)

Complete the #TODO section below.  
1. Create a `boost_dict` to positively boost the OOV (out-of-vocabulary) words: kossam, miradam, pendidan
1. Call the `run_asr_inference_with_word_boosting` function to run the boosted inference

If you get stuck, check out the [solution](solutions/ex8.4.1.ipynb).

In [22]:
print("Expected Transcript:\nKossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde.")

boost_dict = {'kossam': 20.0, 'miradam': 20.0, 'pendidan': 20.0}
run_asr_inference_with_word_boosting(audio_file, boost_dict)

Expected Transcript:
Kossam can be the general term for both fresh milk miradam and yoghurt known as pendidan in Fulfulde.
ASR transcript:
Kossam can be the general term for both fresh milk miradam and yoghurt, known as pendidan in Fulfulde. 
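
Rather than comparing the transcripts by eye, a small check can confirm that every boosted word now appears. This is a sketch with a hypothetical helper name, `missing_boosted_words`. It matches whole lowercase words only, so a split such as "mira dam" still counts as missing.

```python
import re

def missing_boosted_words(transcript, boosted_words):
    """Return the boosted words that do not appear as whole words in the transcript."""
    found = set(re.findall(r"[a-z']+", transcript.lower()))
    return [word for word in boosted_words if word.lower() not in found]

boosted = ["kossam", "miradam", "pendidan"]
before = "Kasa can be the general term for both fresh milk mira dam and yoghurt, known as pendik in Fulfulde."
after = "Kossam can be the general term for both fresh milk miradam and yoghurt, known as pendidan in Fulfulde."

print("missing before boosting:", missing_boosted_words(before, boosted))
print("missing after boosting: ", missing_boosted_words(after, boosted))
```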


---
# 8.5 Stop the Riva Server

In [None]:
! cd $RIVA_DIR && ./riva_stop.sh

---
<h2 style="color:green;">Congratulations!</h2>

You did it!  You've completed the course!

In this course, you learned:
- How to deploy out-of-the-box ASR models with Riva
- How to customize a model with NeMo
- How to export the model to Riva format (`.riva`) with NeMo
- How to deploy a Riva model for inference
- How to add word boosting to improve transcription

<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>