<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 7.0 ASR Fine-Tuning with NVIDIA NeMo
In this notebook, you'll fine-tune a US English (en-US) NVIDIA Riva ASR acoustic model to Nigerian English (en-NG) using NVIDIA NeMo.

**[7.1 NeMo](#7.1-NeMo)<br>**
**[7.2 ASR Conformer-CTC Model](#7.2-ASR-Conformer-CTC-Model)<br>**
**[7.3 Set Relevant Paths](#7.3-Set-Relevant-Paths)<br>**
**[7.4 Prepare the Dataset](#7.4-Prepare-the-Dataset)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[7.4.1 Download and Preprocess the Nigerian English Dataset](#7.4.1-Download-and-Preprocess-the-Nigerian-English-Dataset)<br>
**[7.5 ASR Fine-Tuning](#7.5-ASR-Fine-Tuning)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[7.5.1 Create a Tokenizer](#7.5.1-Create-a-Tokenizer)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[7.5.2 Fine-Tune Conformer-CTC](#7.5.2-Fine-Tune-Conformer-CTC)<br>
**[7.6 ASR Model Export](#7.6-ASR-Model-Export)<br>**

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials Installed**<br>Be sure you have added your NGC credentials using the [NGC Setup notebook](003_NGC_Setup.ipynb).
1. **Riva Quick Start resources folder has been downloaded**<br>Execute the following cell to make sure you have this folder.

In [1]:
import os

# Set the path to the Riva Skills Quick Start resource folder
RIVA_DIR = "riva_quickstart_v2.11.0"

# Download the Riva Skills Quick Start resource folder if it is not already present
if os.path.exists(RIVA_DIR):
    print("Riva Skills Quick Start resource folder already downloaded")
else:
    print("Downloading the Riva Skills Quick Start resource folder")
    !ngc registry resource download-version "nvidia/riva/riva_quickstart:2.11.0"
    # Make special modification required for our docker-in-docker course environment
    !sed -i '/--name riva-service-maker*/i \              --network host \\' $RIVA_DIR/riva_init.sh

Riva Skills Quick Start resource folder already downloaded


---
# 7.1 NeMo

[NVIDIA NeMo](https://github.com/NVIDIA/NeMo) is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), text-to-speech synthesis (TTS), large language models (LLMs), and natural language processing (NLP). The primary objective of NeMo is to help researchers from industry and academia to reuse prior work (code and pretrained models) and make it easier to create new conversational AI models. 

Transfer learning reuses the features learned by an existing neural network in a new one. It is often used when creating a large training dataset is not feasible. Starting from a pretrained model also shortens training: one of NeMo's goals is to reduce, for example, an 80-hour training workload to an 8-hour workload, which can enable data scientists to run considerably more train-test iterations in the same time frame.

Let's see this in action with a use case for the ASR acoustic model.

---
# 7.2 ASR Conformer-CTC Model

Automatic Speech Recognition (ASR) is often the first step in building a speech AI application. An ASR model converts audible speech into text. The main metric for these models is Word Error Rate (WER), which fine-tuning aims to reduce. Simply put, the goal is to take an audio file and transcribe it.
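
To make the WER metric concrete, here is a minimal sketch that computes it as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` is hypothetical and is not part of NeMo or Riva, and the sketch ignores text normalization such as punctuation handling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# One missing word out of a seven-word reference
print(word_error_rate("in a half mile take the exit",
                      "in a half mile take exit"))
```

A WER of 0 means the transcript matches the reference exactly. Values above 1 are possible when the hypothesis contains many inserted words.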

For our Nigerian English project, we'll start with the [Conformer-CTC model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-ctc), which is an end-to-end ASR model that takes in audio and produces text.

Conformer-CTC supports both character-level and sub-word-level encodings. It employs a combination of self-attention and convolution modules to achieve the best of the two approaches. The self-attention layers support both absolute and relative positional encodings and can learn the global interaction, while the convolutions efficiently capture the local correlations.

<img src="images/nemo/conformer_ctc.png" width="600">

---
# 7.3 Set Relevant Paths

In [2]:
# The following paths are set from the perspective of the host (this instance).
DATA_DIR = '/dli/task/data'
MODEL_DIR = '/dli/task/asr-models'
RESULTS_DIR = '/dli/task/results'
CUSTOM_MODEL_DIR = '/dli/task/asr-models/custom-models'

# Set the encryption key and use the same key for all commands.
KEY = 'tlt_encode'

In [3]:
# Create the new results directory (the data and model directories should already exist)
!mkdir -p /dli/task/results

This course container is based on the NGC NeMo container.  In addition to the `nemo` installation, the container includes useful NeMo resources in the `/workspace/nemo` folder, which is also linked in the JupyterLab file browser as `nemo`.

In [4]:
NEMO_DIR = "/workspace/nemo"
! ls -d $NEMO_DIR/*/

/workspace/nemo/examples/  /workspace/nemo/tests/
/workspace/nemo/scripts/   /workspace/nemo/tutorials/


---
# 7.4 Prepare the Dataset
We have a US English acoustic model downloaded from NGC, which was trained on the large [LibriSpeech ASR train-clean-100 Dataset](https://www.openslr.org/12/). Now we want to fine-tune it to create a Nigerian English acoustic model. To do that, we need labeled examples of Nigerian English speech. For our project, we'll use the [Open SLR crowdsourced high-quality Nigerian English speech dataset](https://www.openslr.org/70/).

To get an idea of what the LibriSpeech audio files sound like, play the following sample from the dataset.

In [5]:
# Play an audio file
import IPython.display as ipd
path = 'audio_samples/LibriSpeech/163-121908-0000.wav'
ipd.Audio(path)

## 7.4.1 Download and Preprocess the Nigerian English Dataset

The evaluation/fine-tuning data for the crowdsourced en-NG dataset is publicly available in several files [here](https://www.openslr.org/resources/70/). To save time, the dataset has already been downloaded. Here are the steps used:
```bash
# Download the audio data
!wget 'https://www.openslr.org/resources/70/en_ng_female.zip' -P $HOST_DATA_DIR
!wget 'https://www.openslr.org/resources/70/en_ng_male.zip'   -P $HOST_DATA_DIR

# Extract the evaluation/finetuning data
!unzip -nq $HOST_DATA_DIR/en_ng_female.zip -d $HOST_DATA_DIR/en_ng_female
!mv $HOST_DATA_DIR/en_ng_female/line_index.tsv $HOST_DATA_DIR/en_ng_female/line_index_female.tsv
!unzip -nq $HOST_DATA_DIR/en_ng_male.zip -d $HOST_DATA_DIR/en_ng_male
!mv $HOST_DATA_DIR/en_ng_male/line_index.tsv $HOST_DATA_DIR/en_ng_male/line_index_male.tsv

# Remove the archive files no longer needed
!rm $HOST_DATA_DIR/en_ng_female.zip
!rm $HOST_DATA_DIR/en_ng_male.zip
```

Now that we have the data, we need to create a manifest for NeMo to parse through the data when fine-tuning. Execute the next cell to define a function to extract the relevant information from the `.tsv` metadata files included with this dataset.

In [6]:
import os
import subprocess

def process_en_ng_tsvs(data_dir):
    """Build one manifest entry per utterance listed in the en-NG .tsv files."""
    genders = ['female', 'male']
    entries = []
    # Extract the relevant information from the tsv files
    for gender in genders:
        dataset  = f'en_ng_{gender}'
        tsv_name = f'line_index_{gender}.tsv'
        tsv_file = os.path.join(data_dir, dataset, tsv_name)
        with open(tsv_file, encoding='utf-8') as fin:
            for line in fin:
                # Each line is "<label>\t<transcript>"; split on the first tab only
                label, text = line.split('\t', 1)
                # Labels look like "ngf_04310_01061195987"; the middle field is the speaker
                speaker_id = label.split('_')[1]
                wav_file = os.path.join(data_dir, dataset, label + '.wav')
                transcript_text = text.lower().strip()

                # Get the duration in seconds with SoX; passing an argument list
                # avoids shell quoting problems with unusual file names
                duration = subprocess.check_output(['soxi', '-D', wav_file])

                entries.append({
                    'audio_filepath': wav_file,
                    'duration': float(duration),
                    'text': transcript_text,
                    'gender': gender,
                    'speaker_id': speaker_id,
                })
    return entries
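
The function above relies on the `soxi` command-line tool to measure each file's duration. If `soxi` is not available, a possible alternative is Python's standard `wave` module. This is only a sketch: the helper name `wav_duration_seconds` is hypothetical, and it assumes the audio files are uncompressed PCM WAV files that `wave` can read, which has not been verified for this dataset.

```python
import wave


def wav_duration_seconds(wav_path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(wav_path, "rb") as wav_file:
        # Duration is the number of audio frames divided by frames per second
        return wav_file.getnframes() / wav_file.getframerate()
```

Under that assumption, `wav_duration_seconds(wav_file)` could stand in for the `soxi -D` call when building each manifest entry.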

Next, define a function to generate `*manifest.json` metadata files from the `.tsv` metadata files included with this dataset.

In [7]:
import json
import random

def generate_en_ng_manifest(data_dir, random_seed=0, val_split=0.05, test_split=0.05):
    """Shuffle the entries and write the fine-tuning, validation, and test manifests."""
    # Extract the relevant information from the tsv files
    entries = process_en_ng_tsvs(data_dir)
    # Set the random seed for reproducibility
    random.seed(random_seed)
    random.shuffle(entries)
    num_val_entries  = int(val_split  * len(entries))
    num_test_entries = int(test_split * len(entries))
    # Use explicit split points: a negative slice such as entries[-0:] would
    # return the whole list if a split were empty
    val_start  = len(entries) - num_val_entries - num_test_entries
    test_start = len(entries) - num_test_entries
    splits = {
        'en_ng_ft_manifest.json':   entries[:val_start],
        'en_ng_val_manifest.json':  entries[val_start:test_start],
        'en_ng_test_manifest.json': entries[test_start:],
    }
    # Write each split as one JSON object per line
    for manifest_name, split_entries in splits.items():
        with open(os.path.join(data_dir, manifest_name), 'w') as fout:
            for m in split_entries:
                fout.write(json.dumps(m) + '\n')

Generate the manifest files for the Nigerian English Speech dataset using the functions we just defined.

In [8]:
generate_en_ng_manifest(DATA_DIR)

In [9]:
# Check to see that the manifest files were created in the data directory
!ls -hl $DATA_DIR/*.json

-rw-r--r-- 1 root root 653K Mar 30 06:41 /dli/task/data/en_ng_ft_manifest.json
-rw-r--r-- 1 root root  37K Mar 30 06:41 /dli/task/data/en_ng_test_manifest.json
-rw-r--r-- 1 root root  37K Mar 30 06:41 /dli/task/data/en_ng_val_manifest.json


Take a look at a few lines from the fine-tuning manifest to get an idea of the information there.

In [10]:
!head $DATA_DIR/en_ng_ft_manifest.json

{"audio_filepath": "/dli/task/data/en_ng_female/ngf_04310_01061195987.wav", "duration": 7.594667, "text": "try saying, \\\"hey google, navigate home\\\", \\\"play some music\\\" or \\\"read my messages\\\"", "gender": "female", "speaker_id": "04310"}
{"audio_filepath": "/dli/task/data/en_ng_female/ngf_05223_01898706469.wav", "duration": 6.997333, "text": "many pedigreed and especially purebred cats are exhibited as show cats.", "gender": "female", "speaker_id": "05223"}
{"audio_filepath": "/dli/task/data/en_ng_female/ngf_05223_00650271211.wav", "duration": 4.522667, "text": "in a half mile take the exit", "gender": "female", "speaker_id": "05223"}
{"audio_filepath": "/dli/task/data/en_ng_male/ngm_07049_00526371791.wav", "duration": 2.986667, "text": "i think the dreidel is over here", "gender": "male", "speaker_id": "07049"}
{"audio_filepath": "/dli/task/data/en_ng_female/ngf_06136_00016668548.wav", "duration": 4.864, "text": "norse mythology was released in february 2017.", "gender": 
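
Because each manifest line is a standalone JSON object, the files are easy to inspect programmatically. The following sketch totals the number of utterances and hours of audio per gender. The helper name `summarize_manifest` is hypothetical, and it assumes every entry carries the `gender` and `duration` fields written by the functions above.

```python
import json
from collections import defaultdict


def summarize_manifest(manifest_path):
    """Return utterance counts and total hours of audio per gender."""
    totals = defaultdict(lambda: {"utterances": 0, "hours": 0.0})
    with open(manifest_path, encoding="utf-8") as fin:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            stats = totals[entry["gender"]]
            stats["utterances"] += 1
            stats["hours"] += entry["duration"] / 3600  # seconds to hours
    return dict(totals)
```

For example, calling `summarize_manifest(os.path.join(DATA_DIR, 'en_ng_ft_manifest.json'))` would show how much fine-tuning audio is available and whether the split is balanced between female and male speakers.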

Let's listen to an audio file from the Nigerian English dataset.

In [11]:
# Play an audio file
import IPython.display as ipd
path = os.path.join(DATA_DIR, 'en_ng_male/ngm_02436_00539200207.wav')
ipd.Audio(path)

---
# 7.5 ASR Fine-Tuning

## 7.5.1 Create a Tokenizer
Tokenization is the process of splitting a string into a list of tokens. Text can be split by word, by subword, or by character.
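
The three granularities can be compared with a toy example. This is only an illustration: the subword splitter below uses a simple greedy longest-match rule over a hypothetical hand-written vocabulary (`TOY_VOCAB`), which is not the unigram algorithm that SentencePiece uses.

```python
def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()


def char_tokens(text):
    """One token per character."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match segmentation of one word against a toy vocabulary."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first and shrink until it matches
        for end in range(len(word), start, -1):
            piece = word[start:end]
            # A single character is always accepted as a fallback
            if piece in vocab or end == start + 1:
                tokens.append(piece)
                start = end
                break
    return tokens


TOY_VOCAB = {"navi", "gate", "play", "ing"}  # hypothetical subword vocabulary

text = "navigate playing"
print(word_tokens(text))
print([subword_tokens(word, TOY_VOCAB) for word in word_tokens(text)])
print(char_tokens("play"))
```

Subword units sit between the two extremes: the vocabulary stays small, as with characters, but frequent fragments are kept whole, as with words.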

Before we can do the actual model training, we need to create a tokenizer because [this ASR model](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_en_fastconformer_ctc_large) uses [SentencePiece](https://github.com/google/sentencepiece), which is a subword-based tokenization algorithm. Character-based models don't need tokenizer creation because only single characters are regarded as elements in the vocabulary. 

We can use NeMo's [process_asr_text_tokenizer.py](nemo/scripts/tokenizers/process_asr_text_tokenizer.py) script to create a tokenizer that generates the subword vocabulary used in training. The size of the vocabulary (`vocab_size`) should be the same as the vocabulary size in the ASR model.

In [12]:
# create the tokenizer
!python3 $NEMO_DIR/scripts/tokenizers/process_asr_text_tokenizer.py \
         --manifest=$DATA_DIR/en_ng_ft_manifest.json \
         --data_root=$DATA_DIR \
         --vocab_size=128 \
         --tokenizer=spe \
         --spe_type=unigram

[NeMo W 2025-03-30 06:43:26 optimizers:66] Could not import distributed_fused_adam optimizer from Apex
[NeMo I 2025-03-30 06:43:27 sentencepiece_tokenizer:315] Processing /dli/task/data/text_corpus/document.txt and store at /dli/task/data/tokenizer_spe_unigram_v128
sentencepiece_trainer.cc(177) LOG(INFO) Running command: --input=/dli/task/data/text_corpus/document.txt --model_prefix=/dli/task/data/tokenizer_spe_unigram_v128/tokenizer --vocab_size=128 --shuffle_input_sentence=true --hard_vocab_limit=false --model_type=unigram --character_coverage=1.0 --bos_id=-1 --eos_id=-1 --normalization_rule_name=nmt_nfkc_cf
sentencepiece_trainer.cc(77) LOG(INFO) Starts training with : 
trainer_spec {
  input: /dli/task/data/text_corpus/document.txt
  input_format: 
  model_prefix: /dli/task/data/tokenizer_spe_unigram_v128/tokenizer
  model_type: UNIGRAM
  vocab_size: 128
  self_test_sample_size: 0
  character_coverage: 1
  input_sentence_size: 0
  shuffle_input_sentence: 1
  seed_sentencepiece_size:

## 7.5.2 Fine-Tune Conformer-CTC
Empirical evidence suggests that approximately 50 epochs are required for decent inference performance. Unfortunately, fine-tuning for 50 epochs takes approximately 4 hours in this instance's environment. For now, we'll train for only 1 epoch to demonstrate the process. This will result in empty (and thus useless) audio transcriptions. However, in the final notebook of this course, you'll try out a model that was fine-tuned for a longer period.

To set up the fine-tuning command, we run NeMo's `speech_to_text_ctc_bpe.py` example script with the `conformer_ctc_bpe` YAML config, initialize the weights from the pretrained `stt_en_conformer_ctc_large` model, and specify the number of GPUs and the results directory. The remaining lines override values that are in the YAML file. Note that the YAML file does not specify the manifest file paths, so we must provide them here.

In [13]:
%%time
# Training one epoch takes about 8 minutes in this environment

# To fine-tune the model fully, you'll need to increase trainer.max_epochs from 1.
# Empirical evidence suggests that around 45 epochs should suffice.
# To restrict NeMo to a particular GPU, place square brackets around the number passed into trainer.devices
! python3 $NEMO_DIR/examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
    --config-path=../conf/conformer/ --config-name=conformer_ctc_bpe \
    +init_from_pretrained_model=stt_en_conformer_ctc_large \
    model.train_ds.manifest_filepath=$DATA_DIR/en_ng_ft_manifest.json \
    model.validation_ds.manifest_filepath=$DATA_DIR/en_ng_val_manifest.json \
    model.tokenizer.dir=$DATA_DIR/tokenizer_spe_unigram_v128 \
    model.train_ds.batch_size=4 \
    model.validation_ds.batch_size=4 \
    trainer.devices=1 \
    trainer.max_epochs=1 \
    model.optim.name="adamw" \
    model.optim.lr=1.0 \
    model.optim.weight_decay=0.001 \
    model.optim.sched.warmup_steps=2000 \
    ++exp_manager.exp_dir=$RESULTS_DIR \
    ++exp_manager.version=en_ng \
    ++exp_manager.use_datetime_version=False

[NeMo W 2025-03-30 06:44:28 optimizers:66] Could not import distributed_fused_adam optimizer from Apex
[NeMo W 2025-03-30 06:44:31 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
    See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    
[NeMo I 2025-03-30 06:44:32 speech_to_text_ctc_bpe:78] Hydra config: name: Conformer-CTC-BPE
    model:
      sample_rate: 16000
      log_prediction: true
      ctc_reduction: mean_batch
      skip_nan_grad: false
      train_ds:
        manifest_filepath: /dli/task/data/en_ng_ft_manifest.json
        sample_rate: ${model.sample_rate}
        batch_size: 4
        shuffle: true
        num_workers: 8
        pin_memory: true
        use_start_end_token: false
        trim_silence: false
        max_duration: 16.7
        min_dur

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



[NeMo I 2025-03-30 06:44:42 mixins:170] Tokenizer SentencePieceTokenizer initialized with 128 tokens
[NeMo W 2025-03-30 06:44:42 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath:
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket1/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket2/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket3/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket4/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket5/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket6/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket7/tarred_audio_manifest.json
    - - /data2/nemo_asr/nemo_asr_set_3.0/bucket8/tarred_audio_manifest.json
    sample_rate: 16000
    batch_size:
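
The dotted arguments passed to the training script above are Hydra command-line overrides. As a rough mental model, each `a.b.c=value` argument sets one leaf of the nested YAML config: a plain key overrides an existing value, a `+` prefix adds a new key, and a `++` prefix adds the key or overrides it if it already exists. The sketch below mimics that behavior on a plain dictionary. It is not Hydra's implementation: `apply_override` is a hypothetical helper, the starting config values are made up, and values are kept as strings for simplicity.

```python
def apply_override(config: dict, override: str) -> None:
    """Apply one 'a.b.c=value' style override to a nested dict in place."""
    if override.startswith("++"):
        mode, override = "add_or_override", override[2:]
    elif override.startswith("+"):
        mode, override = "add", override[1:]
    else:
        mode = "override"

    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")

    # Walk down to the dict that should hold the leaf key
    node = config
    for name in parents:
        if name not in node:
            if mode == "override":
                raise KeyError(f"{dotted_key} is not in the config; use + to add it")
            node[name] = {}
        node = node[name]

    exists = leaf in node
    if mode == "override" and not exists:
        raise KeyError(f"{dotted_key} is not in the config; use + to add it")
    if mode == "add" and exists:
        raise KeyError(f"{dotted_key} already exists; drop the + to override it")
    node[leaf] = value  # values stay as strings in this sketch


# Hypothetical starting values standing in for the YAML file
config = {
    "model": {"train_ds": {"manifest_filepath": None, "batch_size": 16}},
    "trainer": {"max_epochs": 1000},
}
for override in [
    "model.train_ds.batch_size=4",
    "trainer.max_epochs=1",
    "+init_from_pretrained_model=stt_en_conformer_ctc_large",
    "++exp_manager.version=en_ng",
]:
    apply_override(config, override)
print(config)
```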

Take a look at the model files you've created.  There should now be checkpoint files (`.ckpt`) from the epoch results as well as a final `.nemo` file.

In [14]:
# List the checkpoints and .nemo model
! ls $RESULTS_DIR/Conformer-CTC-BPE/en_ng/checkpoints

'Conformer-CTC-BPE--val_wer=0.3927-epoch=0.ckpt'        Conformer-CTC-BPE.nemo
'Conformer-CTC-BPE--val_wer=0.3927-epoch=1-last.ckpt'


If you wish to create a `.nemo` file from a checkpoint, add the following code snippet to a code cell, substitute `<CHECKPOINT_NAME>` appropriately, and run the cell to convert a checkpoint (`.ckpt`) file to a `.nemo` model.

```python
from nemo.collections.asr.models import EncDecCTCModelBPE
conformer_checkpoint = os.path.join(RESULTS_DIR, 'Conformer-CTC-BPE/en_ng/checkpoints/<CHECKPOINT_NAME>.ckpt')
conformer = EncDecCTCModelBPE.load_from_checkpoint(conformer_checkpoint)
conformer = conformer.eval().cuda()
conformer.save_to(os.path.join(RESULTS_DIR, 'Conformer-CTC-BPE/en_ng/checkpoints/Conformer-CTC-BPE.nemo'))
```
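
When several checkpoints are available, one reasonable choice for `<CHECKPOINT_NAME>` is the checkpoint with the lowest validation WER. The sketch below picks it by parsing file names that follow the `val_wer=...-epoch=...` pattern seen in the listing above. The helper `best_checkpoint` is hypothetical and assumes that naming pattern holds for every checkpoint.

```python
import re

# Matches names such as 'Conformer-CTC-BPE--val_wer=0.3927-epoch=0.ckpt'
CKPT_PATTERN = re.compile(r"val_wer=(?P<wer>\d+\.\d+)-epoch=(?P<epoch>\d+)")


def best_checkpoint(file_names):
    """Return the .ckpt name with the lowest val_wer, or None if none match."""
    scored = []
    for name in file_names:
        match = CKPT_PATTERN.search(name)
        if name.endswith(".ckpt") and match:
            scored.append((float(match["wer"]), name))
    # Ties on WER are broken by file name order
    return min(scored)[1] if scored else None
```

The list of names could come from `os.listdir` on the checkpoints directory. Remember to strip the `.ckpt` extension before substituting the result into the snippet above.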

---
# 7.6 ASR Model Export

We now want to convert a `.nemo` file to the `.riva` format, so that we can deploy it to the Riva server.  To do this, we need to install the `nemo2riva` utility, which is included in the Riva Quick Start resources folder. Install it by running the next cell.

In [15]:
!cd $RIVA_DIR && pip install nemo2riva*.whl

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing ./nemo2riva-2.11.0-py3-none-any.whl
Collecting pyarmor<8 (from nemo2riva==2.11.0)
  Downloading pyarmor-7.7.4-py2.py3-none-any.whl (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 112.5 MB/s eta 0:00:00
Collecting nvidia-eff<=0.6.2,>=0.5.3 (from nemo2riva==2.11.0)
  Downloading nvidia_eff-0.5.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 63.3 MB/s eta 0:00:00
Collecting onnxruntime-gpu>=1.13.1 (from nemo2riva==2.11.0)
  Downloading onnxruntime_gpu-1.19.2-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (226.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 226.2/226.2 MB 247.3 MB/s eta 0:00:00
Collecting pyinstaller (from nvidia-eff<=0.6.2,>

#### Convert to Riva
Convert the `.nemo` model to the `.riva` format. We will set the encryption key with `--key=tlt_encode`. Choose a different encryption key value when generating `.riva` models for production.

At this point, you can export the model you just built.  In this case, the file path would be:

```python
nemo_file_path = os.path.join(RESULTS_DIR, 'Conformer-CTC-BPE/en_ng/checkpoints/Conformer-CTC-BPE.nemo')
```

However, since it was only fine-tuned with a single epoch, it may not be the best model.  For this course, some example trained models and checkpoints have been provided at the following location:

In [16]:
!ls asr-models/custom-models/trained_en-ng/

'Conformer-CTC-BPE--val_wer=0.1215-epoch=43.ckpt'
 Conformer-CTC-BPE-43-epochs.nemo
 Conformer-CTC-BPE-43-epochs.riva
 en_ng_asr_lm_itn_offline.rmir


In [17]:
# Export the .nemo file to .riva format
nemo_file_path = os.path.join(CUSTOM_MODEL_DIR, 'trained_en-ng',
    'Conformer-CTC-BPE-43-epochs.nemo')
# Reuse the model's base name, swapping the .nemo extension for .riva
nemo_file_name = os.path.basename(nemo_file_path)
riva_file_name = os.path.splitext(nemo_file_name)[0] + ".riva"
riva_file_path = os.path.join(CUSTOM_MODEL_DIR, riva_file_name)

!nemo2riva --out {riva_file_path} --key=tlt_encode {nemo_file_path}

[NeMo W 2025-03-30 06:55:49 optimizers:66] Could not import distributed_fused_adam optimizer from Apex
[NeMo W 2025-03-30 06:55:52 experimental:27] Module <class 'nemo.collections.tts.models.fastpitch_ssl.FastPitchModel_SSL'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2025-03-30 06:55:52 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2025-03-30 06:55:52 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2025-03-30 06:55:52 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2025-03-30

In [18]:
# Check your work - your fine-tuned acoustic model should exist here
! ls $CUSTOM_MODEL_DIR/*.riva

/dli/task/asr-models/custom-models/Conformer-CTC-BPE-43-epochs.riva


---
<h2 style="color:green;">Congratulations!</h2>

You've learned how to:
- Fine-tune an ASR acoustic model with NeMo
- Export models from NeMo to Riva

Next, let's put it all together and [deploy our custom model with Riva](008_Deploy_Custom_ASR_Pipeline.ipynb)!
