<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 5.0 Build and Deploy an ASR Pipeline with Riva
In this notebook, you'll build and deploy an ASR pipeline on NVIDIA Riva with freely available components, which have been pretrained with NVIDIA NeMo.  These components include an acoustic model, an n-gram language model, a punctuation and capitalization model, and an inverse text normalization model.


<img src=images/riva/ASR_pipeline.PNG width=1000>

**[5.1 Riva ServiceMaker](#5.1-Riva-ServiceMaker)<br>**
**[5.2 Download the ASR Pipeline Models from NGC](#5.2-Download-the-ASR-Pipeline-Models-from-NGC)<br>**
**[5.3 `riva-build`](#5.3-riva-build)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.1 Identify the File Paths Required](#5.3.1-Identify-the-File-Paths-Required)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.2 `docker run` Syntax](#5.3.2-docker-run-Syntax)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.3 `riva-build speech_recognition` Syntax](#5.3.3-riva-build-speech_recognition-Syntax)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.4 Exercise: Build a Punctuation and Capitalization RMIR](#5.3.4-Exercise:-Build-a-Punctuation-and-Capitalization-RMIR)<br>
**[5.4 `riva-deploy`](#5.4-riva-deploy)<br>**
**[5.5 Start the Riva Server](#5.5-Start-the-Riva-Server)**<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.5.1 Exercise: Configure the `config.sh` File](#5.5.1-Exercise:-Configure-the-config.sh-File)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.5.2 Start the Server](#5.5.2-Start-the-Server)<br>
**[5.6 Run Inference](#5.6-Run-Inference)**<br>
**[5.7 Stop the Riva Server](#5.7-Stop-the-Riva-Server)**<br>

### Notebook Dependencies
The steps in this notebook assume that you have:

1. **NGC Credentials Installed**<br>Be sure you have added your NGC credentials using the [NGC Setup notebook](003_NGC_Setup.ipynb).
1. **Riva Quick Start resources folder has been downloaded**<br>Execute the following cell to make sure you have this folder.

In [1]:
import os

# Set the path to the Riva Skills Quick Start resource folder
RIVA_DIR = "riva_quickstart_v2.11.0"

# Download the Riva Skills Quick Start resource folder if it isn't already present
if os.path.exists(RIVA_DIR):
    print("Riva Skills Quick Start resource folder already downloaded")
else:
    print("Downloading the Riva Skills Quick Start resource folder")
    !ngc registry resource download-version "nvidia/riva/riva_quickstart:2.11.0"
    # Make special modification required for our docker-in-docker course environment
    !sed -i '/--name riva-service-maker*/i \              --network host \\' $RIVA_DIR/riva_init.sh

Riva Skills Quick Start resource folder already downloaded


---
# 5.1 Riva ServiceMaker

Riva ServiceMaker is a Docker container image that includes a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components: `riva-build` and `riva-deploy`.

The `riva_init.sh` script we used in the Quick Start notebook used the ServiceMaker `riva-deploy` tool behind the scenes to deploy prebuilt `.rmir` models for us. In this notebook, we'll use the Riva ServiceMaker container directly to build models and deploy to a target environment using the `riva-build` and `riva-deploy` calls.  

With these tools, we have the option of customizing our ASR pipeline with a variety of models as needed.  We trade the abstraction of simply calling `riva_init.sh` for flexibility going forward.

<img src=images/riva/servicemaker.png width=1000>

---
# 5.2 Download the ASR Pipeline Models from NGC

Models for the ASR pipeline can be downloaded from NGC.  To see a list of models available, search the NGC catalog for the [Riva Speech Skills collection](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/collections/riva-speech/entities), or search for individual models. 

The next several cells use the NGC command line utility to download various models with the API key you've loaded. First, though, let's define the directories into which we want to download our models.

In [2]:
import os

# Overarching model directory
MODEL_LOC = "/dli/task/asr-models"
# Directory for the components of the prebuilt, OOTB models
DEFAULT_MODEL_LOC = os.path.join(MODEL_LOC, "default-models")

#### Download Conformer-CTC
Download the deployable `Conformer-CTC-L-en-US-ASR-set-4p0.riva` ASR model, which we'll deploy for our out-of-the-box example with Riva.

In [3]:
AM_DIR = "speechtotext_en_us_conformer_vdeployable_v4.0"
AM_PATH = os.path.join(DEFAULT_MODEL_LOC, AM_DIR)

if os.path.exists(AM_PATH):
    print("Deployable Acoustic Model exists, skipping download")
else:
    print("Downloading the deployable Acoustic Model")
    !ngc registry model \
       download-version "nvidia/riva/speechtotext_en_us_conformer:deployable_v4.0" \
       --dest $DEFAULT_MODEL_LOC

Downloading the deployable Acoustic Model
{
    "download_end": "2025-03-30 05:49:30",
    "download_start": "2025-03-30 05:49:29",
    "download_time": "0s",
    "files_downloaded": 1,
    "local_path": "/dli/task/asr-models/default-models/speechtotext_en_us_conformer_vdeployable_v4.0",
    "size_downloaded": "110 B",
    "status": "COMPLETED"
}


#### Download Inverse Text Normalization Files
Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. See [this paper](https://arxiv.org/pdf/2104.05055.pdf) for detailed information.
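
To make the spoken-domain versus written-domain distinction concrete, here is a toy sketch that rewrites simple English number words as digits. It is only an illustration: the deployed pipeline relies on WFST grammars rather than a dictionary lookup, and the name `toy_itn` is hypothetical.

```python
# Toy inverse text normalization: spoken-domain number words -> digits.
# Illustration only; Riva's ITN uses WFST grammars, not this lookup.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def toy_itn(text):
    """Replace number words from 0 to 99 with digits; leave other words alone."""
    out = []
    words = text.split()
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # Absorb a following unit word, e.g. "twenty" + "three" -> 23
            if i + 1 < len(words) and 0 < UNITS.get(words[i + 1], 0) < 10:
                value += UNITS[words[i + 1]]
                i += 1
            out.append(str(value))
        elif word in UNITS:
            out.append(str(UNITS[word]))
        else:
            out.append(word)
        i += 1
    return " ".join(out)


print(toy_itn("the meeting starts in twenty three minutes"))
```

A real ITN grammar also handles dates, currency, ordinals, and ambiguous cases, which is why the pipeline uses compiled WFST files instead of rules like these.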

In [4]:
ITN_DIR = "inverse_normalization_en_us_vdeployable_v2.0"
ITN_PATH = os.path.join(DEFAULT_MODEL_LOC, ITN_DIR)

if os.path.exists(ITN_PATH):
    print("ITN Model exists, skipping download")
else:
    print("Downloading the ITN Model")
    !ngc registry model \
       download-version "nvidia/riva/inverse_normalization_en_us:deployable_v2.0" \
       --dest $DEFAULT_MODEL_LOC

Downloading the ITN Model
{
    "download_end": "2025-03-30 05:49:52",
    "download_start": "2025-03-30 05:49:51",
    "download_time": "0s",
    "files_downloaded": 2,
    "local_path": "/dli/task/asr-models/default-models/inverse_normalization_en_us_vdeployable_v2.0",
    "size_downloaded": "220 B",
    "status": "COMPLETED"
}


#### Download Punctuation and Capitalization Model
Adding this punctuation and capitalization model will improve the readability of the transcripts.

In [5]:
PC_DIR = "punctuationcapitalization_en_us_bert_base_vdeployable_v3.0"
PC_PATH = os.path.join(DEFAULT_MODEL_LOC, PC_DIR)

if os.path.exists(PC_PATH):
    print("Punctuation and Capitalization Model exists, skipping download")
else:
    print("Downloading the Punctuation and Capitalization Model")
    !ngc registry model \
        download-version "nvidia/riva/punctuationcapitalization_en_us_bert_base:deployable_v3.0" \
        --dest $DEFAULT_MODEL_LOC

Downloading the Punctuation and Capitalization Model
{
    "download_end": "2025-03-30 05:50:16",
    "download_start": "2025-03-30 05:50:16",
    "download_time": "0s",
    "files_downloaded": 1,
    "local_path": "/dli/task/asr-models/default-models/punctuationcapitalization_en_us_bert_base_vdeployable_v3.0",
    "size_downloaded": "110 B",
    "status": "COMPLETED"
}


#### Download Language Model Files
The language model files we need are preloaded in this course to save time, so you don't need to pull them. For reference, we downloaded them as follows: 
```python
LM_DIR = "speechtotext_en_us_lm_vdeployable_v4.1"
LM_PATH = os.path.join(DEFAULT_MODEL_LOC, LM_DIR)

if os.path.exists(LM_PATH):
    print("Language Model exists, skipping download")
else:
    print("Downloading the Language Model")
    !ngc registry model download-version "nvidia/riva/speechtotext_en_us_lm:deployable_v4.1" --dest $DEFAULT_MODEL_LOC
```

In [6]:
# Check the downloads.
!ls -g $DEFAULT_MODEL_LOC

total 20
drwx------ 2 root 4096 Mar 30 05:49 inverse_normalization_en_us_vdeployable_v2.0
drwxr-xr-x 9 1102 4096 Mar 30 05:35 models
drwx------ 2 root 4096 Mar 30 05:50 punctuationcapitalization_en_us_bert_base_vdeployable_v3.0
drwx------ 2 root 4096 Mar 30 05:49 speechtotext_en_us_conformer_vdeployable_v4.0
drwx------ 2 root 4096 Mar 30 05:37 speechtotext_en_us_lm_vdeployable_v4.1


The list should include the following model directories needed for our pipeline:

- **speechtotext_en_us_conformer_vdeployable_v4.0** (acoustic model)
- **speechtotext_en_us_lm_vdeployable_v4.1** (language model)
- **punctuationcapitalization_en_us_bert_base_vdeployable_v3.0** (punctuation and capitalization model)
- **inverse_normalization_en_us_vdeployable_v2.0** (inverse text normalization model)
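
Rather than reading the `ls` output by eye, the same check can be scripted. The following is a minimal sketch, assuming the directory names listed above and this course's download location; `find_missing_dirs` is a hypothetical helper, not part of Riva or NGC.

```python
import os

# Directory names taken from the list above
REQUIRED_MODEL_DIRS = [
    "speechtotext_en_us_conformer_vdeployable_v4.0",
    "speechtotext_en_us_lm_vdeployable_v4.1",
    "punctuationcapitalization_en_us_bert_base_vdeployable_v3.0",
    "inverse_normalization_en_us_vdeployable_v2.0",
]


def find_missing_dirs(base_dir, required=REQUIRED_MODEL_DIRS):
    """Return the required subdirectories that do not exist under base_dir."""
    return [d for d in required if not os.path.isdir(os.path.join(base_dir, d))]


# Path used in this course environment
missing = find_missing_dirs("/dli/task/asr-models/default-models")
print("Missing model directories:", missing or "none")
```

An empty result means all four components are in place. Note that this only confirms the directories exist, not that the files inside them downloaded completely.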

---
# 5.3 `riva-build`

The `riva-build` step builds a Riva-ready version of the model. Its only output is an end-to-end pipeline for one of the supported Riva services, stored in an intermediate format called RMIR (**R**iva **M**odel **I**ntermediate **R**epresentation).

`riva-build` is responsible for combining one or more exported models (`.riva` files) into a single RMIR file (`.rmir`). This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. 

For more information on `riva-build`, refer to the [Riva Build documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html#riva-build).  For ASR, we'll use the `riva-build speech_recognition` task.

## 5.3.1 Identify the File Paths Required
For our ASR n-gram language model project, we need to aggregate the following elements: 
   - An _acoustic_ model file in the `.riva` format
   - A _language_ binary model file in the `.binary` format
   - A decoder vocabulary file 
   - Weighted Finite State Transducer (WFST) tokenizer and verbalizer files for Inverse Text Normalization (ITN). For more information on WFST and ITN, refer to the [NeMo Inverse Text Normalization: From Development to Production](https://arxiv.org/pdf/2104.05055.pdf) paper.  
   - A punctuation and capitalization (P&C) model. This isn't strictly necessary, but it improves the readability of ASR transcripts. 
   
Start by setting up the paths to the files previously downloaded for the project.  Then pull the ServiceMaker Docker container. 

In [7]:
import os

# ServiceMaker Docker container
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker"

# All model paths relative to Riva ServiceMaker Docker container include the _SM suffix

# Model base directory w.r.t. both the host and the ServiceMaker container
ASR_MODEL_DIR = os.path.abspath("asr-models/default-models")
ASR_MODEL_DIR_SM = "/servicemaker-dev" # Path where we mount the downloaded ASR models in the ServiceMaker container

# Relative path to Acoustic Model
AM_DIR = "speechtotext_en_us_conformer_vdeployable_v4.0"
AM_SM  = os.path.join(ASR_MODEL_DIR_SM, AM_DIR, "Conformer-CTC-L-en-US-ASR-set-4p0.riva")

# Relative path to LM model artifacts
LM_DIR = "speechtotext_en_us_lm_vdeployable_v4.1"
DECODING_LM_BINARY_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "riva_asr_train_datasets_3gram.binary")
DECODING_VOCAB_SM = os.path.join(ASR_MODEL_DIR_SM, LM_DIR, "flashlight_decoder_vocab.txt")

# Relative path to WFST artifacts for inverse text normalization
ITN_DIR = "inverse_normalization_en_us_vdeployable_v2.0"
WFST_TOKENIZER_MODEL_SM  = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "tokenize_and_classify.far")
WFST_VERBALIZER_MODEL_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "verbalize.far")
SPEECH_HINTS_MODEL_SM = os.path.join(ASR_MODEL_DIR_SM, ITN_DIR, "speech_class.far")

# Relative path to Punctuation and Capitalization Model
PC_DIR = "punctuationcapitalization_en_us_bert_base_vdeployable_v3.0"
PC_SM  = os.path.join(ASR_MODEL_DIR_SM, PC_DIR, "bert-base_PnC_en-US_3.0.riva")

# Relative paths where the generated .rmir files will be stored
!mkdir -p $ASR_MODEL_DIR/rmir
ASR_RMIR_DIR_SM = os.path.join(ASR_MODEL_DIR_SM, "rmir")
ASR_RMIR_SM = os.path.join(ASR_RMIR_DIR_SM, "asr_lm_itn_offline.rmir")
PC_RMIR_SM = os.path.join(ASR_RMIR_DIR_SM, "p_and_c.rmir")

# Key that model is encrypted with
KEY = "tlt_encode"

In [8]:
# Get the ServiceMaker Docker container (should already have been pulled in the Quick Start example)
! docker pull $RIVA_SM_CONTAINER

2.11.0-servicemaker: Pulling from nvidia/riva/riva-speech
Digest: sha256:7831bcd8deb4e18f6af937730833c93ee10e706add3b9da0572f56c94d292074
Status: Image is up to date for nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker
nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker


## 5.3.2 `docker run` Syntax

We use a [docker run](https://docs.docker.com/engine/reference/commandline/run/) command to run the ServiceMaker container with the basic syntax:
```text
    docker run [OPTIONS] IMAGE [COMMAND] [ARG...]       
```

which becomes for us:

```text
    docker run --rm --gpus 1 \
        -v $ASR_MODEL_DIR:/servicemaker-dev \
        $RIVA_SM_CONTAINER -- \
        riva-build speech_recognition \       
```

Here's a breakdown of the command and options we are using:<br>
- **docker run** - command to run the container
- **--rm** - tells docker to clean up after the container runs
- **--gpus 1** - specifies number of GPUs (just one in this case)
- **-v \$ASR_MODEL_DIR:/servicemaker-dev** - shared volume; we are mapping our `ASR_MODEL_DIR` on the host to `/servicemaker-dev` inside the ServiceMaker container.
- **\$RIVA_SM_CONTAINER** - the container image we just pulled from NGC
- **riva-build speech_recognition** - This command and all its arguments are run inside the ServiceMaker container
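
The breakdown above can also be expressed as a small Python helper that assembles the argument list, which is handy if you later drive ServiceMaker from a script instead of a notebook cell. This is a sketch only: `servicemaker_command` is a hypothetical name, and the example prints the command rather than running it.

```python
import shlex


def servicemaker_command(host_dir, container_dir, image, tool_args, gpus=1):
    """Assemble a `docker run` argument list for a ServiceMaker tool."""
    return [
        "docker", "run", "--rm",
        "--gpus", str(gpus),
        "-v", f"{host_dir}:{container_dir}",
        image, "--",
        *tool_args,
    ]


cmd = servicemaker_command(
    host_dir="/dli/task/asr-models/default-models",
    container_dir="/servicemaker-dev",
    image="nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker",
    tool_args=["riva-build", "speech_recognition"],
)
# Print only; passing `cmd` to subprocess.run would execute it
print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting mistakes when paths contain spaces.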

## 5.3.3 `riva-build speech_recognition` Syntax
We can get some help on the `riva-build speech_recognition` syntax from the [Pipeline Configuration documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-pipeline-configuration.html?highlight=pipeline%20configuration).  For our use case, the basic syntax is:

```text
    riva-build speech_recognition \
        output-dir-for-rmir/model.rmir:key \
        dir-for-riva/acoustic_model.riva:key \
        --decoding_language_model_binary=lm_model.binary          
```
    
which becomes:

```text
    riva-build speech_recognition \
        $ASR_RMIR_SM:$KEY \
        $AM_SM:$KEY \
        --decoding_language_model_binary=$DECODING_LM_BINARY_SM          
```    

Note that the location of the model in the `--decoding_language_model_binary=` argument is relative to the container location, not the host.  Since we've mapped `$ASR_MODEL_DIR` on the host to `/servicemaker-dev` in the container, and defined `DECODING_LM_BINARY_SM` with respect to the container, specifying `$DECODING_LM_BINARY_SM` will ultimately map to `/dli/task/asr-models/default-models/speechtotext_en_us_lm_vdeployable_v4.1/riva_asr_train_datasets_3gram.binary` on the host.  
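
The container-to-host mapping can be made explicit with a few lines of Python. The sketch below assumes a single volume mount, as in the `docker run` command above; `container_to_host` is a hypothetical helper for illustration.

```python
import posixpath


def container_to_host(path, host_dir, container_dir):
    """Translate a path inside the container to the matching host path."""
    relative = posixpath.relpath(path, container_dir)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{path} is not under the mount point {container_dir}")
    return posixpath.normpath(posixpath.join(host_dir, relative))


lm_in_container = (
    "/servicemaker-dev/speechtotext_en_us_lm_vdeployable_v4.1/"
    "riva_asr_train_datasets_3gram.binary"
)
print(container_to_host(
    lm_in_container,
    host_dir="/dli/task/asr-models/default-models",
    container_dir="/servicemaker-dev",
))
```

Any path passed to `riva-build` that does not resolve under the mount point is invisible to the container, which is a common cause of "file not found" errors during the build.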

There are a lot of arguments available to the `riva-build speech_recognition` command.  A comprehensive list can be found in the [Riva-build Optional Parameters documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-pipeline-configuration.html?highlight=pipeline%20configuration#riva-build-optional-parameters). 

`riva-build` supports language models in the `.riva` and `.arpa` formats as well as `.binary`. If your language model is in the `.riva` format, define `DECODING_LM_RIVA_SM` analogously to `DECODING_LM_BINARY_SM` and replace `--decoding_language_model_binary=$DECODING_LM_BINARY_SM` with `$DECODING_LM_RIVA_SM:$KEY`. If your language model is in the `.arpa` format, define `DECODING_LM_ARPA_SM` analogously to `DECODING_LM_BINARY_SM` and replace `--decoding_language_model_binary=$DECODING_LM_BINARY_SM` with `--decoding_language_model_arpa=$DECODING_LM_ARPA_SM`.
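
The three cases described above can be summarized in a helper that chooses the argument from the file extension. This is a sketch under the assumption that the extension reliably identifies the format; `language_model_args` is a hypothetical name, and the file names in the example are placeholders.

```python
import os


def language_model_args(lm_path, key="tlt_encode"):
    """Return the riva-build arguments for a language model file."""
    extension = os.path.splitext(lm_path)[1]
    if extension == ".binary":
        return [f"--decoding_language_model_binary={lm_path}"]
    if extension == ".arpa":
        return [f"--decoding_language_model_arpa={lm_path}"]
    if extension == ".riva":
        # .riva language models are passed positionally with their key
        return [f"{lm_path}:{key}"]
    raise ValueError(f"Unsupported language model format: {extension!r}")


# Placeholder file names, for illustration only
for example in ("lm_model.binary", "lm_model.arpa", "lm_model.riva"):
    print(example, "->", language_model_args(example))
```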

Refer to the [Riva ASR Pipeline Configuration documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-pipeline-configuration.html) if you want to build an ASR pipeline for a supported language other than US English. To obtain the proper `riva-build` parameters for your particular application, select the acoustic model (the parameters below assume Conformer-CTC), language, and pipeline type (offline for the purposes of this tutorial) from the interactive web menu at the bottom of the first section of the page.

Execute the following cell to build an `.rmir` file from a `.binary`-formatted n-gram language model file.


In [9]:
# Syntax: 
# riva-build <task-name> \
#     output-dir-for-rmir/model.rmir:key \
#     dir-for-riva/acoustic_model.riva:key \
#     --decoding_language_model_binary=lm_model.binary
! docker run --rm --gpus 1 -v $ASR_MODEL_DIR:$ASR_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
    riva-build speech_recognition \
        $ASR_RMIR_SM:$KEY \
        $AM_SM:$KEY \
        --decoding_language_model_binary=$DECODING_LM_BINARY_SM \
        --decoding_vocab=$DECODING_VOCAB_SM \
        --wfst_tokenizer_model=$WFST_TOKENIZER_MODEL_SM \
        --wfst_verbalizer_model=$WFST_VERBALIZER_MODEL_SM \
        --name=conformer-ctc-en-US-asr-lm-itn-offline \
        --featurizer.use_utterance_norm_params=False \
        --featurizer.precalc_norm_time_steps=0 \
        --featurizer.precalc_norm_params=False \
        --ms_per_timestep=40 \
        --endpointing.start_history=200 \
        --endpointing.residue_blanks_at_start=-2 \
        --nn.fp16_needs_obey_precision_pass \
        --chunk_size=4.8 \
        --left_padding_size=1.6 \
        --right_padding_size=1.6 \
        --max_batch_size=16 \
        --featurizer.max_batch_size=512 \
        --featurizer.max_execution_batch_size=512 \
        --decoder_type=flashlight \
        --flashlight_decoder.asr_model_delay=-1 \
        --flashlight_decoder.lm_weight=0.8 \
        --flashlight_decoder.word_insertion_score=1.0 \
        --flashlight_decoder.beam_size=32 \
        --flashlight_decoder.beam_threshold=20. \
        --flashlight_decoder.num_tokenization=1 \
        --language_code=en-US \
        --offline 


=== Riva Speech Skills ===

NVIDIA Release  (build 59018721)
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://gi

In [10]:
# Check your work - the new language model RMIR file should be in the ASR_MODEL_DIR/rmir directory now
!ls $ASR_MODEL_DIR/rmir/asr_lm_itn_offline.rmir

ls: cannot access '/dli/task/asr-models/default-models/rmir/asr_lm_itn_offline.rmir': No such file or directory
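
The `riva-build speech_recognition` command in section 5.3.3 sets `--chunk_size=4.8`, `--left_padding_size=1.6`, `--right_padding_size=1.6`, and `--ms_per_timestep=40`. Assuming the chunk and padding sizes are given in seconds and `ms_per_timestep` is the duration of one acoustic model output step, the following arithmetic sketch shows how many timesteps each offline inference window covers.

```python
# Values copied from the riva-build command in section 5.3.3
CHUNK_SIZE_S = 4.8
LEFT_PADDING_S = 1.6
RIGHT_PADDING_S = 1.6
MS_PER_TIMESTEP = 40


def to_timesteps(seconds, ms_per_timestep=MS_PER_TIMESTEP):
    """Convert a duration in seconds to a whole number of model timesteps."""
    # Round to integer milliseconds first to avoid floating-point drift
    return round(seconds * 1000) // ms_per_timestep


chunk_steps = to_timesteps(CHUNK_SIZE_S)
padding_steps = to_timesteps(LEFT_PADDING_S) + to_timesteps(RIGHT_PADDING_S)
window_steps = chunk_steps + padding_steps

print(f"chunk: {chunk_steps} timesteps")
print(f"padding: {padding_steps} timesteps")
print(f"window: {window_steps} timesteps ({window_steps * MS_PER_TIMESTEP / 1000} s)")
```

Under these assumptions, each window spans 8 seconds of audio, of which the middle 4.8 seconds produce output and the padding on either side supplies context.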


## 5.3.4 Exercise: Build a Punctuation and Capitalization RMIR

Punctuation and capitalization RMIR models are built using `riva-build punctuation`.  The `docker run` portion is the same, and we use the same ServiceMaker container image. Here's the basic syntax for the `riva-build punctuation` command portion: 

```text
riva-build punctuation \
    output-dir-for-rmir/punctuation_model.rmir:key \
    dir-for-riva/punctuation_model.riva:key \
    --language_code=<language, 2 letters>-<country, 2 letters> \
    --name=p_and_c_pipeline
```

which becomes

```text
riva-build punctuation \
    $PC_RMIR_SM:$KEY \
    $PC_SM:$KEY \
    --language_code=en-US \
    --name=p_and_c_pipeline
```

The syntax is similar, but instead of requiring language and acoustic models, we need the punctuation model as input and need to specify the language code.

For this exercise, put the `docker run` and `riva-build` commands together to build the punctuation model.  If you get stuck, you can check the [solution](solutions/ex5.3.4.ipynb).

In [None]:
# TODO: Use docker run and riva-build to create a punctuation RMIR model

In [None]:
# quick fix
import os
if not os.path.exists("asr-models/default-models/rmir/p_and_c.rmir"):
    ! docker run --rm --gpus 1 -v $ASR_MODEL_DIR:$ASR_MODEL_DIR_SM $RIVA_SM_CONTAINER -- \
        riva-build punctuation \
            $PC_RMIR_SM:$KEY \
            $PC_SM:$KEY \
            --language_code=en-US \
            --name=p_and_c_pipeline

In [None]:
# Check your work - the new p&c RMIR file should be in the ASR_MODEL_DIR/rmir directory now
!ls $ASR_MODEL_DIR/rmir/p_and_c.rmir

ls: cannot access '/dli/task/asr-models/default-models/rmir/p_and_c.rmir': No such file or directory


---
# 5.4 `riva-deploy`

The deployment tool takes as input one or more RMIR files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and writes all those assets to the output model repository directory.  For our project, we are using both the `asr_lm_itn_offline.rmir` and `p_and_c.rmir` files as input.  Our output directory, `/data/models/` in the container, is mapped to `$ASR_MODEL_DIR/models` on the host system. For more details, see the [Using riva-deploy](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html#using-riva-deploy-and-riva-speech-container-advanced) documentation.

_Note: The files we need have been preloaded for the course to save time, because this step would otherwise take about 30 minutes.  The "-f" option has been removed to avoid overwriting the preloaded models._

In [None]:
# Syntax: 
# riva-deploy -f \
#     dir-for-rmir/asr_model.rmir:key \
#     dir-for-rmir/p_and_c_model.rmir:key \
#     output-dir-for-repository
! docker run --rm --gpus 1 -v $ASR_MODEL_DIR:/data $RIVA_SM_CONTAINER -- \
    riva-deploy \
        /data/rmir/asr_lm_itn_offline.rmir:$KEY \
        /data/rmir/p_and_c.rmir:$KEY \
        /data/models/


=== Riva Speech Skills ===

NVIDIA Release  (build 59018721)
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://gi

In [None]:
# Check your work - the new model files should be in the ASR_MODEL_DIR/models directory now
!ls $ASR_MODEL_DIR/models

---
# 5.5 Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. We've already downloaded the [Riva Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource from NGC. <br>
Set the path to the directory here:

In [None]:
# Set the Riva Quick Start directory
RIVA_DIR = "/dli/task/riva_quickstart_v2.11.0"

The `riva_quickstart` folder includes shell scripts to start the Riva server as well as the Riva Python API bindings for the client.  When you run the `riva_init.sh` and `riva_start.sh` scripts, the `config.sh` file is passed as an argument to supply the settings. To learn more about this workflow, check the [Deploy Process documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html#deploy-process).

For our project, we will not run `riva_init.sh` because we've already used `riva-deploy`. We can move on directly to the `riva_start.sh` script.  

## 5.5.1 Exercise: Configure the `config.sh` File
Open the [config.sh](riva_quickstart_v2.11.0/config.sh) file to edit it.  You'll need to make two general modifications.

- Modify it so that only the ASR service is enabled (not NLP, TTS, or NMT).  
- Point `riva_model_loc` at the directory that holds the model repository generated in the previous step, and set `use_existing_rmirs` to `true`.  Because `riva-deploy` wrote the repository to `$ASR_MODEL_DIR/models`, `riva_model_loc` should be the same directory as `ASR_MODEL_DIR`.  Use the literal value, which for this environment is `"/dli/task/asr-models/default-models"`.

Use the following snippet as a guide, then check your work before you attempt to start the server.  You can also take a look at the [solution](solutions/ex5.5.1_config.sh) if you get stuck.

#### config.sh snippet
```bash
# Enable or Disable Riva Services
service_enabled_asr=true 
service_enabled_nlp=true # MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=true # MAKE CHANGES HERE - SET TO FALSE
service_enabled_nmt=true # MAKE CHANGES HERE - SET TO FALSE

...

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified.
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
#
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"  ## MAKE CHANGES HERE - REPLACE WITH 

if [[ $riva_target_gpu_family == "tegra" ]]; then
    riva_model_loc="`pwd`/model_repository"
fi

# The default RMIRs are downloaded from NGC by default in the above $riva_rmir_loc directory
# If you'd like to skip the download from NGC and use the existing RMIRs in the $riva_rmir_loc
# then set the below $use_existing_rmirs flag to true. You can also deploy your set of custom
# RMIRs by keeping them in the riva_rmir_loc dir and use this quickstart script with the
# below flag to deploy them all together.
use_existing_rmirs=false          ## MAKE CHANGES HERE - SET TO TRUE                  
```
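
The edits marked in the snippet could also be applied programmatically. The following is a minimal sketch, assuming every setting of interest is a top-level `name=value` line as shown above; `apply_config_edits` is a hypothetical helper, and the example runs on a short sample string rather than the real file.

```python
import re

# Settings described in this exercise; the path is this course's literal value
CONFIG_EDITS = {
    "service_enabled_nlp": "false",
    "service_enabled_tts": "false",
    "service_enabled_nmt": "false",
    "riva_model_loc": '"/dli/task/asr-models/default-models"',
    "use_existing_rmirs": "true",
}


def apply_config_edits(text, edits=CONFIG_EDITS):
    """Rewrite top-level `name=value` assignments, leaving other lines intact."""
    for name, value in edits.items():
        # Anchor at line start so indented assignments (e.g. inside `if`) are skipped
        pattern = rf"^{re.escape(name)}=\S*"
        text = re.sub(pattern, lambda m: f"{name}={value}", text, flags=re.MULTILINE)
    return text


sample = (
    "service_enabled_asr=true\n"
    "service_enabled_nlp=true # note\n"
    'riva_model_loc="riva-model-repo"\n'
    '    riva_model_loc="`pwd`/model_repository"\n'
    "use_existing_rmirs=false\n"
)
print(apply_config_edits(sample))
```

Anchoring the pattern at the start of the line leaves the indented Tegra-specific assignment untouched, and trailing comments on edited lines are preserved.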

In [None]:
# quick fix!
! cp solutions/ex5.5.1_config.sh $RIVA_DIR/config.sh

In [None]:
# Check your work.  Compare with the solution.  Exact matches provide no output.
! diff solutions/ex5.5.1_config.sh $RIVA_DIR/config.sh

## 5.5.2 Start the Server

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x *.sh

In [None]:
# Start the server.  This should take about 30 seconds.
! cd $RIVA_DIR && ./riva_start.sh config.sh
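
Before sending requests, it can help to confirm that something is listening on the server port. The sketch below polls the gRPC port used later in this notebook (`localhost:50051`) with the standard library only; `wait_for_port` is a hypothetical helper, and an open port does not by itself guarantee that every model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=50051, timeout_s=120.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is accepting connections
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

Calling `wait_for_port()` after `riva_start.sh` returns `True` as soon as the port accepts connections, or `False` if the timeout expires first.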

---
# 5.6 Run Inference
After the Riva server is up and running with our models, we can send inference requests querying the server using the Riva Python API bindings we used in the Quick Start example. 

The following cell queries the Riva server (using gRPC) to yield a result.

In [None]:
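import wave

import grpc
import riva.client
# Protobuf message and gRPC stub modules shipped with the riva.client package
import riva.client.proto.riva_asr_pb2 as rasr
import riva.client.proto.riva_asr_pb2_grpc as rasr_srv
import riva.client.proto.riva_audio_pb2 as ra

def run_inference(audio_file, server = "localhost:50051"):
    # Read the sample rate from the WAV header, then the raw file bytes
    with wave.open(audio_file, 'rb') as wf:
        sample_rate = wf.getframerate()
    with open(audio_file, 'rb') as fh:
        data = fh.read()

    # Open an unencrypted gRPC channel to the Riva server
    channel = grpc.insecure_channel(server)
    client = rasr_srv.RivaSpeechRecognitionStub(channel)
    config = rasr.RecognitionConfig(
        encoding=ra.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=sample_rate,
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
        audio_channel_count=1
    )

    request = rasr.RecognizeRequest(config=config, audio=data)

    response = client.Recognize(request)
    print(response.results[0].alternatives[0].transcript)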

### Connect to the Riva Server and Run Inference

First, define a helper function for obtaining an audio file's encoding.

In [None]:
def get_encoding(audio_file):
    file_extension = audio_file.split('.')[-1]
    if file_extension == 'wav':
        encoding = riva.client.AudioEncoding.LINEAR_PCM
    elif file_extension == 'flac':
        encoding = riva.client.AudioEncoding.FLAC
    elif file_extension == 'alaw':
        encoding = riva.client.AudioEncoding.ALAW
    elif file_extension == 'mulaw':
        encoding = riva.client.AudioEncoding.MULAW
    else:
        raise Exception(f'Audio format ".{file_extension}" not supported.')
    return encoding 

Calling this inference function queries the Riva server (using gRPC) to transcribe an audio file. 

In [None]:
def run_inference(audio_file, server='localhost:50051', print_full_response=False):
    with open(audio_file, 'rb') as fh:
        data = fh.read()

    auth = riva.client.Auth(uri=server)
    client = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        encoding=get_encoding(audio_file),
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    riva.client.add_audio_file_specs_to_config(config, audio_file)

    response = client.offline_recognize(data, config)
    if print_full_response: 
        print(response)
    else:
        print("ASR transcript:")
        print(response.results[0].alternatives[0].transcript)
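
The function above prints only `response.results[0]`. A response may contain several results, or none if no speech was recognized, so a more defensive way to read it is sketched below. It assumes only the `results[i].alternatives[j].transcript` layout used above; `collect_transcripts` is a hypothetical helper, and the demo uses stand-in objects instead of a live server.

```python
from types import SimpleNamespace


def collect_transcripts(response):
    """Join the top alternative of every result; return "" if there are none."""
    pieces = []
    for result in response.results:
        if result.alternatives:
            pieces.append(result.alternatives[0].transcript.strip())
    return " ".join(pieces)


# Stand-in objects that mimic the response layout, for illustration only
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="Hello there. ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="How are you?")]),
])
print(collect_transcripts(fake_response))
```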

Let's play the audio file on which we will run inference.

In [None]:
import io
import IPython.display as ipd

audio_file = "audio_samples/test.wav"
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)
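
If a transcript looks wrong, the audio properties are a good first thing to check. The following sketch reads them from the WAV header with the standard `wave` module; `describe_wav` is a hypothetical helper and works only for uncompressed PCM `.wav` files.

```python
import wave


def describe_wav(path):
    """Return sample rate, channel count, sample width, and duration of a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }
```

For example, `describe_wav("audio_samples/test.wav")` reports the properties of the sample file used in this section.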

Run inference and compare the transcript.

In [None]:
run_inference(audio_file)

As you can see and hear, our ASR pipeline, constructed from pretrained, OOTB components, transcribed the sample audio file perfectly. However, the acoustic model was trained on US English. In a later notebook, we'll explore how to fine-tune the acoustic model to better transcribe audio from Nigerian English speakers. 
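
To compare a transcript against a reference text more objectively than by ear, a common metric is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a plain dynamic-programming implementation for illustration, not a Riva or NeMo API; `word_error_rate` is a hypothetical name, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the processed ref words and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(substitution, previous[j] + 1, current[j - 1] + 1))
        previous = current
    return previous[-1] / len(ref)


# Made-up sentences: one substituted word out of four
print(word_error_rate("please call me tomorrow", "please call me today"))
```

Punctuation attached to a word counts as a difference here, so strip it from both strings first if you only want to score the words themselves.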

---
# 5.7 Stop the Riva Server
Before moving on, shut down the Riva server to stop the containers.

In [None]:
! cd $RIVA_DIR && ./riva_stop.sh

---
<h2 style="color:green;">Congratulations!</h2>

You've learned how to:
- Download models from NGC
- Build a Riva pipeline using Riva ServiceMaker to take a pretrained `.riva` file (exported from NeMo) and convert it to an `.rmir` file
- Deploy the model locally on the Riva server
- Send inference requests from a demo client using Riva API bindings

Related tutorials:
- [How do I use Riva ASR APIs with out-of-the-box models?](https://github.com/nvidia-riva/tutorials/blob/main/asr-basics.ipynb)<br>
- [How To Train, Evaluate, and Fine-Tune an n-gram Language Model with NVIDIA NeMo](https://github.com/nvidia-riva/tutorials/blob/main/asr-python-advanced-nemo-ngram-training-and-finetuning.ipynb)<br>

Next, let's explore how to improve inference at runtime in [the Word Boosting notebook](006_Word_Boosting.ipynb). 
