# Deploy an OOTB ASR Pipeline with Riva
In this notebook, I'll deploy an out-of-the-box (OOTB) ASR pipeline on NVIDIA Riva using the [Riva Quick Start scripts](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts). 

---
# 4.2 Riva Quick Start

The [Riva Skills Quick Start](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/riva/resources/riva_quickstart) resource folder contains scripts which allow us to download, deploy, and run inference with pretrained models.  

## 4.2.1 Download Riva Skills Quick Start from NGC

In [1]:
import os

# Set the path to the Riva Skills Quick Start resource folder
RIVA_DIR = "riva_quickstart_v2.11.0"

# Downloads the Riva Skills Quick Start resource folder (overwrite if necessary)
if os.path.exists(RIVA_DIR):
    !rm -rf $RIVA_DIR
print("Downloading the Riva Skills Quick Start resource folder")
!ngc registry resource download-version "nvidia/riva/riva_quickstart:2.11.0"

# Make special modification required for our docker-in-docker course environment
!sed -i '/--name riva-service-maker*/i \              --network host \\' $RIVA_DIR/riva_init.sh

Downloading the Riva Skills Quick Start resource folder
{
    "download_end": "2025-03-30 05:37:28",
    "download_start": "2025-03-30 05:37:27",
    "download_time": "1s",
    "files_downloaded": 26,
    "local_path": "/dli/task/riva_quickstart_v2.11.0",
    "size_downloaded": "141.52 KB",
    "status": "COMPLETED"
}
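
The `sed` command in the cell above inserts a `--network host \` line before each line of `riva_init.sh` that matches `--name riva-service-maker`. As a rough illustration of that edit, the sketch below applies the same kind of insertion to an in-memory list of lines. The function name `insert_before_match` and the sample script excerpt are hypothetical and are not taken from the Quick Start scripts. `sed` matches a regular expression, whereas this sketch uses a plain substring test, so it is only an approximation.

```python
def insert_before_match(lines, needle, new_line):
    """Return a copy of `lines` with `new_line` inserted before each line containing `needle`."""
    patched = []
    for line in lines:
        if needle in line:
            patched.append(new_line)
        patched.append(line)
    return patched


# Hypothetical excerpt resembling a multi-line `docker run` invocation
sample = [
    "docker run --rm \\",
    "    --name riva-service-maker \\",
    "    some-image",
]
patched = insert_before_match(sample, "--name riva-service-maker", "    --network host \\")
print("\n".join(patched))
```

Lines that do not contain the search string pass through unchanged, so running the helper on a script without a matching line is a no-op.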


In [2]:
# List the resource folder contents
!ls -g --group-directories-first $RIVA_DIR

total 96
drwx------ 2 root  4096 Mar 30 05:37 asr_lm_tools
drwx------ 2 root  4096 Mar 30 05:37 examples
drwx------ 2 root  4096 Mar 30 05:37 protos
-rw------- 1 root 13615 Mar 30 05:37 config.sh
-rw------- 1 root 33131 Mar 30 05:37 nemo2riva-2.11.0-py3-none-any.whl
-rw------- 1 root  2815 Mar 30 05:37 riva_clean.sh
-rw------- 1 root  8766 Mar 30 05:37 riva_init.sh
-rw------- 1 root  4861 Mar 30 05:37 riva_start.sh
-rw------- 1 root  2611 Mar 30 05:37 riva_start_client.sh
-rw------- 1 root  1387 Mar 30 05:37 riva_stop.sh


In [3]:
# Overarching model directory
MODEL_LOC = "/dli/task/asr-models"
# Directory for prebuilt, OOTB models obtained by running the Riva Skills Quick Start scripts
OOTB_MODEL_LOC = os.path.join(MODEL_LOC, "ootb-models")

In [None]:
# Quick fix: overwrite config.sh with the provided solution file
! cp solutions/ex4.2.2_config.sh riva_quickstart_v2.11.0/config.sh

Check your work. The following cell should have no output.

In [4]:
! diff solutions/ex4.2.2_config.sh riva_quickstart_v2.11.0/config.sh
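
The same comparison can be made from Python if shell access is not convenient. This is a minimal sketch, assuming both files are plain text. The helper name `config_diff` is hypothetical, and the demo compares two temporary files with made-up contents rather than the course's `config.sh`.

```python
import difflib
import tempfile
from pathlib import Path


def config_diff(expected_path, actual_path):
    """Return unified-diff lines between two text files (empty list if identical)."""
    expected = Path(expected_path).read_text().splitlines(keepends=True)
    actual = Path(actual_path).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            expected, actual, fromfile=str(expected_path), tofile=str(actual_path)
        )
    )


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.sh"
    b = Path(tmp) / "b.sh"
    # Illustrative contents only; not copied from the real config.sh
    a.write_text("example_flag=true\n")
    b.write_text("example_flag=true\n")
    same = config_diff(a, b)
    b.write_text("example_flag=false\n")
    changed = config_diff(a, b)

print("identical files ->", len(same), "diff lines")
print("modified file  ->", len(changed), "diff lines")
```

An empty list corresponds to the "no output" result expected from the `diff` cell.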

## 4.2.3 Run `riva_init.sh`

The Quick Start `riva_init.sh` script takes a set of models that have already been built in RMIR (Riva Model Intermediate Representation) format and deploys them. The script also pulls the required Docker images, which takes about 4 minutes the first time it runs.

_Note: The model files have been preloaded for the course to save time during deployment._

In [5]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x *.sh

In [6]:
%%time
! cd $RIVA_DIR && bash riva_init.sh config.sh

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Pulling nvcr.io/nvidia/riva/riva-speech:2.11.0. This may take some time...
  > Pulling nvcr.io/nvidia/riva/riva-speech:2.11.0-servicemaker. This may take some time...

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

=== Riva Speech Skills ===

NVIDIA Release  (build 59018721)
Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its conten

---
# 4.3 Start the Riva Server
The Riva server runs [NVIDIA Triton™ Inference Server](https://developer.nvidia.com/triton-inference-server) in the background. While the models are being loaded into the location where Triton can read them, you'll see "Waiting" messages.

In [7]:
# Start the server.  This should take about 30 seconds
! cd $RIVA_DIR && bash riva_start.sh config.sh

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...
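
The start script retries every 10 seconds until the server reports that it is ready. A rough client-side analogue is to poll the gRPC port until it accepts TCP connections. The sketch below assumes the server listens on `localhost:50051`, the default address used by `run_inference` in section 4.4. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Poll until a TCP connection to (host, port) succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Example call (assumed address):
# wait_for_port("localhost", 50051, timeout_s=120.0, interval_s=10.0)
```

The function returns `False` instead of raising on timeout, so a notebook cell can decide whether to retry or stop.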


---
# 4.4 Run Inference

Connect to the Riva server and run inference. We use the Python client API bindings, which can be installed with `pip install nvidia-riva-client`. For more information on Riva clients, see the documentation in the [GitHub repository](https://github.com/nvidia-riva/python-clients).

In [8]:
import re
import os
import wave
import soundfile as sf

from pathlib import Path
from typing import Callable, Dict, Generator, Iterable, List, Optional, TextIO, Union

import riva.client
from riva.client import AudioEncoding

Define a helper function named `run_inference` that transcribes an audio file. Calling it sends the file's bytes to the Riva server over gRPC and prints either the transcript or the full response.

In [9]:
def run_inference(audio_file, server='localhost:50051', print_full_response=False):
    with open(audio_file, 'rb') as fh:
        data = fh.read()
    
    auth = riva.client.Auth(uri=server)
    client = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    
    response = client.offline_recognize(data, config)
    if print_full_response: 
        print(response)
    else:
        print("ASR transcript:")
        print(response.results[0].alternatives[0].transcript)
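
`run_inference` prints only `response.results[0]`, which raises `IndexError` if the server returns no results and would skip text if the response holds more than one result. The sketch below is a hypothetical helper, `join_transcripts`, that concatenates the top alternative of every result. It relies only on the `results`, `alternatives`, and `transcript` fields already used above, and the demo uses stand-in objects rather than a real Riva response.

```python
from types import SimpleNamespace


def join_transcripts(response):
    """Concatenate the top alternative of each result; return '' if there are none."""
    pieces = []
    for result in response.results:
        if result.alternatives:
            pieces.append(result.alternatives[0].transcript.strip())
    return " ".join(p for p in pieces if p)


# Stand-in objects that mimic the response fields used in run_inference
fake = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="First segment. ")]),
    SimpleNamespace(alternatives=[]),
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="Second segment.")]),
])
empty = SimpleNamespace(results=[])

print(repr(join_transcripts(fake)))
print(repr(join_transcripts(empty)))
```

Returning a string instead of printing also makes the transcript easier to reuse, for example when comparing it with a reference text.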

Listen to the sample audio file.

In [10]:
import io
import IPython.display as ipd

audio_file = "audio_samples/test.wav"
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
ipd.Audio(audio_file)
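
Before sending a file to the server, it can help to confirm that it is PCM WAV and to check its sample rate and duration. This is a minimal sketch using the standard-library `wave` module, which reads only PCM WAV files. `describe_wav` is a hypothetical helper, and the demo writes one second of silence to a temporary file instead of reading `audio_samples/test.wav`.

```python
import tempfile
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header information for a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "silence.wav"
    # Write one second of 16-bit mono silence at 16 kHz
    with wave.open(str(demo), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 16000)
    info = describe_wav(demo)

print(info)
```

Calling `describe_wav(audio_file)` on the sample file would report the same fields for the real recording.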

Run inference on the audio file and check the transcription.

In [11]:
run_inference(audio_file)

ASR transcript:
Congratulations, you've successfully deployed a Riva speech recognition pipeline. 


# 4.5 Stop the Riva Server

In [12]:
! cd $RIVA_DIR && bash riva_stop.sh

Shutting down docker containers...
