# Check generation speed on CPU

This notebook loads the pre-trained hn-nsf model, generates a waveform on CPU, and measures the generation speed.

Instead of following the step-by-step model definition in s1_demonstration_hn-nsf.ipynb, we directly load the model definition. For more details on the model, please check s1_demonstration_hn-nsf.ipynb.


## Load packages

In [1]:
# At the beginning, let's load the packages
from __future__ import absolute_import
from __future__ import print_function
import sys
import numpy as np
import torch
import torch.nn as torch_nn
import torch.nn.functional as torch_nn_func
import time

# misc functions for this demonstration book
import tool_lib

## Initialize original Hn-NSF 
We use the pre-trained hn-nsf model stored in data_models/pre_trained_hn_nsf.

#### Load model definition

In [28]:
# input feature dim (80 dimension Mel-spec + 1 dimension F0)
mel_dim = 80
f0_dim = 1
input_dim = mel_dim + f0_dim

# output dimension = 1 for waveform
output_dim = 1
# sampling rate of waveform (Hz)
sampling_rate = 16000
# up-sampling rate of acoustic features (sampling_rate * frame_shift, with a 5 ms frame shift)
feat_upsamp_rate = int(round(sampling_rate * 0.005))

# load the basic function blocks
import data_models.pre_trained_hn_nsf.model as nii_nn_blocks

# for this tutorial, the sampling rate and up-sampling rate are already set in data_models/pre_trained_hn_nsf/model.py,
# so there is no need to pass them as arguments
# declare the model
hn_nsf_model = nii_nn_blocks.Model(input_dim, output_dim, None)
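The up-sampling rate links feature frames to waveform samples. At 16 kHz with a 5 ms frame shift, each frame corresponds to 16000 * 0.005 = 80 samples. The sketch below makes this arithmetic explicit and estimates the output length for a 553-frame input like the one loaded later in this notebook. It assumes that the model emits exactly `feat_upsamp_rate` samples per frame, which this notebook does not confirm. The helpers `expected_num_samples` and `expected_duration` are hypothetical.

```python
# Minimal sketch: relate acoustic-feature frames to waveform samples.
# Assumption (not confirmed by the model code): the vocoder emits exactly
# feat_upsamp_rate samples per input frame.
sampling_rate = 16000      # Hz
frame_shift = 0.005        # seconds (5 ms)

# round() guards against floating-point error before converting to int
feat_upsamp_rate = int(round(sampling_rate * frame_shift))


def expected_num_samples(num_frames, upsamp_rate=feat_upsamp_rate):
    """Number of waveform samples produced from num_frames feature frames."""
    return num_frames * upsamp_rate


def expected_duration(num_frames, upsamp_rate=feat_upsamp_rate, sr=sampling_rate):
    """Duration (seconds) of the waveform generated from num_frames frames."""
    return expected_num_samples(num_frames, upsamp_rate) / sr


print(feat_upsamp_rate)               # 80 samples per frame
print(expected_num_samples(553))      # 44240
print(expected_duration(553))         # 2.765
```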

#### Load pre-trained model and data

In [29]:
# load pre-trained model
device = torch.device("cpu")
hn_nsf_model.to(device, dtype=torch.float32)
checkpoint = torch.load("data_models/pre_trained_hn_nsf/trained_network.pt", map_location=device)
hn_nsf_model.load_state_dict(checkpoint)

<All keys matched successfully>

In [30]:
# load mel and F0
input_mel = tool_lib.read_raw_mat("data_models/acoustic_features/hn_nsf/slt_arctic_b0474.mfbsp", mel_dim)
input_f0 = tool_lib.read_raw_mat("data_models/acoustic_features/hn_nsf/slt_arctic_b0474.f0", f0_dim)

print("Input Mel shape:" + str(input_mel.shape))
print("Input F0 shape:" + str(input_f0.shape))

# compose the input tensor
input_length = min([input_mel.shape[0], input_f0.shape[0]])
input_tensor = torch.zeros(1, input_length, mel_dim + f0_dim, dtype=torch.float32)
input_tensor[0, :, 0:mel_dim] = torch.tensor(input_mel[0:input_length, :])
input_tensor[0, :, mel_dim:] = torch.tensor(input_f0[0:input_length]).unsqueeze(-1)
print("Input data tensor shape:" + str(input_tensor.shape))

Input Mel shape:(554, 80)
Input F0 shape:(553,)
Input data tensor shape:torch.Size([1, 553, 81])
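The Mel file has 554 frames and the F0 file has 553, so the cell above truncates both to the shorter length and then concatenates them into a (1, frames, 81) tensor. The sketch below reproduces that step with NumPy. It assumes that the raw feature files store little-endian float32 values frame by frame. `read_raw_mat_sketch` is a hypothetical stand-in for `tool_lib.read_raw_mat`, not its actual implementation.

```python
import os
import tempfile

import numpy as np


def read_raw_mat_sketch(path, col_dim):
    """Hypothetical stand-in for tool_lib.read_raw_mat.

    Assumes the file holds little-endian float32 values stored frame by frame,
    col_dim values per frame. Returns (frames, col_dim), or (frames,) when
    col_dim == 1 (matching the F0 shape printed above).
    """
    data = np.fromfile(path, dtype="<f4")
    if col_dim == 1:
        return data
    return data.reshape(-1, col_dim)


def compose_input(mel, f0):
    """Truncate mel and F0 to a common length and stack them as (1, T, D)."""
    length = min(mel.shape[0], f0.shape[0])
    f0_col = f0[:length].reshape(length, -1)   # (T,) -> (T, 1)
    feats = np.concatenate([mel[:length], f0_col], axis=1)
    return feats[np.newaxis].astype(np.float32)


# Usage with synthetic data written to a temporary raw file
mel_dim = 80
rng = np.random.default_rng(0)
mel_ref = rng.standard_normal((554, mel_dim)).astype("<f4")
with tempfile.TemporaryDirectory() as tmp:
    mel_path = os.path.join(tmp, "demo.mfbsp")
    mel_ref.tofile(mel_path)
    mel = read_raw_mat_sketch(mel_path, mel_dim)

f0 = rng.uniform(0.0, 300.0, size=553).astype(np.float32)
x = compose_input(mel, f0)
print(mel.shape, x.shape)   # (554, 80) (1, 553, 81)
```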


### Generate waveforms and measure the speed

In [31]:
num_iter = 5

print("Generate a waveform for %d times:" % (num_iter))
time_start = time.time()
with torch.no_grad():
    for idx in range(num_iter):
        output_waveform = hn_nsf_model(input_tensor)
        print("%d" % (idx), end=', ')
time_end = time.time()


Generate a waveform for 5 times:
0, 1, 2, 3, 4, 
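The loop above uses `time.time()` to time every call, including the first one, which may include one-off costs such as memory allocation. The sketch below shows a more careful timing helper. It runs untimed warm-up calls first and then uses `time.perf_counter()`, which Python provides for measuring short durations at high resolution. `benchmark` is a hypothetical helper, not part of `tool_lib`.

```python
import time


def benchmark(fn, num_iter=5, num_warmup=1):
    """Return the mean wall-clock time (seconds) of fn() over num_iter calls.

    Warm-up calls run first and are not timed, so one-off costs do not
    inflate the average.
    """
    for _ in range(num_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(num_iter):
        fn()
    return (time.perf_counter() - start) / num_iter


# Usage with a dummy workload; in the notebook, fn could wrap the model call,
# e.g. lambda: hn_nsf_model(input_tensor), run inside torch.no_grad().
calls = []
mean_time = benchmark(lambda: calls.append(sum(range(10000))), num_iter=5, num_warmup=2)
print("calls made:", len(calls))   # 7
print("mean time per call: %.6f s" % mean_time)
```

On a GPU, CUDA kernels run asynchronously. Calling `torch.cuda.synchronize()` before each clock read makes sure that the measured time covers the actual computation.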

In [32]:
print("Generation done")
output_waveform_array = output_waveform[0].numpy()
output_duration = output_waveform_array.shape[0] / sampling_rate

time_average = (time_end - time_start) / num_iter
speed_per_s = output_waveform_array.shape[0] / time_average
real_time_factor = time_average / output_duration
print("Speed (waveform sampling points per second): %f" % (speed_per_s))
print("Real time factor: %f" % (real_time_factor))


print("Generated sample:")
import IPython.display
IPython.display.Audio(output_waveform[0].numpy(), rate=sampling_rate, normalize=False)

Generation done
Speed (waveform sampling points per second): 7188.614416
Real time factor: 2.225742
Generated sample:
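The two reported numbers are not independent. If N samples take t seconds to generate, the speed is N / t and the audio lasts N / sampling_rate seconds. The real time factor t / (N / sampling_rate) therefore equals sampling_rate / speed. An RTF of about 2.23 means that each second of audio took about 2.23 seconds to generate on this CPU. An RTF below 1 would mean faster than real time. The sketch below checks this relation against the printed speed. The helper names are hypothetical.

```python
def generation_speed(num_samples, time_per_utt):
    """Waveform samples generated per second of wall-clock time."""
    return num_samples / time_per_utt


def real_time_factor(num_samples, time_per_utt, sampling_rate):
    """Generation time divided by the duration of the generated audio.

    RTF < 1: faster than real time; RTF > 1: slower than real time.
    """
    return time_per_utt / (num_samples / sampling_rate)


sampling_rate = 16000

# RTF = sampling_rate / speed, checked with the speed printed above
reported_speed = 7188.614416
print("%f" % (sampling_rate / reported_speed))   # 2.225742

# Hypothetical numbers, only to show that the two helpers agree
n_samples, t_gen = 44240, 6.0
print(real_time_factor(n_samples, t_gen, sampling_rate),
      sampling_rate / generation_speed(n_samples, t_gen))
```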
