<img src="https://nv-adlr.github.io/images/waveglow_logo.png" width=300 align=center >




# Part1. Voice Synthesis with NVIDIA WaveGlow Model
 


by **Hyungon Ryu** | Sr. Solution Architect at NVIDIA


---



----



  **Content**
- **Part1. Voice Synthesis with NVIDIA  WaveGlow Model**
- Part2. Voice Synthesis with NVIDIA Tacotron2 + WaveGlow



In this notebook, I demonstrate voice synthesis from a Mel spectrogram with the WaveGlow model, using the provided pre-trained WaveGlow parameters. You can reproduce it in the Google COLAB environment with a Tesla K80 within 10 minutes, including the time needed to download the weight file. With a Tesla T4 or Tesla V100, you can synthesize voice in real time.
Visit the NVIDIA ADLR WaveGlow [blog](https://nv-adlr.github.io/WaveGlow) to hear the sound quality of the WaveGlow model.


----

## Step1. DevOps


### allocate GPU

When I created this notebook, I already selected GPU as the preferred accelerator option. Before we get started, let's check that a GPU is allocated.


Select the **[EDIT]** menu > **[Notebook Settings]** and select the **[GPU]** option.

#### check Tesla K80
Google COLAB provides a <a href="https://images.nvidia.com/content/pdf/kepler/Tesla-K80-BoardSpec-07317-001-v05.pdf" target="_blank_">Tesla K80</a> with 12GB of memory.
You can see the assigned GPU information with the simple command `nvidia-smi`.

In [1]:
!nvidia-smi

Thu Nov 22 10:23:29 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   73C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
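
The same check can be done from Python. The following is a minimal sketch, assuming the `nvidia-smi -L` line format `GPU 0: <name> (UUID: ...)`; the helper names `parse_gpu_names` and `list_gpus` are hypothetical and not part of the WaveGlow repository.

```python
import re
import shutil
import subprocess


def parse_gpu_names(listing):
    """Extract GPU names from the text printed by `nvidia-smi -L`."""
    names = []
    for line in listing.splitlines():
        # Each device line looks like: "GPU 0: Tesla K80 (UUID: GPU-...)"
        match = re.match(r"GPU \d+: (.+?)(?: \(UUID: .*\))?$", line.strip())
        if match:
            names.append(match.group(1))
    return names


def list_gpus():
    """Return GPU names, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_gpu_names(result.stdout)


print(list_gpus() or "No GPU allocated; check Notebook Settings")
```

An empty list means the runtime has no GPU attached, so it is worth revisiting the notebook settings before going further.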

#### system information and configure
You can see the detailed specs of the free system offered by Google COLAB. In particular, the nvidia-smi tool lets you set the Tesla K80's application clock to its highest rate of 875 MHz.

In [3]:
%%bash
# check the environment
echo "Check H/W"
lscpu | grep 'CPU(s):            '
lscpu | grep GHz
echo "memory" && free -m | cut -c-49 |  head -n 2 
echo "storage" && df -h |  cut -c-60 | head -n 2
df -h |  grep '/dev/sda1'
echo " " && nvidia-smi -L | cut -c-17
echo "confure Max Application Clock for K80 875Mhz"
nvidia-smi -ac 2505,875 && nvidia-smi -pm 1
echo " " &&echo "Check S/W"
cat /etc/*-release | grep PRETTY_NAME
python --version 
nvcc --version | grep  tools

Check H/W
CPU(s):              2
Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
memory
              total        used        free      
Mem:          13022         407       10986      
storage
Filesystem      Size  Used Avail Use% Mounted on
overlay         359G  7.6G  333G   3% /
/dev/sda1       365G   10G  355G   3% /opt/bin
 
GPU 0: Tesla K80 
confure Max Application Clock for K80 875Mhz
Applications clocks set to "(MEM 2505, SM 875)" for GPU 00000000:00:04.0
All done.
Persistence mode is already Enabled for GPU 00000000:00:04.0.
All done.
 
Check S/W
PRETTY_NAME="Ubuntu 18.04.1 LTS"
Python 3.6.7
Cuda compilation tools, release 9.2, V9.2.148


### clone WaveGlow Model

Clone NVIDIA's
[WaveGlow](https://github.com/NVIDIA/waveglow) model to COLAB with the git clone command. Note that the WaveGlow model uses tacotron2 as a submodule to create Mel spectrograms.

This notebook is based on commit [f4c04e2](https://github.com/NVIDIA/waveglow/commit/f4c04e2d968de01b22d2fb092bbbf0cec0b6586f) and the Google COLAB environment as of October 10, 2018.

In [30]:
%%bash
git clone https://github.com/NVIDIA/waveglow.git
cd waveglow
git fetch origin f4c04e2d968de01b22d2fb092bbbf0cec0b6586f
git checkout FETCH_HEAD
git submodule init
git submodule update

M	tacotron2


fatal: destination path 'waveglow' already exists and is not an empty directory.
From https://github.com/NVIDIA/waveglow
 * branch            f4c04e2d968de01b22d2fb092bbbf0cec0b6586f -> FETCH_HEAD
HEAD is now at f4c04e2 config.json: addinf mel fmin and mel fmax params


### install requirements

The WaveGlow model has been tested with PyTorch 0.4.0. You also need libraries such as librosa to handle audio and Mel spectrogram files. Installation takes about one minute, depending on the network environment.

In [5]:
%%time
%%bash 
pip install torch==0.4.0 matplotlib==2.1.0 tensorflow  inflect==0.2.5 \
 librosa==0.6.0 scipy==1.0.0 tensorboardX==1.1 Unidecode==1.0.22 pillow 

Collecting torch==0.4.0
  Downloading https://files.pythonhosted.org/packages/69/43/380514bd9663f1bf708abeb359b8b48d3fabb1c8e95bb3427a980a064c57/torch-0.4.0-cp36-cp36m-manylinux1_x86_64.whl (484.0MB)
Collecting matplotlib==2.1.0
  Downloading https://files.pythonhosted.org/packages/b2/9c/fcc9cfbf2454d93be66a615657cda4184954b4b67b9fc07c8511ff152b8f/matplotlib-2.1.0-cp36-cp36m-manylinux1_x86_64.whl (15.0MB)
Collecting inflect==0.2.5
  Downloading https://files.pythonhosted.org/packages/66/15/2d176749884cbeda0c92e0d09e1303ff53a973eb3c6bb2136803b9d962c9/inflect-0.2.5-py2.py3-none-any.whl (58kB)
Collecting librosa==0.6.0
  Downloading https://files.pythonhosted.org/packages/6b/f4/422bfbefd581f74354ef05176aa48558c548243c87e359d91512d4b65523/librosa-0.6.0.tar.gz (1.5MB)
Collecting scipy==1.0.0
  Downloading https://files.pythonhosted.org/packages/d8/5e/caa01ba7be11600b6a9d39265440d7b3be3d69206da887c42bef049521f2/scipy-1.0.0-cp36-cp36m-manylinux1_x86_64.whl (50.0MB)
Collecting tensorboardX==1.

tcmalloc: large alloc 1073750016 bytes == 0x5c584000 @  0x7f42f80a12a4 0x591a07 0x5b5d56 0x502e9a 0x506859 0x502209 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x507641 0x502209 0x502f3d 0x506859 0x504c28 0x502540 0x502f3d 0x507641 0x504c28 0x502540 0x502f3d 0x507641


CPU times: user 10.6 ms, sys: 7.86 ms, total: 18.4 ms
Wall time: 1min 4s
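
Because the notebook depends on pinned versions, it can help to compare the pins with what is actually installed. This is a rough sketch only: `parse_pins` and `find_mismatches` are hypothetical helpers, and the `installed` dictionary is a made-up snapshot that you would replace with the real output of `pip freeze`.

```python
def parse_pins(requirements):
    """Map package name -> pinned version for every 'name==version' token."""
    pins = {}
    for token in requirements.split():
        if "==" in token:
            name, version = token.split("==", 1)
            pins[name.lower()] = version
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for packages whose version differs."""
    return {name: (wanted, installed.get(name))
            for name, wanted in pins.items()
            if installed.get(name) != wanted}


pins = parse_pins("torch==0.4.0 matplotlib==2.1.0 tensorflow inflect==0.2.5 "
                  "librosa==0.6.0 scipy==1.0.0 tensorboardX==1.1 "
                  "Unidecode==1.0.22 pillow")
# Hypothetical snapshot of installed versions; replace with real values.
installed = {"torch": "0.4.0", "librosa": "0.6.2"}
print(find_mismatches(pins, installed))
```

Unpinned packages such as `tensorflow` and `pillow` are skipped on purpose, since the install command does not fix their versions.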


---

## Step2. Prepare Weight Files

Voice Synthesis from real audio


### 2-1 Mel from NVIDIA ADLR homepage 


###  WaveGlow weight files

NVIDIA provides a pre-trained WaveGlow weight file for voice synthesis.

###  2-1.1 Copy files to Google Drive
Copy the WaveGlow [weight file](https://drive.google.com/file/d/1cjKPHbtAMh_4HTHmuIGNkbOkPBD9qwhj/view?usp=sharing) to your Google Drive.


Select **[More Action]** > **[Organize]** > **[My drive]**, then copy the file to the root directory of Google Drive.


### 2-1.2 mount Google Drive 

Mount Google Drive on COLAB so that the weight files shared by NVIDIA through Google Drive can be copied easily. The home directory of the notebook is **`/content`**. Let's set up the mount point as **`Gdrive`**. Open the URL shown in the output in a browser, then enter the authorization key to activate the mount.

In [0]:
from google.colab import drive
drive.mount('/content/Gdrive')

Google Drive is now mounted at **`/content/Gdrive`**. However, this is not immediately reflected in the file browser in the left panel. Click the **REFRESH** button in the File tab to update the view; you should then see **`Gdrive`**.



In [0]:
%%bash 
ls .
ls -alh   "Gdrive/My Drive"


Note that the folder name `My Drive` under the mount point contains a space, so quote the path for it to be recognized correctly. It takes about 30 seconds to copy the 2GB file from Google Drive to COLAB.

Check the file in the mount point

In [0]:
ls -alh "Gdrive/My Drive/waveglow_old.pt"

-r-------- 2 root root 2.0G Nov  8 03:51 'Gdrive/My Drive/waveglow_old.pt'


Copy the file to local storage in the COLAB VM

In [0]:
%%time
%%bash
ls -alh "Gdrive/My Drive/waveglow_old.pt"
cp -rf "Gdrive/My Drive/waveglow_old.pt" ./.
pwd
ls -alh "./waveglow_old.pt"

-r-------- 2 root root 2.0G Nov  8 03:51 Gdrive/My Drive/waveglow_old.pt
/content
-r-------- 1 root root 2.0G Nov 10 05:05 ./waveglow_old.pt
CPU times: user 4.28 ms, sys: 6.49 ms, total: 10.8 ms
Wall time: 32.6 s
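
The same copy can be done from Python, where the space in `My Drive` needs no shell quoting. This is a minimal sketch; `copy_to_workdir` is a hypothetical helper, and the Drive path assumes the `/content/Gdrive` mount point used above.

```python
import shutil
from pathlib import Path


def copy_to_workdir(source, workdir="/content"):
    """Copy one file into workdir and return the new path."""
    source = Path(source)
    target = Path(workdir) / source.name
    # Paths are passed as single arguments, so spaces are handled as-is.
    shutil.copyfile(str(source), str(target))
    return target


weight_on_drive = Path("/content/Gdrive/My Drive/waveglow_old.pt")
if weight_on_drive.exists():  # only true after the Drive mount succeeded
    print(copy_to_workdir(weight_on_drive))
```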


### Download from Google Drive to COLAB

In [0]:
import requests

def download_file_from_google_drive(file_id, destination):
    """Download a shared Google Drive file to `destination`."""
    def get_confirm_token(response):
        # Drive sets a 'download_warning' cookie when it asks for confirmation.
        for key, value in response.cookies.items():
            if key.startswith('download_warning'):
                return value

        return None

    def save_response_content(response, destination):
        CHUNK_SIZE = 32768

        with open(destination, "wb") as f:
            for chunk in response.iter_content(CHUNK_SIZE):
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)

    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : file_id }, stream = True)
    token = get_confirm_token(response)

    # Repeat the request with the confirm token when one was issued.
    if token:
        params = { 'id' : file_id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)

In [16]:
%%time
destination="/content/waveglow_old.pt"
file_id="1cjKPHbtAMh_4HTHmuIGNkbOkPBD9qwhj"
download_file_from_google_drive(file_id, destination)

CPU times: user 5.66 s, sys: 5.69 s, total: 11.3 s
Wall time: 14.2 s
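
A direct download can fail quietly and save a small HTML page instead of the weight file, so a quick sanity check on the result is worthwhile. The sketch below is an assumption-laden example: `looks_like_html` and `check_download` are hypothetical helpers, and the size threshold is something you choose yourself (the weight file listed earlier is about 2.0G).

```python
import os


def looks_like_html(path, probe_bytes=512):
    """Return True if the file starts like an HTML page rather than binary data."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_download(path, min_bytes):
    """Raise ValueError if the file is suspiciously small or is an HTML page."""
    size = os.path.getsize(path)
    if size < min_bytes or looks_like_html(path):
        raise ValueError("%s does not look like the expected file (%d bytes)"
                         % (path, size))
    return size
```

For the weight file, a call such as `check_download("/content/waveglow_old.pt", min_bytes=10**9)` would reject anything far below the expected size.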



## Step3. Voice Synthesis from provided Mel files

### public Mel files

NVIDIA provides Mel files generated from real voice.

### 3-1. Copy files to Google Drive

- [Mel files ](https://drive.google.com/file/d/1g_VXK2lpP9J25dQFhQwx7doWl_p20fXA/view?usp=sharing)

Select **[More Action]** > **[Organize]** > **[My drive]**, then copy the file to the root directory of Google Drive.




Check files in mount point

In [0]:
ls -alh "Gdrive/My Drive/mel_spectrograms.zip"

-r-------- 2 root root 1.5M Nov  8 00:41 'Gdrive/My Drive/mel_spectrograms.zip'


### 3-2 copy to local storage in COLAB VM
Note that the folder name `My Drive` under the mount point contains a space, so quote the path for it to be recognized correctly.

In [0]:
%%time
%%bash
ls -alh "Gdrive/My Drive/mel_spectrograms.zip"
cp -rf "Gdrive/My Drive/mel_spectrograms.zip" .
pwd 
ls -alh "/content/mel_spectrograms.zip"

-r-------- 2 root root 1.5M Nov  8 00:41 Gdrive/My Drive/mel_spectrograms.zip
/content
-r-------- 1 root root 1.5M Nov 10 05:15 /content/mel_spectrograms.zip
CPU times: user 2.43 ms, sys: 7.34 ms, total: 9.77 ms
Wall time: 597 ms


In [8]:
%%time
destination="/content/mel_spectrograms.zip"
file_id="1g_VXK2lpP9J25dQFhQwx7doWl_p20fXA"
download_file_from_google_drive(file_id, destination)

CPU times: user 37 ms, sys: 6.71 ms, total: 43.8 ms
Wall time: 1.09 s


### 3-3. Decompress Mel files

The zip file contains macOS metadata (`.DS_Store` and the `__MACOSX` folder), which caused abnormal behavior in COLAB. Delete all macOS-related files after unzipping.

In [9]:
%%bash
unzip mel_spectrograms.zip
rm -rf mel_spectrograms/.DS_Store
rm -rf __MACOSX 

Archive:  mel_spectrograms.zip
   creating: mel_spectrograms/
  inflating: mel_spectrograms/LJ001-0153.wav.pt  
  inflating: mel_spectrograms/LJ001-0096.wav.pt  
  inflating: mel_spectrograms/LJ001-0094.wav.pt  
  inflating: mel_spectrograms/.DS_Store  
   creating: __MACOSX/
   creating: __MACOSX/mel_spectrograms/
  inflating: __MACOSX/mel_spectrograms/._.DS_Store  
  inflating: mel_spectrograms/LJ001-0079.wav.pt  
  inflating: mel_spectrograms/LJ001-0051.wav.pt  
  inflating: mel_spectrograms/LJ001-0063.wav.pt  
  inflating: mel_spectrograms/LJ001-0173.wav.pt  
  inflating: mel_spectrograms/LJ001-0102.wav.pt  
  inflating: mel_spectrograms/LJ001-0015.wav.pt  
  inflating: mel_spectrograms/LJ001-0072.wav.pt  
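
As an alternative to unzipping and then deleting, the macOS metadata can be skipped during extraction. This is a sketch using the standard `zipfile` module; the helper names `is_macos_metadata` and `extract_without_macos_files` are hypothetical.

```python
import zipfile


def is_macos_metadata(name):
    """True for Finder/resource-fork entries that macOS adds to zip archives."""
    parts = name.split("/")
    return "__MACOSX" in parts or parts[-1] == ".DS_Store"


def extract_without_macos_files(zip_path, dest="."):
    """Extract every archive member except macOS metadata; return kept names."""
    with zipfile.ZipFile(zip_path) as archive:
        keep = [n for n in archive.namelist() if not is_macos_metadata(n)]
        archive.extractall(dest, members=keep)
    return keep
```

Calling `extract_without_macos_files("mel_spectrograms.zip", "/content")` would leave only the `.pt` files in `mel_spectrograms/`, so the later `ls /content/mel_spectrograms/*.pt` sees a clean folder.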


### 3-4 . Generate Audio

#### Convert the checkpoint

In [25]:
%%bash
cd waveglow
git fetch origin f4c04e2d968de01b22d2fb092bbbf0cec0b6586f
git checkout FETCH_HEAD

M	tacotron2


From https://github.com/NVIDIA/waveglow
 * branch            f4c04e2d968de01b22d2fb092bbbf0cec0b6586f -> FETCH_HEAD
Note: checking out 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at f4c04e2 config.json: addinf mel fmin and mel fmax params


In [27]:
%%time
%%bash
cd /content/waveglow
python3 convert_model.py /content/waveglow_old.pt  /content/waveglow_new.pt

python3: can't open file 'convert_model.py': [Errno 2] No such file or directory


CPU times: user 5.26 ms, sys: 5.23 ms, total: 10.5 ms
Wall time: 56.2 ms


Now we will synthesize the voice from the provided Mel spectrograms. As before, loading the 2GB parameter file takes time.

In [19]:
%%bash
rm -rf audio_mel_ref 
mkdir audio_mel_ref 
cd waveglow
python inference.py -f <(ls /content/mel_spectrograms/*.pt) -w /content/waveglow_old.pt -o /content/audio_mel_ref   -s 0.6

Traceback (most recent call last):
  File "inference.py", line 73, in <module>
    args.output_dir, args.sampling_rate, args.is_fp16)
  File "inference.py", line 35, in main
    waveglow = torch.load(waveglow_path)['model']
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 303, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 469, in _load
    result = unpickler.load()
  File "/content/waveglow/glow_old.py", line 3, in <module>
    from glow import Invertible1x1Conv, remove
  File "/content/waveglow/glow.py", line 33, in <module>
    @torch.jit.script
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 537, in script
    graph = _script_graph(fn, frame_id=3)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 532, in _script_graph
    ast = get_jit_ast(fn)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/frontend.py", li

### 3-5. Compare Voice Quality

**Real Voice**

In [0]:
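import IPython.display as ipd
# The real recording; it must already be uploaded to /content (see Step 4-1).
audio_file_real = "/content/LJ001-0153.wav"
ipd.Audio(audio_file_real, rate=22050)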

**Generated Voice** from provided Mel

In [0]:
audio_file_synth = "/content/audio_mel_ref/LJ001-0153.wav_synthesis.wav"
ipd.Audio(audio_file_synth, rate=22050)

##  Step4. Check Voice Synthesis Quality from Mel of Real Audio

You can also generate audio from real voice files. Visit the NVIDIA ADLR WaveGlow [blog](https://nv-adlr.github.io/WaveGlow) to hear the sound quality of the WaveGlow model. You can reproduce it with the provided pre-trained WaveGlow parameters.

### 4-1. Upload Audio files to Colab
Download one of the Real Audio files from the blog. I will select the last file, [LJ001-0153.wav](http://docs.google.com/uc?export=open&id=1kM_7q5dVGkf4CV97cc7rY07JLwB9VaAL). Click the **UPLOAD** button in the left file browser to upload the file.


Caution! COLAB does not yet support drag-and-drop from a browser or file explorer into COLAB's file browser.




In [0]:
ls /content/LJ001-0153.wav

/content/LJ001-0153.wav


### 4-2. Generate Mel from Real Audio
I created a Mel spectrogram from the real audio (LJ001-0153.wav) in the **`Mel_real`** folder, using the settings in config.json.

```
    "data_config": {
        "training_files":"train_files.txt",
        "segment_length": 16000,
        "sampling_rate": 22050,
        "filter_length": 1024,
        "hop_length": 256,
        "win_length": 1024,
        "mel_fmin": 0.0,
        "mel_fmax": 8000.0
    },
```
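
To get a feel for these settings, the sketch below derives a few quantities from them. It assumes that `filter_length` is the FFT size and that `hop_length` and `win_length` are counted in samples; `frame_stats` is a hypothetical helper, not part of the repository.

```python
data_config = {
    "segment_length": 16000,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
}


def frame_stats(cfg):
    """Derive time and frequency resolution from the STFT settings."""
    sr = cfg["sampling_rate"]
    return {
        # One Mel frame is produced every hop_length samples.
        "frames_per_second": sr / cfg["hop_length"],
        "window_ms": 1000.0 * cfg["win_length"] / sr,
        "hop_ms": 1000.0 * cfg["hop_length"] / sr,
        "segment_seconds": cfg["segment_length"] / sr,
        # A real-valued FFT of size N yields N // 2 + 1 frequency bins.
        "frequency_bins": cfg["filter_length"] // 2 + 1,
    }


for key, value in frame_stats(data_config).items():
    print(key, round(value, 3))
```

With these values, one second of 22050 Hz audio maps to roughly 86 Mel frames, each covering a window of about 46 ms.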

  

In [0]:
%%time
%%bash
rm -rf Mel_real
mkdir Mel_real
cd waveglow
ls /content/LJ001-0153.wav > /content/waveglow/test_files.txt
ls /content/LJ001-0153.wav > /content/waveglow/train_files.txt
python mel2samp.py -f test_files.txt -o /content/Mel_real -c config.json
ls /content/Mel_real/

/content/Mel_real/LJ001-0153.wav.pt
LJ001-0153.wav.pt
CPU times: user 6.24 ms, sys: 6.04 ms, total: 12.3 ms
Wall time: 3.57 s


### 4-3. Generate Synthetic Audio
About 6 seconds of voice can be generated in about 20 seconds. Most of that time is spent loading the 2GB weight file; the actual speech synthesis runs in real time, and saving the speech file also takes some time.

In [0]:
%%time
%%bash
rm -rf audio_real
mkdir audio_real
ls /content/Mel_real/*.pt > /content/waveglow/mel_files_real.txt
cd waveglow
python inference.py -f mel_files_real.txt -w /content/waveglow_old.pt -o /content/audio_real  -s 0.6

/content/audio_real/LJ001-0153.wav_synthesis.wav




CPU times: user 6.35 ms, sys: 4.86 ms, total: 11.2 ms
Wall time: 21.1 s
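
The timing above can be put in perspective with a real-time factor, that is, processing time divided by audio duration. This is a minimal sketch using the standard `wave` module, assuming the synthesized file is plain PCM WAV; `wav_duration_seconds` and `real_time_factor` are hypothetical helpers.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, from its frame count and sampling rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean processing ran faster than playback."""
    return processing_seconds / audio_seconds


# Wall time reported above and the approximate clip length from the text.
print(real_time_factor(21.1, 6.0))
```

Note that the 21.1 s wall time includes loading the weight file, so this figure describes the whole cell rather than the synthesis step alone.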


### 4-4. Compare Audio Quality




**Real Audio**


In [0]:
import IPython.display as ipd
audio_file_real ="/content/LJ001-0153.wav"
ipd.Audio(audio_file_real, rate=22050)

**Synthesized Audio**



In [0]:
audio_file_synth = "/content/audio_real/LJ001-0153.wav_synthesis.wav"
ipd.Audio(audio_file_synth, rate=22050)

If the sound quality differs from the real audio, the preprocessing options may be set incorrectly, as discussed in [issue 7](https://github.com/NVIDIA/waveglow/issues/7).


---


## Summary

With this notebook you can easily demonstrate speech synthesis.

I would especially like to thank Rafael Valle for an urgent commit while I was validating this notebook.


## Reference
- paper  https://arxiv.org/abs/1811.00002

- blog https://nv-adlr.github.io/WaveGlow 

- github https://github.com/NVIDIA/waveglow


<img src="https://nv-adlr.github.io/images/waveglow_logo.png" width=300 align=center >
