# SEGAN pytorch implementation
by Hyungon Ryu | Sr. Solution Architect in Korea


This Jupyter notebook runs [SEGAN-PyT](https://github.com/yhgon/segan-pyt), which is similar to the [SEGAN-TF](https://github.com/santi-pdp/segan) project, in COLAB. The original paper can be found [here](https://arxiv.org/abs/1703.09452), and there are two SEGAN PyTorch implementations:
 - [leftthomas' SEGAN-PyT](https://github.com/leftthomas/SEGAN)
 - [deNsuh's SEGAN-PyT](https://github.com/deNsuh/segan-pytorch)

![SEGAN model image](https://github.com/santi-pdp/segan/raw/master/assets/segan_g.png)

This model deals with raw speech waveforms on many noise conditions at different SNRs (40 at training time and 20 during test). It also models the speech characteristics from many speakers mixed within the same structure (without any supervision of identities), which makes the generative structure generalizable in the noise and speaker dimensions.

There are two repositories and one forum thread that were good references on how GANs are defined and deployed:

  - [OpenAI's improved-gan](https://github.com/openai/improved-gan): implementing improvements to train GANs in a more stable way
  - [Carpedm20's DCGAN-tensorflow](https://github.com/carpedm20/DCGAN-tensorflow): implementation of the DCGAN in TensorFlow
  - [Rafael's vNorm](https://discuss.pytorch.org/t/parameter-grad-of-conv-weight-is-none-after-virtual-batch-normalization/9036): implementation of virtual batch normalization
 
 

## DevOps 
To use PyTorch in COLAB, we need to install CUDA libraries and PyTorch binaries whose versions match exactly.

### step1. network install CUDA 9.0 libraries 


In [32]:
!wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
!apt-get install dirmngr
!dpkg -i cuda-repo-ubuntu1704_9.0.176-1_amd64.deb
!apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
!apt-get update
!apt-get install  -y --no-install-recommends  \
 cuda-core-9-0 \
 cuda-cublas-9-0 cuda-cublas-dev-9-0 cuda-cudart-9-0 cuda-cudart-dev-9-0 \
 cuda-cufft-9-0 cuda-cufft-dev-9-0 cuda-curand-9-0 cuda-curand-dev-9-0 \
 cuda-cusolver-9-0 cuda-cusolver-dev-9-0 cuda-cusparse-9-0 \
 cuda-cusparse-dev-9-0 \
 cuda-libraries-9-0 cuda-libraries-dev-9-0 \
 cuda-misc-headers-9-0 cuda-npp-9-0 cuda-npp-dev-9-0 \
 cuda-nvgraph-9-0 cuda-nvgraph-dev-9-0 cuda-nvml-dev-9-0 cuda-nvrtc-9-0 \
 cuda-nvrtc-dev-9-0 


Redirecting output to ‘wget-log.1’.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
dirmngr is already the newest version (2.1.15-1ubuntu8.1).
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
(Reading database ... 19751 files and directories currently installed.)
Preparing to unpack cuda-repo-ubuntu1704_9.0.176-1_amd64.deb ...
Unpacking cuda-repo-ubuntu1704 (9.0.176-1) over (9.0.176-1) ...
Setting up cuda-repo-ubuntu1704 (9.0.176-1) ...
Executing: /tmp/apt-key-gpghome.qfre4qwlMG/gpg.1.sh --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub
gpg: requesting key from 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1704/x86_64/7fa2af80.pub'
gpg: key F60F4B3D7FA2AF80: "cudatools <cudatools@nvidia.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
Hit:1 http://security.ubuntu.com/ubuntu artful-security InRelease
Hit:2 http://archive.ubunt

### step2. pytorch installation for python3.6
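
pip refuses a wheel whose Python tag does not match the running interpreter, so it can be worth checking the tag before downloading a file of about 500 MB. The sketch below reads the Python tag (for example `cp36`) from a wheel filename such as the one in the install cell and compares it with the current interpreter. The helper names `wheel_python_tag`, `interpreter_tags` and `wheel_matches_interpreter` are hypothetical, the check assumes CPython, and it is much simpler than pip's real compatibility logic.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag of a wheel filename (third field from the end)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    return filename[:-len(".whl")].split("-")[-3]


def interpreter_tags():
    """Tags accepted by this simplified check, e.g. {'cp36', 'py3'} (CPython assumed)."""
    major, minor = sys.version_info[:2]
    return {"cp{}{}".format(major, minor), "py{}".format(major)}


def wheel_matches_interpreter(filename):
    """True if any alternative in the wheel's Python tag fits this interpreter."""
    # A tag such as 'py2.py3' lists several alternatives separated by dots.
    tags = wheel_python_tag(filename).split(".")
    return any(tag in interpreter_tags() for tag in tags)


wheel = "torch-0.4.1-cp36-cp36m-linux_x86_64.whl"
print("wheel tag:", wheel_python_tag(wheel))
print("matches this interpreter:", wheel_matches_interpreter(wheel))
```

On the Python 3.6 runtime used here the second line should report a match; on any other Python version it reports a mismatch, which is the situation the check is meant to catch.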


In [3]:
!pip3 install torch torchvision
#below link is slower than files.pythonhosted.org but same file
#!pip3 install http://download.pytorch.org/whl/cu90/torch-0.4.1-cp36-cp36m-linux_x86_64.whl

Collecting torch
  Downloading https://files.pythonhosted.org/packages/49/0e/e382bcf1a6ae8225f50b99cc26effa2d4cc6d66975ccf3fa9590efcbedce/torch-0.4.1-cp36-cp36m-manylinux1_x86_64.whl (519.5MB)
    100% |████████████████████████████████| 519.5MB 30kB/s
tcmalloc: large alloc 1073750016 bytes == 0x5a064000 @  0x7fa9513ca1c4 0x46d6a4 0x5fcbcc 0x4c494d 0x54f3c4 0x553aaf 0x54e4c8 0x54f4f6 0x553aaf 0x54efc1 0x54f24d 0x553aaf 0x54efc1 0x54f24d 0x553aaf 0x54efc1 0x54f24d 0x551ee0 0x54e4c8 0x54f4f6 0x553aaf 0x54efc1 0x54f24d 0x551ee0 0x54efc1 0x54f24d 0x551ee0 0x54e4c8 0x54f4f6 0x553aaf 0x54e4c8
Collecting torchvision
  Downloading https://files.pythonhosted.org/packages/ca/0d/f00b2885711e08bd71242ebe7b96561e6f6d01fdb4b9dcf4d37e2e13c5e1/torchvision-0.2.1-py2.py3-none-any.whl (54kB)
    100% |████████████████████████████████| 61kB 21.7MB/s
Collecting pillow>=4.1.1 (from torchvision)
  Downloading https://files.pythonhosted.org/packages/62/94/5430ebaa83f91cc7a9f687f

In [4]:
!pip3 install   librosa tqdm  

Collecting librosa
  Downloading https://files.pythonhosted.org/packages/09/b4/5b411f19de48f8fc1a0ff615555aa9124952e4156e94d4803377e50cfa4c/librosa-0.6.2.tar.gz (1.6MB)
    100% |████████████████████████████████| 1.6MB 13.4MB/s
Collecting audioread>=2.0.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/f0/41/8cd160c6b2046b997d571a744a7f398f39e954a62dd747b2aae1ad7f07d4/audioread-2.1.6.tar.gz
Collecting resampy>=0.2.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/14/b6/66a06d85474190b50aee1a6c09cdc95bb405ac47338b27e9b21409da1760/resampy-0.2.1.tar.gz (322kB)
    100% |████████████████████████████████| 327kB 25.8MB/s
Collecting numba>=0.38.0 (from librosa)
  Downloading https://files.pythonhosted.org/packages/83/ac/c87f229ae7f29fbf85bc5405f85ec7097c979c021dac52f2d5206834a899/numba-0.40.0-cp36-cp36m-manylinux1_x86_64.whl (2.4MB)
    100% |████████████████████████████████| 2.4MB 9.8MB/s
Collecting llvmlite>

### step3. check the system
With `nvidia-smi`, you can check that a GPU is available.
If you have a problem, open `EDIT > Notebook Settings` in the notebook menu and set the Accelerator to `GPU`.

In [5]:
!nvidia-smi

Wed Oct  3 12:59:02 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8    31W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [19]:
#pytorch verification code
from __future__ import print_function
import torch
x = torch.rand(5, 3)
print('Check computation\n',x)
print('Check GPU is available : ',torch.cuda.is_available())
print('CuDNN version : ', torch.backends.cudnn.version())

Check computation
 tensor([[0.6055, 0.1374, 0.2097],
        [0.6046, 0.6662, 0.1934],
        [0.3607, 0.1231, 0.3689],
        [0.5207, 0.9183, 0.8511],
        [0.2108, 0.9905, 0.2923]])
Check GPU is available :  True
CuDNN version :  7102


In [20]:
!nvidia-smi | grep MiB

| N/A   34C    P8    31W / 149W |     11MiB / 11439MiB |      0%      Default |
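
The filtered row above packs used and total GPU memory into one table cell. If you want the same numbers as integers in Python, for example to log them before and after loading a model, a minimal sketch is shown below. `parse_memory_usage` is a hypothetical helper, and it assumes the `<used>MiB / <total>MiB` layout seen in this notebook's `nvidia-smi` output.

```python
import re


def parse_memory_usage(line):
    """Return (used_mib, total_mib) from an nvidia-smi table row, or None."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Row copied from the cell output above.
row = "| N/A   34C    P8    31W / 149W |     11MiB / 11439MiB |      0%      Default |"
used, total = parse_memory_usage(row)
print("GPU memory used: {} of {} MiB".format(used, total))
```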


# prepare Dataset

## Dataset
The speech enhancement dataset used can be found in [Edinburgh DataShare](http://datashare.is.ed.ac.uk/handle/10283/1942). In the COLAB environment, you lose the downloaded dataset on every reconnection, so I recommend mounting Google Drive in COLAB and copying the dataset from it once per connection.


There is a [GUIDE](https://colab.research.google.com/notebooks/io.ipynb) with examples for external data in COLAB: Drive, Sheets, and Cloud Storage.


There are three datasets. I'll use the 48 kHz dataset, following the original SEGAN TensorFlow implementation, to save storage:
 - [48 kHz dataset](http://datashare.is.ed.ac.uk/handle/10283/1942)
 - [56-speaker and 28-speaker datasets](https://datashare.is.ed.ac.uk/handle/10283/2791)

I assume you have already uploaded the following four files to Google Drive:
 - clean_testset_wav.zip	
 - noisy_testset_wav.zip
 - clean_trainset_wav.zip	
 - noisy_trainset_wav.zip
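
Copying from Drive takes a while, so it can help to confirm that all four archives are present before starting. The sketch below is one way to do that with the standard library. `missing_archives` is a hypothetical helper, and the folder path is the one used by the copy commands in this notebook, so adjust it to your own Drive layout.

```python
from pathlib import Path

REQUIRED_ARCHIVES = [
    "clean_testset_wav.zip",
    "noisy_testset_wav.zip",
    "clean_trainset_wav.zip",
    "noisy_trainset_wav.zip",
]


def missing_archives(folder):
    """Return the required archive names that are not present in folder."""
    folder = Path(folder)
    return [name for name in REQUIRED_ARCHIVES if not (folder / name).is_file()]


# Assumed Drive location; a folder that does not exist reports every archive as missing.
drive_folder = "drive/My Drive/COLAB/segan"
missing = missing_archives(drive_folder)
print("missing archives:", missing if missing else "none")
```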



In [16]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [18]:
%%time
!rm -rf ./dataset-segan
!ls -h   "drive/My Drive/COLAB/segan/"
!mkdir ./dataset-segan
!cp  -rf "drive/My Drive/COLAB/segan/clean_testset_wav.zip" ./dataset-segan/.
!cp  -rf "drive/My Drive/COLAB/segan/noisy_testset_wav.zip" ./dataset-segan/.
!cp  -rf "drive/My Drive/COLAB/segan/clean_trainset_wav.zip" ./dataset-segan/.
!cp  -rf "drive/My Drive/COLAB/segan/noisy_trainset_wav.zip" ./dataset-segan/.
!ls -h ./dataset-segan

clean_testset_wav.zip	noisy_testset_wav.zip	segan.tfrecords
clean_trainset_wav.zip	noisy_trainset_wav.zip
clean_testset_wav.zip	noisy_testset_wav.zip
clean_trainset_wav.zip	noisy_trainset_wav.zip
CPU times: user 418 ms, sys: 92.9 ms, total: 511 ms
Wall time: 46.3 s


In [21]:
%%time
!mkdir data
!unzip -q ./dataset-segan/clean_trainset_wav.zip -d ./data/clean_trainset_wav
!unzip -q ./dataset-segan/clean_testset_wav.zip  -d ./data
!unzip -q ./dataset-segan/noisy_trainset_wav.zip -d ./data/noisy_trainset_wav
!unzip -q ./dataset-segan/noisy_testset_wav.zip  -d ./data
!du -h data
!ls ./data/clean_trainset_wav | head -n 2
!ls ./data/clean_testset_wav  | head -n 2
!ls ./data/noisy_trainset_wav | head -n 2
!ls ./data/noisy_testset_wav  | head -n 2

192M	data/noisy_testset_wav
1.1G	data/clean_trainset_wav
192M	data/clean_testset_wav
1.1G	data/noisy_trainset_wav
2.5G	data
p226_001.wav
p226_002.wav
p232_001.wav
p232_002.wav
p226_001.wav
p226_002.wav
p232_001.wav
p232_002.wav
CPU times: user 549 ms, sys: 111 ms, total: 660 ms
Wall time: 59 s
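
The listings above show the same file names in the clean and noisy folders. The sketch below checks that this holds for every file, not just the first two, on the assumption that the training code pairs clean and noisy audio by file name. `wav_names` and `unpaired_files` are hypothetical helpers.

```python
from pathlib import Path


def wav_names(folder):
    """Return the set of .wav file names directly inside folder."""
    folder = Path(folder)
    if not folder.is_dir():
        return set()
    return {path.name for path in folder.glob("*.wav")}


def unpaired_files(clean_folder, noisy_folder):
    """Return names that appear in only one of the two folders, sorted."""
    return sorted(wav_names(clean_folder) ^ wav_names(noisy_folder))


for split in ("trainset", "testset"):
    clean = "data/clean_{}_wav".format(split)
    noisy = "data/noisy_{}_wav".format(split)
    print(split, "unpaired files:", len(unpaired_files(clean, noisy)))
```

A count of zero for both splits means every clean file has a noisy counterpart with the same name.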


## clone the SEGAN model and utilities
The original SEGAN implementation was based on an old PyTorch version, so I've modified a few words for compatibility with COLAB, which uses Python 3.6.3.
You can check the changes in the following diffs:


 [diff1 1ded4f0 ](https://github.com/yhgon/segan-pyt/commit/1ded4f0c15bffe51027bb78e8a63aa25afbfded9)
```
in data_preprocess.py
7  -  clean_train_folder = 'data/clean_trainset_56spk_wav'
8  -  noisy_train_folder = 'data/noisy_trainset_56spk_wav'
7  +  clean_train_folder = 'data/clean_trainset_wav'
8  + noisy_train_folder = 'data/noisy_trainset_wav'
 ```
 

 [diff2 d1e70f ](https://github.com/yhgon/segan-pyt/commit/d1e70ffe67a9d3e2cbfe4bd48d90948b75693cf3)
```
model.py
 21 -      self.gamma = Parameter(torch.normal(means=torch.ones(1, num_features, 1), std=0.02))
 21 +      self.gamma = Parameter(torch.normal(mean=torch.ones(1, num_features, 1), std=0.02))
 ```
[diff3 44fe049 ](https://github.com/yhgon/segan-pyt/commit/44fe04947e8f1ea580295d877207908b5fba851b)
```
in model.py
166 -  nn.init.xavier_normal(m.weight.data)
 166 + nn.init.xavier_normal_(m.weight.data)
 
 278 -  nn.init.xavier_normal(m.weight.data)
 276 +  nn.init.xavier_normal_(m.weight.data)
 ```
 
 [diff4 991db35 ](https://github.com/yhgon/segan-pyt/commit/991db3581a95b71bcbe476daf1cb3c30b4e80b80)
```
in main.py
 53  -  z = nn.init.normal(torch.Tensor(train_batch.size(0), 1024, 8))
 53  +  z = nn.init.normal_(torch.Tensor(train_batch.size(0), 1024, 8))
 ```
 
 
 [diff5 bdbe9f3 ](https://github.com/yhgon/segan-pyt/commit/bdbe9f3e3ff337ab13f065abd0d37de6ca63e9cf)
```
in main.py
99 -  z = nn.init.normal(torch.Tensor(test_noisy.size(0), 1024, 8))
99 +  z = nn.init.normal_(torch.Tensor(test_noisy.size(0), 1024, 8)) 
```
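
The API changes in diff2 to diff5 follow one pattern: PyTorch 0.4 names in-place initializers with a trailing underscore, and `torch.normal` takes `mean` instead of `means`. As an illustration only, the same renames can be applied mechanically to a source string. `modernize` and `RENAMES` are hypothetical names, and the patterns cover just the calls touched in the diffs above.

```python
import re

# (pattern, replacement) pairs for the calls changed in diff2 to diff5.
RENAMES = [
    (re.compile(r"\bnn\.init\.(xavier_normal|normal)\("), r"nn.init.\1_("),
    (re.compile(r"\btorch\.normal\(means="), "torch.normal(mean="),
]


def modernize(source):
    """Apply the renames to a string of Python source code."""
    for pattern, replacement in RENAMES:
        source = pattern.sub(replacement, source)
    return source


old_line = "z = nn.init.normal(torch.Tensor(train_batch.size(0), 1024, 8))"
print(modernize(old_line))
```

Calls that already end in an underscore are left alone, so running the function twice gives the same result as running it once.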
 

In [35]:
!rm -rf segan-pyt
!git clone https://github.com/yhgon/segan-pyt.git

Cloning into 'segan-pyt'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 164 (delta 3), reused 0 (delta 0), pack-reused 155
Receiving objects: 100% (164/164), 2.14 MiB | 4.98 MiB/s, done.
Resolving deltas: 100% (90/90), done.


## prepare dataset 
Down-sample the audio with librosa and serialize it to NumPy files:
- train dataset: down-sampling 11572 wav files generates 48640 np files
- test dataset: down-sampling 824 wav files generates 2805 np files

It takes about 6 minutes. You should see a log like this:

```
Serialize and down-sample train audios: 100% 11572/11572 [01:28<00:00, 131.34it/s]
Verify serialized train audios: 100% 48640/48640 [01:20<00:00, 606.76it/s]
Serialize and down-sample test audios: 100% 824/824 [02:50<00:00,  4.18it/s]
Verify serialized test audios: 100% 2805/2805 [00:01<00:00, 2235.37it/s]
CPU times: user 5.38 s, sys: 1.33 s, total: 6.71 s
Wall time: 5min 42s
```
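
There are more np files than wav files (about 4.2 per training wav and 3.4 per test wav) because each utterance is cut into several fixed-size windows. Below is a minimal sketch of that slicing, assuming the window of 16384 samples with 50% overlap described in the SEGAN paper. The exact values used by `data_preprocess.py` are not shown in this notebook, so treat the defaults, the 16 kHz rate in the example and the `slice_signal` name as assumptions.

```python
def slice_signal(samples, window_size=16384, stride=0.5):
    """Cut a 1-D sequence into fixed-size windows.

    stride is the hop as a fraction of the window (0.5 means 50% overlap).
    A tail shorter than one full window is dropped in this sketch.
    """
    hop = int(window_size * stride)
    return [
        samples[end - window_size:end]
        for end in range(window_size, len(samples) + 1, hop)
    ]


# Stand-in for 2.5 s of audio at an assumed 16 kHz sample rate.
signal = [0.0] * 40000
chunks = slice_signal(signal)
print(len(chunks), "chunks of", len(chunks[0]), "samples")
```

With these assumed settings a 2.5 second utterance yields three overlapping chunks, which is in line with the ratios in the log.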


In [28]:
%%time
!python segan-pyt/data_preprocess.py

Serialize and down-sample train audios: 100% 11572/11572 [01:28<00:00, 131.34it/s]
Verify serialized train audios: 100% 48640/48640 [01:20<00:00, 606.76it/s]
Serialize and down-sample test audios: 100% 824/824 [02:50<00:00,  4.18it/s]
Verify serialized test audios: 100% 2805/2805 [00:01<00:00, 2235.37it/s]
CPU times: user 5.38 s, sys: 1.33 s, total: 6.71 s
Wall time: 5min 42s


In [0]:
!mkdir results
!mkdir epochs

Performance in COLAB with a K80:
```
with batch  4 100% 12160/12160 [2:29:28<00:00,  1.37it/s]
with batch  8   0%    19/ 6080 [00:21<1:54:24,  1.13s/it]
with batch 16   1%    22/ 3040 [00:44<1:41:04,  2.01s/it]
with batch 32   0%     0/ 1520 [00:00<?, ?it/s]ERROR: insufficient shared memory
```
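
The progress-bar totals above are the 48640 training chunks divided by the batch size, and the time per epoch follows from the measured speed. A small worked example is shown below; the function names are hypothetical, and the 2.0 s/it figure is the batch 16 speed from the log.

```python
import math

NUM_TRAIN_CHUNKS = 48640  # serialized training files reported by data_preprocess.py


def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def epoch_seconds(num_samples, batch_size, seconds_per_iteration):
    """Estimated wall time of one epoch at a measured speed."""
    return iterations_per_epoch(num_samples, batch_size) * seconds_per_iteration


for batch_size in (4, 8, 16, 32):
    print("batch", batch_size, "->", iterations_per_epoch(NUM_TRAIN_CHUNKS, batch_size), "iterations")

# Batch 16 ran at about 2.0 s/it in the log above.
hours = epoch_seconds(NUM_TRAIN_CHUNKS, 16, 2.0) / 3600
print("about {:.1f} hours per epoch at batch size 16".format(hours))
```

At that speed, multiplying the per-epoch estimate by the number of epochs passed to `--num_epochs` gives a rough total training time.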

In [0]:
!python ./segan-pyt/main.py --batch_size 16 --num_epochs 64

loading data...
# generator parameters: 75453878
# discriminator parameters: 97473194
  .format(epoch + 1, clean_loss.data[0], noisy_loss.data[0], g_loss.data[0], g_cond_loss.data[0]))
Epoch 1: d_clean_loss 0.0020, d_noisy_loss 0.0000, g_loss 100.5021, g_conditional_loss 100.0026: 100% 3040/3040 [1:41:37<00:00,  2.00s/it]
  z = nn.init.normal(torch.Tensor(test_noisy.size(0), 1024, 8))
Test model and save generated audios: 100% 176/176 [00:22<00:00,  7.80it/s]
Epoch 2: d_clean_loss 0.0000, d_noisy_loss 0.0000, g_loss 100.4992, g_conditional_loss 99.9996: 100% 3040/3040 [1:41:24<00:00,  2.00s/it]
Test model and save generated audios: 100% 176/176 [00:22<00:00,  7.84it/s]
Epoch 3: d_clean_loss 0.0000, d_noisy_loss 0.0000, g_loss 100.5049, g_conditional_loss 100.0049: 100% 3040/3040 [1:41:40<00:00,  2.02s/it]
Test model and save generated audios: 100% 176/176 [00:22<00:00,  7.73it/s]
Epoch 4: d_clean_loss 0.0000, d_noisy_loss 0.0000, g_loss 100.5091, g_conditional_loss 100.0091:  11% 323/3