This is a restructured and rewritten version of [bshall/UniversalVocoding](https://github.com/bshall/UniversalVocoding).
The main difference here is that the model is turned into a [TorchScript](https://pytorch.org/docs/stable/jit.html) module during training and can be loaded for inference anywhere without Python dependencies.
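
For reference, exporting a model as TorchScript typically looks like the following minimal sketch. The class name and architecture here are illustrative stand-ins, not this repository's actual model code:

```python
import torch
import torch.nn as nn

class Vocoder(nn.Module):  # hypothetical stand-in for the real model
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(80, 256, batch_first=True)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(mels)
        return out

# torch.jit.script compiles the module so it can later be loaded
# with torch.jit.load, without the original Python class definition.
scripted = torch.jit.script(Vocoder())
scripted.save("vocoder.pt")
```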

## Generate waveforms using pretrained models

Since the pretrained models were exported as TorchScript modules, you can load a trained model anywhere.
You can also generate multiple waveforms in parallel, e.g.

```python
import torch

vocoder = torch.jit.load("vocoder.pt")

mels = [
    torch.randn(100, 80),
    torch.randn(200, 80),
    torch.randn(300, 80),
]  # (length, mel_dim)

with torch.no_grad():
    wavs = vocoder.generate(mels)
```
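
To save the generated waveforms, assuming `generate` returns one 1-D waveform tensor per input mel, something like the following works. The sample rate here is only a placeholder; use whatever rate your preprocessing config specifies:

```python
import soundfile as sf

sample_rate = 16000  # placeholder; match your preprocessing config

for i, wav in enumerate(wavs):
    # each wav is assumed to be a 1-D float tensor, possibly on GPU
    sf.write(f"sample_{i}.wav", wav.cpu().numpy(), sample_rate)
```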

Empirically, if you're using the default architecture, you can generate 30 samples at the same time on a GTX 1080 Ti.
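
If you have more utterances than fit in one batch, a simple approach is to vocode them in chunks. This sketch assumes `generate` returns a list of waveforms, as in the example above, and the batch size should be tuned to your GPU memory:

```python
batch_size = 30  # tune to your GPU memory
wavs = []
with torch.no_grad():
    for i in range(0, len(mels), batch_size):
        wavs.extend(vocoder.generate(mels[i:i + batch_size]))
```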

## Train from scratch

Multiple directories containing audio files can be processed at the same time, e.g.

```bash
python preprocess.py \
    VCTK-Corpus \
    LibriTTS/train-clean-100 \
    preprocessed  # the output directory of preprocessed data
```

Then train the model on the preprocessed data, e.g.

```bash
python train.py preprocessed
```

With the default settings, training takes around 12 hours to reach 100K steps on an RTX 2080 Ti.

## References

- [Towards achieving robust universal neural vocoding](https://arxiv.org/abs/1811.06292)
