# Text-To-Speech inference with my funny API

This notebook shows how to use the `tts` module as an API to run complete inference with your pretrained `Tacotron-2` model as `synthesizer` and the pretrained `NVIDIA's Waveglow` as `vocoder`.

The API is really easy to use! Just call the function with your sentence(s) and the model you want to use, and that's it! The code will automatically load the right model and run the whole inference!

You can also associate models with languages in the `models/tts/__init__.py` file to enable language-based loading instead of model-based loading (see the 2nd cell for an example).

Note that `Vocoder` is loaded as a global variable and every `BaseModel` is a `singleton`, so you can call the function as many times as you want without reloading the models every time!
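
The "load once, reuse afterwards" behaviour can be pictured with a minimal sketch. The names `_loaded_models`, `get_model` and `DummyModel` are hypothetical and only illustrate the idea; the actual `BaseModel` singleton logic in the repository may well differ.

```python
# Hypothetical illustration of "load once, reuse afterwards"; NOT the repository's code.
_loaded_models = {}  # module-level cache: model name -> model instance


class DummyModel:
    """Stand-in for a model that is expensive to load."""
    load_count = 0  # counts how many times a model was actually built

    def __init__(self, name):
        DummyModel.load_count += 1
        self.name = name


def get_model(name):
    """Return the cached instance for `name`, building it only on first use."""
    if name not in _loaded_models:
        _loaded_models[name] = DummyModel(name)
    return _loaded_models[name]


first = get_model('pretrained_tacotron2')
second = get_model('pretrained_tacotron2')
print(first is second, DummyModel.load_count)
```

Both calls return the same object, so the expensive construction happens only once.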

I left the results in the `example_outputs/` directory so that you can listen to them without re-executing the code ;)

PS: I will (*theoretically*) **never** share the weights for the last French demonstration in this notebook (which is a `fine-tuned SV2TTS`), but I will share other French voices trained on `SIWIS` (single-speaker) and `SIWIS + CommonVoice + VoxForge` (multi-speaker) [1]

Note: it is normal for the 1st generation to be relatively slow; the next ones will be faster than real time!

## Steps to reproduce

1. Download model weights (see README.md for links)
2. Unzip weights in `pretrained_models/` directory
3. Execute the cells!

Note: to associate a model with a language, go to `models/tts/__init__.py` and modify the `_pretrained` global variable (at the end of the file). The `key` is the language and the `value` is the model's name.
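
As a rough sketch of what such a mapping could look like: only the `en` entry comes from this notebook, the `fr` entry is a placeholder, and the helper `resolve_model_name` is a hypothetical name used to show how a `lang` argument might be turned into a model name. The real lookup in `models/tts/__init__.py` may be written differently.

```python
# Sketch of a language -> model-name mapping (assumed shape of `_pretrained`).
_pretrained = {
    'en': 'nvidia_pretrained',   # association described in this notebook
    'fr': 'my_french_model',     # placeholder: replace with your own model's name
}


def resolve_model_name(model=None, lang=None):
    """Hypothetical helper: an explicit model name wins, otherwise use the language."""
    if model is not None:
        return model
    if lang not in _pretrained:
        raise ValueError('No model associated with language {!r}'.format(lang))
    return _pretrained[lang]


print(resolve_model_name(lang='en'))
print(resolve_model_name(model='pretrained_tacotron2', lang='en'))
```

Giving the explicit `model` argument priority is a design assumption of this sketch, chosen so that language-based loading never silently overrides a model you asked for by name.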

### Streaming

This functionality lets you enter text interactively and get the output directly.

In [None]:
from models.tts import tts_stream

# I suggest not saving audio when streaming because it is slower ;)
# I saved it here so that you can listen to the result

# PS: the 'goodbye, see you soon !' message at the end is a funny sentence I added,
# played when you stop the streaming :D
tts_stream(model = 'pretrained_tacotron2', directory = 'example_outputs', overwrite = True)
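
For intuition, a streaming loop of this kind can be sketched as below. This is not the implementation of `tts_stream`: the function `stream_sketch`, its parameters, and the "empty line stops the stream" rule are all assumptions made for illustration. Only the farewell sentence comes from the notebook.

```python
def stream_sketch(synthesize, read_line=input, farewell='goodbye, see you soon !'):
    """Read sentences until an empty line (assumed stop rule), synthesizing each one."""
    outputs = []
    while True:
        text = read_line().strip()
        if not text:            # assumed stop condition: empty input
            break
        outputs.append(synthesize(text))
    outputs.append(synthesize(farewell))  # final message when the stream stops
    return outputs


# Demo with a fake "synthesizer" (upper-casing) and scripted input instead of the keyboard.
lines = iter(['hello world', ''])
result = stream_sketch(str.upper, read_line=lambda: next(lines))
print(result)
```

Passing `read_line` and `synthesize` as arguments keeps the loop testable without a keyboard or a loaded model.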


### TTS on text

This function generates audio for the provided text or list of texts.

By default, the model will not regenerate sentences that were already generated, but you can force regeneration with the `overwrite` argument.
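
The skip-unless-`overwrite` behaviour can be illustrated with a small sketch. Everything here is hypothetical: `generate_once`, the hash-based file name and the text file standing in for an audio file are assumptions, not how the `tts` function actually stores its results.

```python
import hashlib
import os
import tempfile


def generate_once(text, directory, synthesize, overwrite=False):
    """Hypothetical helper: synthesize `text` only if no output exists yet (or if forced)."""
    os.makedirs(directory, exist_ok=True)
    # Assumed naming scheme: one file per distinct sentence, keyed by a hash of the text.
    name = hashlib.md5(text.encode('utf-8')).hexdigest() + '.txt'
    path = os.path.join(directory, name)
    if overwrite or not os.path.exists(path):
        with open(path, 'w', encoding='utf-8') as f:
            f.write(synthesize(text))
    return path


calls = []  # records every real "synthesis"


def fake_synthesize(text):
    calls.append(text)
    return text.upper()


with tempfile.TemporaryDirectory() as tmp:
    generate_once('hello', tmp, fake_synthesize)                  # generated
    generate_once('hello', tmp, fake_synthesize)                  # skipped: already there
    generate_once('hello', tmp, fake_synthesize, overwrite=True)  # forced regeneration
print(len(calls))
```

Only the first and the forced call reach the synthesizer; the second one reuses the existing file.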

In this example, the model is loaded with `lang = 'en'`, which loads the `nvidia_pretrained` model: in the `models/tts/__init__.py` file, I associated the `en` language with this model by default.

Note on performance: in the `example_waveglow` notebook you can see `1.4 sec generated / sec` with the `tensorflow` implementation of `WaveGlow`. The `pytorch` version achieves `1.7s generated / sec`. Here it achieves `1.4s generated / sec` because I save the results (which takes some time).
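
The `sec generated / sec` figure is simply the duration of the produced audio divided by the wall-clock time spent producing it; values above 1 mean faster than real time. A minimal sketch follows, where the function name and the numbers (sample count, 22050 Hz sample rate, elapsed time) are made up for illustration and are not measurements from this notebook.

```python
def generated_seconds_per_second(n_samples, sample_rate, elapsed_seconds):
    """Audio duration (in seconds) produced per second of wall-clock time."""
    audio_seconds = n_samples / sample_rate
    return audio_seconds / elapsed_seconds


# Illustrative values only: 61740 samples at 22050 Hz is 2.8 s of audio, produced in 2.0 s.
speed = generated_seconds_per_second(n_samples=61740, sample_rate=22050, elapsed_seconds=2.0)
print(round(speed, 2))
```

If saving files is included in the timed region, `elapsed_seconds` grows and the ratio drops, which is the effect described in the note.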

In [None]:
from models.tts import tts

text = [
    "Hello world ! Hope you will enjoy this funny API for Text-To-Speech !",
    "If you train new models, do not hesitate to contact me or add it in the available models !"
]

tts(
    text, lang = 'en', directory = 'example_outputs', display = True, 
    overwrite = True, debug = True, tqdm = lambda x: x
)

In [None]:
from models.tts import tts

text = [
    "Bonjour à tous ! J'espère que vous allez aimer cette démonstration de voix en français !"
]

tts(
    text, lang = 'fr', directory = 'example_outputs', display = True, 
    overwrite = True, debug = True, tqdm = lambda x: x
)

In [None]:
from models.tts import tts

text = (
    "Bonjour à tous ! J'espère que vous allez aimer cette démonstration de mon super modèle entrainé avec seulement 20 minutes d'audios ! "
    "Je ne partagerai pas ce modèle mais je trouvais ça intéressant de montrer ce qu'il était possible de faire !"
)

tts(
    text, model = 'sv2tts_fine_tuned', directory = 'example_outputs', display = True, 
    overwrite = True, debug = True, tqdm = lambda x: x
)