# Text-To-Speech inference with my funny API

This notebook shows how to use the `tts` module as an API to run complete inference with your pretrained `Tacotron-2` model as the `synthesizer` and the pretrained `NVIDIA's WaveGlow` as the `vocoder`.

The API is really easy to use! Simply call the function with your sentence(s) and the model you want to use, and that's it! The code automatically loads the right model and runs the whole inference!

You can also associate models with languages in the `models/tts/__init__.py` file to enable *language-based* loading instead of *model-based* loading (see the 2nd cell for an example).

Note that the `Vocoder` is loaded as a global variable, and all `BaseModel` instances are `singleton`, so you can call the function as many times as you want without reloading the models every time!
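
To make the singleton idea concrete, here is a minimal sketch of name-based caching. It is not the actual `BaseModel` code: `DummyModel`, `get_model` and `_loaded_models` are hypothetical names I use only for illustration, and the real loading logic may differ.

```python
# Hypothetical sketch: keep one instance per model name so that repeated
# calls reuse the already-loaded model instead of reloading it.
_loaded_models = {}


class DummyModel:
    """Stand-in for a real model; counts how many times it was 'loaded'."""
    load_count = 0

    def __init__(self, name):
        self.name = name
        # Pretend this is the expensive weight-loading step.
        DummyModel.load_count += 1


def get_model(name):
    """Return the cached instance for `name`, creating it on first use."""
    if name not in _loaded_models:
        _loaded_models[name] = DummyModel(name)
    return _loaded_models[name]
```

Calling `get_model('pretrained_tacotron2')` twice returns the same object, so the expensive step only runs once.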

I left the results in the `example_outputs/` directory so that you can listen to them without re-executing the code ;)

Note: it is normal for the 1st generation to be relatively slow (because of model loading and `tensorflow graph` compilation), but the next ones will be faster than real time!
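
"Faster than real time" means that generating the audio takes less time than playing it. Here is a small sketch to measure this yourself; `timed_call` and `real_time_factor` are hypothetical helpers (they are not part of the `tts` module), and you have to provide the audio duration in seconds.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call `fn` and return (result, elapsed time in seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(generation_time, audio_duration):
    """Ratio of generation time to audio duration (both in seconds).

    A value below 1 means the generation is faster than real time.
    """
    if audio_duration <= 0:
        raise ValueError('audio_duration must be strictly positive')
    return generation_time / audio_duration
```

For example, 2 seconds of computation for 4 seconds of audio gives a factor of 0.5. Expect the 1st call to give a much higher value than the next ones.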

PS: I will (*theoretically*) **never** share the weights for the last 2 French demonstrations in this notebook (which use a `fine-tuned SV2TTS` model), but I will share other French voices trained on `SIWIS` (single-speaker) and `SIWIS + CommonVoice + VoxForge` (multi-speaker).

## Steps to reproduce

1. Download the model weights (see `README.md` for links)
2. Unzip the weights into the `pretrained_models/` directory
3. Execute the cells!

Note: to associate a model with a language, go to `models/tts/__init__.py` and modify the `_pretrained` global variable (at the end of the file). The `key` is the language and the `value` is the model's name.
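
As an illustration, the mapping and the lookup could look like the sketch below. The `en` entry is the default association mentioned in this notebook; the `resolve_model_name` helper is hypothetical and only shows the idea, the real code in `models/tts/__init__.py` may be organized differently.

```python
# Assumed shape of the mapping: {language: model name}.
_pretrained = {
    'en': 'pretrained_tacotron2',
}


def resolve_model_name(model=None, lang=None, pretrained=_pretrained):
    """Hypothetical helper: pick the model name from `model` or `lang`.

    An explicit `model` wins (model-based loading); otherwise the name
    is looked up from the language (language-based loading).
    """
    if model is not None:
        return model
    if lang not in pretrained:
        raise ValueError('No model associated with language: {}'.format(lang))
    return pretrained[lang]
```

To add your own language, add a new `key: value` pair to the dictionary, with the name of your model directory as the value.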

**Important note**: downloading the `WaveGlow` vocoder model is not required. It is also possible to use the `torch hub` model, but this requires a working **GPU-enabled** pytorch installation. If you do not want to have both tensorflow and pytorch working together, you can download the `WaveGlow` tensorflow implementation I made.
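
If you are not sure whether your pytorch installation is GPU-enabled, here is a quick check. `torch_gpu_available` is a hypothetical helper (not part of this project); it only relies on `importlib.util.find_spec` and `torch.cuda.is_available`.

```python
import importlib.util


def torch_gpu_available():
    """Return True only if pytorch is importable and sees a CUDA device."""
    if importlib.util.find_spec('torch') is None:
        return False  # pytorch is not installed
    try:
        import torch
    except ImportError:
        return False  # pytorch is installed but cannot be imported
    return bool(torch.cuda.is_available())
```

If this returns `False`, the tensorflow implementation of `WaveGlow` is probably the better option for you.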

### Streaming

This feature lets you enter text and get the audio output directly.

In [None]:
from models.tts import tts_stream

# I suggest not saving audios when streaming because it is slower ;)
# I did it here so that you can listen to the result

# PS: the 'goodbye, see you soon !' message at the end is a funny sentence I added
# for when you stop the streaming :D
tts_stream(model = 'pretrained_tacotron2', directory = 'example_outputs/streaming', overwrite = True)


## Special Christmas

In [None]:
from models.tts import tts

text_en = "I wish you a merry christmas, and all the best for the new year 2023 !"
text_fr = "Je vous souhaite un joyeux Noël, et tout le meilleur pour cette nouvelle année 2023 !"

_ = tts(
    text_en, lang = 'en', directory = 'example_outputs/christmas', display = True, 
    overwrite = True, debug = True, tqdm = None
)
_ = tts(
    text_fr, model = 'sv2tts_fine_tuned', directory = 'example_outputs/christmas', display = True, 
    overwrite = True, debug = True, tqdm = None
)

In [None]:
from utils.audio import display_audio

_ = display_audio('example_outputs/christmas/sentence_audios/audio_0.mp3')
_ = display_audio('example_outputs/christmas/sentence_audios/audio_1.mp3')

### TTS on text

This function generates audios from the provided text or list of texts.

By default, the model does not regenerate sentences that have already been generated, but you can set the `overwrite` argument to force regeneration. Note that for `SV2TTS`-based models, `overwrite = True` by default: these are *multi-speaker* models, designed to produce multiple intonations for the same sentence.
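
The logic behind `overwrite` can be sketched as follows. This is only an illustration under my own assumptions: `generate_all`, the `cache` dictionary and the `synthesize` callable are hypothetical, and the real function stores its results in the output `directory` rather than in a plain dictionary.

```python
def generate_all(sentences, synthesize, cache, overwrite=False):
    """Hypothetical sketch: generate each sentence at most once.

    `cache` maps a sentence to its generated result (e.g. a filename) and
    `synthesize` is any callable producing that result from a sentence.
    """
    results = {}
    for sentence in sentences:
        # Regenerate only for new sentences, or when explicitly forced.
        if overwrite or sentence not in cache:
            cache[sentence] = synthesize(sentence)
        results[sentence] = cache[sentence]
    return results
```

With `overwrite = False`, calling it twice with the same sentence only runs the synthesis once; with `overwrite = True`, the synthesis runs again every time.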

In this example, the model is loaded with `lang = 'en'`, which loads the `pretrained_tacotron2` model: by default, I have associated the `en` language with this model in the `models/tts/__init__.py` file.

In [None]:
from models.tts import tts
from loggers import set_level

text = [
    "Hello world ! Hope you will enjoy this funny API for Text-To-Speech !",
    "If you train new models, do not hesitate to contact me or add it in the available models !"
]

set_level('info')

_ = tts(
    text, lang = 'en', directory = 'example_outputs/en', display = True, 
    overwrite = True, debug = True, tqdm = None
)

In [None]:
from models.tts import tts

text = [
    "Bonjour tout le monde ! J'espère que vous allez aimer cette démonstration de voix en français !"
]

_ = tts(
    text, lang = 'fr', directory = 'example_outputs/fr', display = True, 
    overwrite = True, debug = True, tqdm = None
)

In [None]:
from models.tts import tts

text = "Bonjour tout le monde ! J'espère que vous allez aimer cette démonstration de mon super modèle entrainé avec seulement 20 minutes d'audios ! \
Je ne partagerai pas ce modèle, mais je trouvais ça intéressant de montrer ce qu'il était possible de faire !"

_ = tts(
    text, model = 'sv2tts_fine_tuned', directory = 'example_outputs/fr', display = True, 
    overwrite = False, debug = True, tqdm = None
)

This cell displays an audio file, retrieved from its saving directory and its text.

In [None]:
from models.tts import get_audio_file
from utils.audio import display_audio

text1 = "Bonjour tout le monde ! J'espère que vous allez aimer cette démonstration de mon super modèle entrainé avec seulement 20 minutes d'audios ! \
Je ne partagerai pas ce modèle, mais je trouvais ça intéressant de montrer ce qu'il était possible de faire !"

text2 = "Bonjour tout le monde ! J'espère que vous allez aimer cette démonstration de mon super modèle entrainé avec seulement 5 minutes d'audios !\
Je ne partagerai pas cemodèle, mais je trouvais ça intéressant de montrer ce qu'il était possible de faire !"

_ = display_audio(get_audio_file(text1, directory = 'example_outputs/fr'))

_ = display_audio(get_audio_file(text2, directory = 'example_outputs/fr'))