Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS

Current Text-to-Speech (TTS) systems are trained on audiobook data and perform well in synthesizing read-style speech. In this work, we are interested in synthesizing audio stories as narrated to children. The storytelling style is more expressive and requires perceptible changes of voice between the narrator and the story characters. To address these challenges, we present a new TTS corpus of English audio stories for children with 32.7 hours of speech by a single female speaker with a UK accent. We provide evidence of salient differences in the suprasegmentals of the narrator and character utterances in the dataset, motivating the use of a multi-speaker TTS for our application. We use a fine-tuned BERT model to label each sentence as spoken by the narrator or by a character; this label is subsequently used to condition the TTS output. Experiments show that our new TTS system is superior in expressiveness to reading-style TTS and single-speaker TTS in both A-B preference and MOS testing.
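
As a rough illustration of how sentence-level labels can condition a multi-speaker model, the sketch below maps narrator/character labels to two speaker IDs and writes lines in the `wav_path|speaker_id|text` layout used by the multi-speaker filelists in the public VITS code. The label strings, the ID assignment (narrator = 0, character = 1), the helper name `to_filelist_lines`, and the example rows are all assumptions made for illustration; they are not taken from the released dataset, and the labels here are supplied by hand rather than predicted by the fine-tuned BERT model.

```python
from typing import Iterable, List, Tuple

# Assumed mapping: the two "speakers" are the narrator voice and the character voice.
SPEAKER_IDS = {"narrator": 0, "character": 1}


def to_filelist_lines(rows: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Turn (wav_path, label, text) rows into 'wav_path|speaker_id|text' lines."""
    lines = []
    for wav_path, label, text in rows:
        key = label.strip().lower()
        if key not in SPEAKER_IDS:
            raise ValueError(f"unknown label: {label!r}")
        # Pipes separate the fields, so remove them from the text and
        # collapse any repeated whitespace.
        clean_text = " ".join(text.replace("|", " ").split())
        lines.append(f"{wav_path}|{SPEAKER_IDS[key]}|{clean_text}")
    return lines


# Hypothetical rows; the paths and sentences are made up for this example.
example_rows = [
    ("wavs/story_0001.wav", "narrator", "Once upon a time there was a little fox."),
    ("wavs/story_0002.wav", "character", '"Who is there?" he called.'),
]

for line in to_filelist_lines(example_rows):
    print(line)
```

Treating the narrator and the characters as two speaker IDs lets a standard multi-speaker recipe learn separate voice characteristics for each without changing the model code, which is the general idea behind conditioning on the predicted label.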

The paper is available here.

Audio samples and additional analysis results are in the supplementary material [link]
StoryNory TTS dataset with speaker labels [link]. The original audio files scraped from the StoryNory website, along with the full transcripts, can be obtained here. NVIDIA NeMo is used to segment the audio and format it as a standard TTS dataset.
Fine-tuning script for predicting narrator/character labels using BERT [link]
Fine-tuned BERT checkpoint [link]
The basic architecture is VITS in a multi-speaker setting [code] [paper]
VITS single-speaker checkpoint trained on the StoryNory TTS dataset [link]
VITS multi-speaker checkpoint trained on the StoryNory TTS dataset [link]
Interactive Colab for conducting the A-B preference test [link]
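
For readers who want to summarise A-B preference votes themselves, here is a minimal sketch that counts votes and runs an exact two-sided sign test. It is not the procedure from the Colab notebook, whose details are not described here. The vote labels (`"A"`, `"B"`, `"same"`), the function name `ab_preference_summary`, and the choice of a sign test that ignores ties are assumptions.

```python
from math import comb
from typing import Dict, List


def ab_preference_summary(votes: List[str]) -> Dict[str, float]:
    """Summarise A-B votes ('A', 'B', or 'same') with a two-sided sign test."""
    n_a = votes.count("A")
    n_b = votes.count("B")
    n_same = votes.count("same")
    if n_a + n_b + n_same != len(votes):
        raise ValueError("votes must be 'A', 'B', or 'same'")

    n = n_a + n_b  # ties are excluded from the sign test
    if n == 0:
        p_value = 1.0
    else:
        k = min(n_a, n_b)
        # Exact binomial tail under the null hypothesis of no preference (p = 0.5).
        tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        # The null distribution is symmetric, so double the tail and cap at 1.
        p_value = min(1.0, 2 * tail)

    total = len(votes)
    return {
        "pref_A": n_a / total if total else 0.0,
        "pref_B": n_b / total if total else 0.0,
        "no_pref": n_same / total if total else 0.0,
        "p_value": p_value,
    }


# Hypothetical votes from ten listeners comparing system A with system B.
summary = ab_preference_summary(["A"] * 8 + ["B"] * 2)
print(summary)
```

A small p-value suggests the preference for one system is unlikely to be due to chance; with few listeners the test has little power, so the raw proportions are worth reporting alongside it.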

tpavankalyan/Storynory
