ECSS

Introduction

This is an implementation of the paper "Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling" (accepted by AAAI 2024).

Rui Liu *, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li.

Demo Page

Speech Demo

Dependencies

For the runtime environment dependencies, see FCTalker.
You also need to install PyTorch Geometric (used to support heterogeneous graph neural networks).

Dataset

You can download the dataset from DailyTalk.
The emotion category and emotion intensity annotations are in the ./preprocessed_data/DailyTalk/ folder.

Each entry has the format sentence ID|speaker|phoneme sequence|original content|emotion|emotion intensity, for example:

1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1
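
The record format described above can be read with a few lines of Python. This is a minimal sketch, not code from this repository: the `Utterance` type and `parse_line` function are hypothetical names, and treating the emotion intensity as an integer is an assumption based only on the single example shown.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """One annotated line (hypothetical helper type, not part of the repo)."""
    sentence_id: str
    speaker: str
    phonemes: List[str]
    text: str
    emotion: str
    intensity: int  # assumed to be an integer, as in the example line


def parse_line(line: str) -> Utterance:
    """Split one pipe-delimited metadata line into its six fields."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    sentence_id, speaker, phonemes, text, emotion, intensity = fields
    return Utterance(
        sentence_id=sentence_id,
        speaker=speaker,
        # Phonemes are wrapped in braces and separated by spaces.
        phonemes=phonemes.strip("{}").split(),
        text=text,
        emotion=emotion,
        intensity=int(intensity),
    )


example = "1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1"
utt = parse_line(example)
print(utt.emotion, utt.intensity, utt.phonemes)
```

The sketch rejects any line that does not have exactly six fields, so a malformed annotation surfaces early instead of silently shifting columns.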

Preprocessing

Run

python3 prepare_align.py --dataset DailyTalk

to prepare the data for alignment.

For forced alignment, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Pre-extracted alignments for the dataset are provided here. Unzip the files into preprocessed_data/DailyTalk/TextGrid/. Alternatively, you can run the aligner yourself. Please note that our pretrained models are not trained with supervised duration modeling (they are trained with learn_alignment: True).

After that, run the preprocessing script:

python3 preprocess.py --dataset DailyTalk

Training

Train your model with

python3 train.py --dataset DailyTalk

Inference

Only batch inference is supported, because generating a turn may require the contextual history of the conversation. Try

python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk

to synthesize all utterances in preprocessed_data/DailyTalk/val_*.txt.
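
Because each turn depends on earlier turns, utterances have to be handled per conversation and in order. The sketch below illustrates that grouping idea only; it is not the logic of synthesize.py. It assumes that a sentence ID such as 1_1_d30 follows the pattern turn_speaker_dialogue, which is inferred from the example line and not confirmed by this README, and `group_by_dialogue` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_dialogue(sentence_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group sentence IDs by dialogue and order each group by turn index.

    Assumes IDs look like "<turn>_<speaker>_d<dialogue>" (an assumption).
    """
    dialogues = defaultdict(list)
    for sid in sentence_ids:
        turn, _speaker, dialogue = sid.split("_")
        dialogues[dialogue].append((int(turn), sid))
    # Sort by the numeric turn index so the history precedes each turn.
    return {d: [sid for _, sid in sorted(turns)] for d, turns in dialogues.items()}


ids = ["1_1_d30", "0_0_d30", "0_1_d7", "2_0_d30"]
grouped = group_by_dialogue(ids)
for dialogue, turns in grouped.items():
    print(dialogue, turns)
```

With the history ordered this way, the context for turn N of a dialogue is simply the slice of that dialogue's list before position N.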

Citing

To cite this repository:

@article{liu2023emotion,
  title={Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling},
  author={Liu, Rui and Hu, Yifan and Ren, Yi and Yin, Xiang and Li, Haizhou},
  journal={arXiv preprint arXiv:2312.11947},
  year={2023}
}

Author

E-mail: hyfwalker@163.com
