ECSS

Introduction

This is an implementation of the paper "Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling" (accepted by AAAI 2024).

Rui Liu *, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li.

Demo Page

Speech Demo

Dependencies

  • For details about the operating environment dependencies, see FCTalker.
  • You also need to install PyTorch Geometric (used to support heterogeneous graph neural networks).

Dataset

  • You can download the dataset from DailyTalk.
  • The emotion category and emotion intensity annotations are in the ./preprocessed_data/DailyTalk/ folder.

1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1

Each record has the format sentence ID|speaker|phoneme sequence|original content|emotion|emotion intensity.
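As a rough illustration of the record format above, the following sketch splits one annotation line into named fields. The `Utterance` class and `parse_line` function are hypothetical helpers, not part of this repository, and the sketch assumes that the emotion intensity is always an integer and that no field contains a `|` character.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One annotation record (hypothetical container, not from the repo)."""
    sentence_id: str
    speaker: str
    phonemes: List[str]
    text: str
    emotion: str
    intensity: int


def parse_line(line: str) -> Utterance:
    """Split a 'id|speaker|{phonemes}|text|emotion|intensity' record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    sentence_id, speaker, phonemes, text, emotion, intensity = fields
    return Utterance(
        sentence_id=sentence_id,
        speaker=speaker,
        # The phoneme sequence is wrapped in braces and space-separated.
        phonemes=phonemes.strip("{}").split(),
        text=text,
        emotion=emotion,
        # Assumption: intensity is stored as an integer.
        intensity=int(intensity),
    )


example = parse_line("1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1")
print(example.phonemes)                    # ['Y', 'EH1', 'S', 'AY1', 'N', 'OW1']
print(example.emotion, example.intensity)  # none 1
```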

Preprocessing

Run

python3 prepare_align.py --dataset DailyTalk

to prepare the data for alignment.

For forced alignment, Montreal Forced Aligner (MFA) is used to obtain the alignments between the utterances and the phoneme sequences. Pre-extracted alignments for the datasets are provided here. Unzip the files into preprocessed_data/DailyTalk/TextGrid/. Alternatively, you can run the aligner yourself. Please note that our pretrained models are not trained with supervised duration modeling (they are trained with learn_alignment: True).

After that, run the preprocessing script with

python3 preprocess.py --dataset DailyTalk

Training

Train your model with

python3 train.py --dataset DailyTalk

Inference

Only batch inference is supported, because generating a turn may require the contextual history of the conversation. Try

python3 synthesize.py --source preprocessed_data/DailyTalk/val_*.txt --restore_step RESTORE_STEP --mode batch --dataset DailyTalk

to synthesize all utterances in preprocessed_data/DailyTalk/val_*.txt.
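To make the idea of contextual history concrete, here is a minimal sketch that groups metadata lines by dialogue and returns the turns preceding a given turn. It is not the repository's batching code. It assumes that sentence IDs follow the pattern `<turn>_<speaker>_d<dialogue>` (as the sample ID `1_1_d30` suggests), and `group_by_dialogue` and `history_for` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Assumption: IDs look like "<turn>_<speaker>_d<dialogue>", e.g. "1_1_d30".
ID_PATTERN = re.compile(r"^(\d+)_(\d+)_d(\d+)$")


def group_by_dialogue(lines):
    """Map dialogue number -> metadata lines sorted by turn index."""
    dialogues = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sentence_id = line.split("|", 1)[0]
        match = ID_PATTERN.match(sentence_id)
        if match is None:
            raise ValueError(f"unexpected sentence ID: {sentence_id!r}")
        turn, _speaker, dialogue = (int(g) for g in match.groups())
        dialogues[dialogue].append((turn, line))
    return {
        dialogue: [text for _, text in sorted(turns, key=lambda t: t[0])]
        for dialogue, turns in dialogues.items()
    }


def history_for(dialogue_lines, index):
    """Return the turns that precede position `index` in one dialogue."""
    return dialogue_lines[:index]


lines = [
    "1_1_d30|1|{Y EH1 S AY1 N OW1}|yes, i know.|none|1",
    # The next line is invented for illustration only.
    "0_0_d30|0|{HH AH0 L OW1}|hello.|none|1",
]
dialogues = group_by_dialogue(lines)
print(history_for(dialogues[30], 1))  # the single turn before "yes, i know."
```

Synthesizing a turn in isolation would leave this history empty, which is why the whole file is processed as a batch.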

Citing

To cite this repository:

@article{liu2023emotion,
  title={Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling},
  author={Liu, Rui and Hu, Yifan and Ren, Yi and Yin, Xiang and Li, Haizhou},
  journal={arXiv preprint arXiv:2312.11947},
  year={2023}
}

Author

E-mail: hyfwalker@163.com
