How to disentangle style and speaker information? #74

Open
mozykhau opened this issue Jan 30, 2023 · 8 comments
Labels
discussion New research topic

Comments

@mozykhau

I would like to transfer the speech style of one speaker to another speaker while preserving the target speaker's identity.

Do you have any advice on how to use it for emotional cross-speaker style transfer? I thought about adding an additional discriminator to classify the speaker ID, but how should the domains be defined in that case?

Thanks

@yl4579
Owner

yl4579 commented Jan 31, 2023

You can define the domains in terms of emotions instead of speakers. This way you preserve the speaker identity and convert only the emotion.
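
As a minimal sketch of that relabeling, assuming the repo's `path|domain_index` training-list format and an ESD-style directory layout (the `emotion_for` helper is hypothetical and depends on how your corpus encodes emotion):

```python
# Rewrite a StarGANv2-VC training list so the domain index encodes
# emotion instead of speaker. Assumes the "path|index" list format;
# emotion_for() is a hypothetical lookup for your corpus.
EMOTIONS = ["neutral", "happy", "sad", "angry", "surprised"]

def emotion_for(wav_path: str) -> str:
    # Hypothetical ESD-style layout: Data/<speaker>/<emotion>/<file>.wav
    return wav_path.split("/")[-2].lower()

with open("Data/train_list.txt") as src, \
     open("Data/train_list_emotion.txt", "w") as dst:
    for line in src:
        path = line.strip().split("|")[0]
        dst.write(f"{path}|{EMOTIONS.index(emotion_for(path))}\n")
```

`num_domains` in the model config would then be the number of emotions rather than the number of speakers.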

@yl4579 yl4579 added the discussion New research topic label Jan 31, 2023
@mozykhau
Author

Thanks. Defining the domains as emotions instead of speakers worked, but it sometimes corrupted the speaker identity for specific emotional domains.
I found interesting research on EVC based on StarGANv2-VC by Sony Research India: https://arxiv.org/pdf/2302.10536.pdf. They added a second encoder and a classifier for the speaker domain for better disentanglement.
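
Setting the paper's exact architecture aside, a common way to get this kind of disentanglement is an adversarial speaker classifier on the style embedding, trained through a gradient-reversal layer so the style encoder learns to drop speaker cues. A self-contained PyTorch sketch (dimensions, names, and loss weighting are assumptions, not the paper's design):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class SpeakerAdversary(nn.Module):
    """Predicts speaker ID from the style embedding. The reversal layer
    flips its gradients, pushing the style encoder to remove speaker
    information while keeping emotion/style content."""
    def __init__(self, style_dim: int, num_speakers: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.net = nn.Sequential(
            nn.Linear(style_dim, 256), nn.ReLU(),
            nn.Linear(256, num_speakers),
        )

    def forward(self, style: torch.Tensor) -> torch.Tensor:
        return self.net(GradReverse.apply(style, self.lambd))

# Hypothetical training usage: `s` is the style encoder output,
# `spk` the integer speaker labels of the batch.
# adv = SpeakerAdversary(style_dim=64, num_speakers=20)
# loss_adv = nn.functional.cross_entropy(adv(s), spk)
# (loss_gan + loss_adv).backward()  # reversal handles the sign flip
```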

@chiaki-luo

Maybe that's because the same speaker utters too many sentences with the same emotion?

@CONGLUONG12

@yl4579 Hi, thanks for this project.
I want to know whether a domain here is the emotion category of one speaker or of many speakers?

> You can define the domains in terms of emotions instead of speakers. This way you preserve the speaker identity and convert only the emotion.

@yl4579
Owner

yl4579 commented Apr 9, 2023

@CONGLUONG12 The domains should span multiple speakers. You can refer to https://arxiv.org/pdf/2302.10536.pdf for more details; it is a good example of how to modify StarGANv2-VC for emotion conversion.

@CONGLUONG12

@yl4579 Thank you very much.
In your demo, you chose a speaker with a specific emotion. If you instead pick another speaker from the training set (call them speaker A) as the source, will you get speech with this emotion and the timbre of A?

@yl4579
Owner

yl4579 commented Apr 16, 2023

@CONGLUONG12 Probably yes, provided speaker A has samples with similar emotions in the training set; otherwise it might not work.
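
The conversion flow would then look roughly like this; every name here (`load_mel`, `style_encoder`, `f0_model`, `generator`) is a placeholder rather than the repo's actual API, so treat it as pseudocode for the idea:

```python
import torch

# Placeholder names throughout; treat as pseudocode for the flow.
target_emotion = 3                          # e.g. domain index of "angry"
ref = load_mel("anyone_angry.wav")          # reference in the target emotion
src = load_mel("speakerA_neutral.wav")      # speaker A's source speech

with torch.no_grad():
    # The style vector comes from a reference utterance in the target
    # emotion domain; speaker A's timbre is carried by the source mel.
    style = style_encoder(ref.unsqueeze(0), torch.LongTensor([target_emotion]))
    f0 = f0_model(src.unsqueeze(0))         # F0 features of the source
    converted = generator(src.unsqueeze(0), style, f0)
```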

@gnekt

gnekt commented Sep 5, 2023

Hey there!
I made something similar for my MSc degree in AI, starting from the great implementation by @yl4579.
Take a look Here for some hints.
