How to disentangle style and speaker information? #74
Comments
You can define the domains in terms of emotions instead of speakers. This way you can preserve the speakers but only convert emotions.
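The suggestion above amounts to relabeling the training list so the domain index encodes emotion rather than speaker. A minimal sketch of that relabeling is below; the `path|speaker|emotion` line format and the four-emotion inventory are assumptions for illustration, not taken from the repository.

```python
# Hypothetical sketch: relabel a StarGANv2-VC-style train list so the
# domain index encodes emotion instead of speaker identity.
# The input format '<wav_path>|<speaker_id>|<emotion>' and the emotion
# set are assumptions, not from the repo.

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed emotion inventory
EMOTION_TO_DOMAIN = {e: i for i, e in enumerate(EMOTIONS)}

def relabel(lines):
    """Map '<wav_path>|<speaker_id>|<emotion>' lines to
    '<wav_path>|<domain_index>' lines keyed by emotion."""
    out = []
    for line in lines:
        path, _speaker, emotion = line.strip().split("|")
        out.append(f"{path}|{EMOTION_TO_DOMAIN[emotion]}")
    return out

# Usage: relabel(["a.wav|p225|happy", "b.wav|p226|sad"])
```

With this relabeling, speakers of all identities share each emotion domain, so the model learns to convert emotion while leaving identity mostly untouched.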
Thanks. The approach of defining domains as emotions instead of speakers worked, but it sometimes distorted speaker identity for specific emotional domains.
Maybe that's because the same person speaks too many sentences with the same emotion?
@yl4579 Hi, thanks for this project.
@CONGLUONG12 It should be of multiple speakers. You can refer to https://arxiv.org/pdf/2302.10536.pdf for more details. This is a good example of how to modify StarGANv2-VC for emotion conversion.
@yl4579 Thank you very much.
@CONGLUONG12 Probably yes, if speaker A has samples in the training set with similar emotions; otherwise it might not work.
I would like to transfer the speech style of one speaker to another speaker while preserving the target speaker's identity.
Do you have any advice on how to use this project for emotional cross-speaker style transfer? I thought about adding an additional discriminator to classify speaker ID, but how should the domains be defined in that case?
Thanks
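The auxiliary speaker discriminator proposed in the question could be wired into the generator objective roughly as follows. This is a hedged sketch only: the loss names, the weighting scheme, and the idea of labeling the converted output with the source speaker ID are assumptions for illustration, not part of the StarGANv2-VC codebase.

```python
# Hypothetical sketch of the objective the question proposes: on top of the
# usual StarGANv2-VC generator losses, add a cross-entropy term from an
# auxiliary speaker classifier applied to the converted mel-spectrogram,
# labeled with the SOURCE speaker ID, so identity is preserved while
# emotion domains drive the conversion. Weights are assumed values.

def generator_loss(adv_loss, style_recon_loss, speaker_ce_loss,
                   lambda_style=1.0, lambda_spk=2.0):
    """Total generator objective with an added speaker-identity term.

    adv_loss         - adversarial loss from the emotion-domain discriminator
    style_recon_loss - style reconstruction loss (as in StarGANv2-VC)
    speaker_ce_loss  - cross-entropy of the auxiliary speaker classifier
                       on the converted speech, labeled with the source
                       speaker ID (pushes the generator to keep identity)
    """
    return adv_loss + lambda_style * style_recon_loss + lambda_spk * speaker_ce_loss
```

Under this scheme, domains stay defined by emotion (so any speaker can be converted to any emotion), while the extra term penalizes the generator whenever the classifier can no longer recognize the original speaker in the output.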