Stress a vowel manually #12

Open
NikitaKononov opened this issue Jun 26, 2023 · 5 comments

Comments

@NikitaKononov

Hello
Thanks for your exciting work!
Can you please tell me whether it is possible to stress a vowel manually with your phonemizer?
For example: alibab'a / alib'aba, з'амок / зам'ок
And does your phonemizer support stresses at all?
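
To make the notation in the examples concrete, here is a minimal sketch of how apostrophe-marked input could be split into the plain word and the position of the stressed vowel. It assumes the apostrophe sits directly before the stressed vowel, as in alib'aba; the helper name `split_stress` is hypothetical and is not part of the phonemizer.

```python
STRESS_MARK = "'"  # assumption: the mark is placed directly before the stressed vowel


def split_stress(marked, mark=STRESS_MARK):
    """Return (plain_word, stressed_index); stressed_index is None if unmarked."""
    if marked.count(mark) > 1:
        raise ValueError("this sketch supports at most one stress mark per word")
    idx = marked.find(mark)
    if idx == -1:
        return marked, None
    # Drop the mark; the stressed character now sits at the mark's old position.
    plain = marked[:idx] + marked[idx + 1:]
    if idx >= len(plain):
        raise ValueError("stress mark must be followed by a character")
    return plain, idx


for example in ("alibab'a", "alib'aba", "з'амок", "зам'ок"):
    print(example, "->", split_stress(example))
```

Both Russian examples map to the same plain string and differ only in the stressed index, which is the distinction the phonemizer would need to receive.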

@xinjli
Owner

xinjli commented Jun 27, 2023

Thanks for your question!

I did not include the stress symbol during training, so currently the model will not predict any stress.
For a few languages, I think the stress information is in the training set, but I removed it before training.
Depending on the language, it might be possible to support stress in the future.

Can you tell me what your application for stress is, and which languages you want to apply it to?

@NikitaKononov
Author

NikitaKononov commented Jun 27, 2023

Thanks for the quick response.

> Can you tell me what your application for stress is, and which languages you want to apply it to?

If your solution would support manual stress setting, I would use it for the following tasks:

  1. Phonemizing the input texts of text-to-speech models
  2. Phonemizing text for training a phoneme-level BERT (also for text-to-speech tasks)

Languages: English and Slavic languages (Polish, Russian, etc.)
Manual stress correction is very important for phonemizing proper names correctly in English.
For Slavic languages it is critical: the same sequence of characters can have different meanings depending on the stress.
Unfortunately, older tools like espeak don't offer a way to control stress manually.

@xinjli
Owner

xinjli commented Jun 29, 2023

For English, I think it is possible to train a model that supports stress, since stress is included in the CMU dictionary.
I am not sure whether the Slavic languages have these annotations. It would be difficult if there are no datasets containing stress. Do you know of any pronunciation datasets that include stress?
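
For reference, the CMU dictionary encodes stress as a digit on each vowel phoneme (0 = no stress, 1 = primary, 2 = secondary). A minimal sketch of reading one entry, assuming the plain `WORD  PH1 PH2 ...` line layout; the function name `parse_cmudict_line` is hypothetical and this is not how the phonemizer's training pipeline reads its data:

```python
import re

# ARPAbet phoneme with an optional stress digit, e.g. "AH0", "OW1", "HH"
PHONE_RE = re.compile(r"([A-Z]+)([012])?")


def parse_cmudict_line(line):
    """Split a CMUdict-style line into (word, [(phoneme, stress_or_None), ...])."""
    word, *phones = line.split()
    parsed = []
    for phone in phones:
        match = PHONE_RE.fullmatch(phone)
        if match is None:
            raise ValueError(f"unexpected phoneme token: {phone!r}")
        base, digit = match.groups()
        # Consonants carry no digit, so their stress is None.
        parsed.append((base, int(digit) if digit is not None else None))
    return word, parsed


print(parse_cmudict_line("HELLO  HH AH0 L OW1"))
```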

@NikitaKononov
Author

> For English, I think it is possible to train a model that supports stress, since stress is included in the CMU dictionary. I am not sure whether the Slavic languages have these annotations. It would be difficult if there are no datasets containing stress. Do you know of any pronunciation datasets that include stress?

I can try to find them.
Could you give an example snippet of the dataset?
Should it be in word / phoneme form?
And should stress be marked with a character such as + or '?

@xinjli
Owner

xinjli commented Jun 30, 2023

It should be something like the following format, with fields delimited by spaces:

word phoneme1 phoneme2 stress1 phoneme3

Currently, I remove all non-IPA symbols, so some code changes are needed to include your stress symbol. I think you can assign whatever character you think is appropriate.
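
As a rough sketch of reading that format, the snippet below parses one space-delimited entry and separates stress tokens from phonemes. Everything specific here is an assumption for illustration: the stress characters `ˈ` and `ˌ`, the convention that a stress token applies to the phoneme that follows it, the helper names, and the sample transcription, which is not taken from any dataset.

```python
# Assumption: stress is written as its own space-delimited token.
STRESS_SYMBOLS = {"ˈ", "ˌ"}


def parse_entry(line):
    """Split 'word phoneme1 phoneme2 stress1 phoneme3' into (word, tokens)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a word followed by at least one token")
    return fields[0], fields[1:]


def split_stress_tokens(tokens, stress_symbols=STRESS_SYMBOLS):
    """Return (phonemes, stressed_indices); a stress token marks the next phoneme."""
    phonemes, stressed, pending = [], [], False
    for token in tokens:
        if token in stress_symbols:
            pending = True
            continue
        if pending:
            stressed.append(len(phonemes))
            pending = False
        phonemes.append(token)
    return phonemes, stressed


word, tokens = parse_entry("замок z a ˈ m o k")  # illustrative transcription only
print(word, split_stress_tokens(tokens))
```

Keeping the phoneme list and the stressed indices separate means the phoneme list alone matches what a pipeline that strips non-IPA symbols would see, while the indices preserve the stress information.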
