README

To compile the rules and generate the final FSTs, run ./make.sh. The
exported FSTs are defined in the grammar file g2p.grm. The two FST files,
G2P1 and G2P2, are intended for use with the Python transcriber script.
They are split into two parts to minimize size on disk, but you may export
a single FST file instead if you wish.

After successful compilation, you can test the rules using either:

- tester.sh: a very simple interactive loop. Just write in or paste the
  words/sentences you want to test.

- transduce.sh: transcribes a file of words or sentences given as
  input.

Please take the following into account:

- You can only test words or sentences that have already been
  normalized. "Normalization" here means that every character in the
  input string must belong to the input alphabet defined in the first
  FST (in_feeder in alphabets.grm). This is already done for you when
  you use scripts/tts_transcriber.py.

- The strings should contain a stress marker (the "+" char) for optimal
  accuracy. This information comes either from the exceptions lexicon
  or from the stress prediction model.
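
If you want to check a string before feeding it to tester.sh, the two
requirements above can be sketched in a few lines of Python. This is only
an illustration, not part of the repository: ASSUMED_ALPHABET is a guess
(lowercase Russian letters, space and "+"), the real alphabet is whatever
in_feeder in alphabets.grm accepts, and the function names are
hypothetical. The position of "+" in the sample strings is illustrative
only.

```python
# ASSUMPTION: a stand-in for the input alphabet. The authoritative
# definition is in_feeder in alphabets.grm; adjust this set to match it.
ASSUMED_ALPHABET = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя +")
STRESS_MARKER = "+"


def unknown_chars(text, alphabet=ASSUMED_ALPHABET):
    """Return the sorted characters of text that are outside the alphabet."""
    return sorted(set(text) - set(alphabet))


def words_missing_stress(text):
    """Return the whitespace-separated words that carry no stress marker."""
    return [word for word in text.split() if STRESS_MARKER not in word]


def check_input(text):
    """Collect both kinds of problems for one input string."""
    return {
        "unknown": unknown_chars(text),
        "unstressed": words_missing_stress(text),
    }


if __name__ == "__main__":
    # The "+" placement below is made up for the example.
    print(check_input("м+ама мыла"))
    print(check_input("Молоко!"))
```

A word flagged as unstressed is not necessarily an error (the marker is
recommended for accuracy, not required), whereas a character outside the
alphabet means the string has not been normalized.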

Thanks:

The implementation of the rules would not have been possible without the
Russian language advice from my colleague at Yandex, Anastasiya Polkanova.

Sources:

- Chew, Peter A. (2003): A Computational Phonology of Russian. Dissertation.com
- Jones, Daniel & Ward, Dennis (1969): The Phonetics of Russian. Cambridge University Press