iRun

A web app to help with the pronunciation of Turkish words and phrases

Website: irun.fyi

Scripts

Install dependencies: `yarn install`
Lint the source code: `yarn lint`
Preprocess the data: `yarn preprocess`
Start the development server: `yarn dev`
Build the app and export the static page to the `/out` directory: `yarn export`
Serve the exported page from the `/out` directory: `yarn serve`

How does it work?

Preprocessing steps - `/preprocessing`

1. Words that do not appear in a standard English dictionary are filtered out of CMUdict (`generate-filtered-dict.js`).
2. From the filtered CMUdict entries, a reverse mapping (from each pronunciation to possibly multiple words) is generated (`generate-reverse-multimap.js`).
3. The raw English word frequency data file is parsed (`generate-frequency-map.js`).
4. Among words that share a pronunciation, the less frequently used ones are removed from the reverse mapping, so each pronunciation keeps a single word (`generate-reverse-map.js`).
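The four preprocessing steps could be sketched in Node.js roughly as below. This is not the repository's code: the function names are hypothetical, and the input formats are assumptions (CMUdict lines shaped like `WORD  PH1 PH2`, with `;;;` comment lines and `WORD(1)` for alternate pronunciations; the English dictionary as a `Set` of lowercase words; the frequency data as a CSV with a `word,count` header). The usage lines run on toy inputs with made-up counts.

```javascript
// Hypothetical sketch of the preprocessing pipeline. Input formats are assumed,
// not taken from the repository.

// Step 1: keep only CMUdict entries whose word is in the English dictionary.
function filterDict(cmudictText, englishWords) {
  const entries = [];
  for (const line of cmudictText.split('\n')) {
    if (!line.trim() || line.startsWith(';;;')) continue; // skip blanks and comments
    const [rawWord, ...phones] = line.trim().split(/\s+/);
    const word = rawWord.replace(/\(\d+\)$/, '').toLowerCase(); // drop "(1)" variant suffix
    if (englishWords.has(word)) entries.push({ word, pron: phones.join(' ') });
  }
  return entries;
}

// Step 2: map each pronunciation to every word that has it.
function buildReverseMultimap(entries) {
  const multimap = new Map();
  for (const { word, pron } of entries) {
    if (!multimap.has(pron)) multimap.set(pron, new Set());
    multimap.get(pron).add(word);
  }
  return multimap;
}

// Step 3: parse "word,count" rows (after the header) into word -> count.
function parseFrequencies(csvText) {
  const freq = new Map();
  for (const line of csvText.trim().split('\n').slice(1)) {
    const [word, count] = line.split(',');
    freq.set(word.trim().toLowerCase(), Number(count));
  }
  return freq;
}

// Step 4: keep only the most frequent word for each pronunciation.
function buildReverseMap(multimap, freq) {
  const reverseMap = {};
  for (const [pron, words] of multimap) {
    let best = null;
    for (const w of words) {
      if (best === null || (freq.get(w) ?? 0) > (freq.get(best) ?? 0)) best = w;
    }
    reverseMap[pron] = best;
  }
  return reverseMap;
}

// Toy usage: "ad" and "add" share a pronunciation; the made-up counts favor "add".
const cmu = ';;; comment\nODD  AA1 D\nAD  AE1 D\nADD  AE1 D\n';
const english = new Set(['odd', 'ad', 'add']);
const csv = 'word,count\nadd,500\nad,300\nodd,200\n';
const map = buildReverseMap(buildReverseMultimap(filterDict(cmu, english)), parseFrequencies(csv));
console.log(JSON.stringify(map)); // → {"AA1 D":"odd","AE1 D":"add"}
```

Keeping a single word per pronunciation keeps the lookup table small, which matters if it is shipped to the browser.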

Pronunciation algorithm - `/pronunciation`

1. All possible syllable combinations are generated from the input Turkish word (`hyphenate-all.js`).
2. The letters in each syllable are written using their alternatives in the CMUdict phonetic alphabet (`phonetic-map.json`).
3. The resulting pronunciations are looked up in the reverse mapping (`reverse-map.json`).
4. If no match is found for a syllable, simple per-letter translations are applied (`letter-pronunciation-map.json`).
5. The results are sorted, prioritizing:
   - the ones with the most English word matches
   - the one that follows natural Turkish hyphenation
6. The 10 best results are returned.
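The syllable-generation step can be sketched as follows, assuming each Turkish syllable contains exactly one vowel and that the consonants between two vowels may be split at any point. Under that assumption `bahadır` yields the same four segmentations as the example, though in a different order. The natural-hyphenation helper used for sorting applies the common Turkish rule that only the last consonant before a vowel starts the next syllable. Both function names are hypothetical, not the repository's.

```javascript
// Turkish vowels; every syllable is assumed to contain exactly one of them.
const VOWELS = new Set([...'aeıioöuü']);

// Split an array of letters at the given cut indices.
function splitAt(letters, cuts) {
  const bounds = [0, ...cuts, letters.length];
  return bounds.slice(1).map((end, i) => letters.slice(bounds[i], end).join(''));
}

function vowelPositions(letters) {
  return letters.flatMap((ch, i) => (VOWELS.has(ch) ? [i] : []));
}

// Every segmentation: each consonant run between two vowels may be cut anywhere.
function hyphenateAll(word) {
  const letters = [...word.toLocaleLowerCase('tr')];
  const v = vowelPositions(letters);
  let combos = [[]];
  for (let k = 0; k + 1 < v.length; k++) {
    const options = [];
    for (let cut = v[k] + 1; cut <= v[k + 1]; cut++) options.push(cut);
    combos = combos.flatMap((cuts) => options.map((c) => [...cuts, c]));
  }
  return combos.map((cuts) => splitAt(letters, cuts));
}

// Natural Turkish hyphenation: only the last consonant before a vowel moves to
// the next syllable; adjacent vowels are split between them.
function naturalHyphenation(word) {
  const letters = [...word.toLocaleLowerCase('tr')];
  const v = vowelPositions(letters);
  const cuts = [];
  for (let k = 0; k + 1 < v.length; k++) cuts.push(Math.max(v[k] + 1, v[k + 1] - 1));
  return splitAt(letters, cuts);
}

console.log(hyphenateAll('bahadır').map((s) => s.join('-')).join(', '));
// → ba-ha-dır, ba-had-ır, bah-a-dır, bah-ad-ır
console.log(naturalHyphenation('bahadır').join('-')); // → ba-ha-dır
```

The number of segmentations is the product of (consonants + 1) over each gap between vowels, so it stays small for typical words.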

Example input: `bahadır`

(1). ['bah', 'ad', 'ır'], ['ba', 'had', 'ır'], ['bah', 'a', 'dır'], ['ba', 'ha', 'dır']
(2). [[['B', 'AA', 'HH'], ['AA', 'D'], ['AH0', 'R']], ... (all combinations) ... ]
(3, 4). ['baah-odd-er', 'bah-hud-er', 'baah-uh-derr', 'bah-huh-derr']
(5). ['bah-hud-er', 'bah-huh-derr', 'baah-odd-er', 'baah-uh-derr']
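The remaining steps (phonetic spelling, reverse-map lookup, per-letter fallback, and ranking) might look roughly like the sketch below, which takes the four segmentations of `bahadır` from the example as input. The three maps are tiny hypothetical stand-ins for `phonetic-map.json`, `reverse-map.json` and `letter-pronunciation-map.json`, so the output does not reproduce the example's strings. It also assumes each letter maps to single-phoneme alternatives and that the first matching alternative wins; the real app may differ on both points.

```javascript
// Hypothetical toy maps; the real JSON files are larger and may be shaped differently.
const phoneticMap = { b: ['B'], a: ['AA', 'AH0'], h: ['HH'], d: ['D'], 'ı': ['AH0'], r: ['R'] };
const reverseMap = { 'B AA': 'bah', 'AA D': 'odd', 'HH AH0': 'huh' };
const letterMap = { b: 'b', a: 'ah', h: 'h', d: 'd', 'ı': 'uh', r: 'r' };

// Steps 2-4: try every phonetic spelling of the syllable against the reverse
// map; fall back to per-letter translations when nothing matches.
function pronounceSyllable(syllable) {
  let candidates = [[]];
  for (const ch of syllable) {
    const alternatives = phoneticMap[ch] ?? [];
    candidates = candidates.flatMap((phones) => alternatives.map((p) => [...phones, p]));
  }
  for (const phones of candidates) {
    const word = reverseMap[phones.join(' ')];
    if (word !== undefined) return { text: word, matched: true };
  }
  return { text: [...syllable].map((ch) => letterMap[ch] ?? ch).join(''), matched: false };
}

// Steps 5-6: more English matches first, then the natural hyphenation; keep the top `limit`.
function rankPronunciations(segmentations, natural, limit = 10) {
  const naturalKey = natural.join('-');
  const scored = segmentations.map((syllables) => {
    const parts = syllables.map(pronounceSyllable);
    return {
      text: parts.map((p) => p.text).join('-'),
      matches: parts.filter((p) => p.matched).length,
      isNatural: syllables.join('-') === naturalKey,
    };
  });
  // Array.prototype.sort is stable, so remaining ties keep their input order.
  scored.sort((x, y) => y.matches - x.matches || Number(y.isNatural) - Number(x.isNatural));
  return scored.slice(0, limit).map((s) => s.text);
}

const segmentations = [['bah', 'ad', 'ır'], ['ba', 'had', 'ır'], ['bah', 'a', 'dır'], ['ba', 'ha', 'dır']];
console.log(rankPronunciations(segmentations, ['ba', 'ha', 'dır']).join(', '));
// → bah-huh-duhr, bahh-odd-uhr, bah-hahd-uhr, bahh-ah-duhr
```

Because the whole lookup is plain object access on preloaded data, it can run entirely in the browser without a server.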

User interface - `/app`

The app consists of a single statically generated Next.js page with no back end.
The reverse mapping file is loaded into the client app, so the algorithm runs in the browser.

References

Pronunciation dictionary data source: Carnegie Mellon Pronouncing Dictionary
Word frequency data source: English Word Frequency dataset on Kaggle
Text-to-speech API: Voice RSS
Icons: Freepik on Flaticon
NPM packages: Next.js, React, Blueprint
