Tajiki Resources #48

Shreeshrii · 2017-02-04T13:19:15Z

Ref: tesseract-ocr/tesseract#654 (comment)

tgk - tg - Tajiki - http://crubadan.org/languages/tg

Shreeshrii · 2017-04-01T02:06:36Z

@theraysmith commented 2 days ago:
Update: after going back to the www to get fresh data, I believe that my corpus text is now good for:
chr
dzo
iku
snd
syr
tgk
tir
I have put a lot of time into cleaners/filters for languages that use 'virama' characters.
I am not convinced that they are perfect, but I will add the code to the github repo in due course, so experts/native speakers can offer suggestions/fixes to make them better.

Shreeshrii closed this as completed Apr 1, 2017
