Using different set of fonts for Persian #294

ebraminio · 2016-03-29T00:19:22Z

First of all, thanks for finally adding Persian support to tesseract. While quickly looking through the Persian-related code in tesseract, I came across https://github.com/tesseract-ocr/tesseract/blob/master/training/language-specific.sh#L520. I suspect, though I have not tested it, that this is not a good set of fonts for training on printed Persian text and that it may lead to poor OCR quality, because most Persian fonts don't share the style of these fonts.

The article "Font recognition using Variogram fractal dimension" lists a good set of Persian fonts (second page, at the bottom). As you can see there, it differs from the fonts favored for Arabic, even though both languages use the Arabic script.

So for training Persian OCR in tesseract, I suggest adding these free fonts to the current set, or replacing the current set with them: Nazli (called Nazanin in that article) and Titr from the Debian fonts-farsiweb package, and XB Zar and XB Yaghut from the OFL-licensed xfonts. I think @roozbehp and @behdad from Google can also help you with this. Thank you.

amitdo · 2016-05-27T10:03:47Z

I suggest you repost this here:
https://github.com/tesseract-ocr/langdata
and then close this issue.

But don't mention those two people again.

ebraminio mentioned this issue May 27, 2016

Use a different set of fonts for Persian tesseract-ocr/langdata#26

Closed

ebraminio closed this as completed May 27, 2016

amitdo mentioned this issue Sep 14, 2016

Arabic language (right to left in writing) stored (left to right) after create PDF Searchable #238

Open
