Data Files Contributions

Shreeshrii edited this page Mar 7, 2019 · 16 revisions

This page lists repositories with Tesseract 4 compatible tessdata (for --oem 1, the LSTM engine) contributed by the Tesseract community.

Such tessdata contributions should ideally document everything needed to reproduce the training process (fonts, images, ground truth, texts, scripts, documentation, ...).


| Language Code | Language | Data File | Contributor | Info |
| --- | --- | --- | --- | --- |
| khmLimon | Khmer | best | OpenInstituteCambodia/phyrumsk | PR in tessdata_best |
| cop | Coptic | best | shreeshrii/tessdata_coptic | tesseract-ocr forum post |
| jpn_vert | Japanese Vertical | best | zodiac3539/jpn_vert | tesseract-ocr forum post |
| ocrb_plus | MRZ | best | shreeshrii/tessdata_ocrb | tesseract-ocr forum post |
| jav_java | Aksara Jawa | Best | User/Repo | |
| LangCode | Language | Best | Shreeshrii/tessdata_jav_java | tesseract-ocr forum post |
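
As a rough illustration of how one of the listed data files might be used, the sketch below builds a Tesseract 4 command line that selects a contributed model with the LSTM engine (--oem 1). It assumes the corresponding .traineddata file (here cop.traineddata) has already been downloaded into a local directory. The helper name build_tesseract_command, the image name page.png, and the ./tessdata path are hypothetical and only serve the example.

```python
import shlex


def build_tesseract_command(image_path, output_base, lang, tessdata_dir):
    """Return the argument list for a Tesseract 4 run with a contributed model.

    Assumption: <tessdata_dir>/<lang>.traineddata exists, e.g. a file
    downloaded from one of the repositories listed on this page.
    """
    return [
        "tesseract",
        image_path,                      # input image
        output_base,                     # output file name without extension
        "--tessdata-dir", tessdata_dir,  # directory holding the .traineddata file
        "-l", lang,                      # language code, as in the first column
        "--oem", "1",                    # LSTM engine only
    ]


if __name__ == "__main__":
    # Hypothetical file names; adjust them to your own setup.
    cmd = build_tesseract_command("page.png", "page", "cop", "./tessdata")
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a machine where Tesseract 4 is installed. Building the arguments as a list rather than a single string avoids shell-quoting problems with unusual file names.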

As of 02/02/2020


These wiki pages are no longer maintained.

All pages were moved to tesseract-ocr/tessdoc.

The latest documentation is available at https://tesseract-ocr.github.io/.

