zdenop Merge pull request #99 from stweil/master
Add scripts from tessdata_best (converted to fast integer models)
Latest commit 590567f May 10, 2018
script Add scripts from tessdata_best (converted to fast integer models) May 10, 2018
COPYING add license info Aug 3, 2015
README.md Fix typo in README.md Apr 17, 2018
afr.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
amh.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ara.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
asm.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
aze.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
aze_cyrl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
bel.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ben.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
bod.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
bos.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
bre.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
bul.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
cat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ceb.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ces.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
chi_sim.traineddata Update traineddata LSTM model with best model converted to integer May 10, 2018
chi_sim_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
chi_tra.traineddata Update traineddata LSTM model with best model converted to integer May 10, 2018
chi_tra_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
chr.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
cos.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
cym.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
dan.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
dan_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
deu.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
deu_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
div.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
dzo.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
ell.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
eng.traineddata Remove cube components from traineddata and update version component May 10, 2018
enm.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
epo.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
equ.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
est.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
eus.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
fao.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
fas.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
fil.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
fin.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
fra.traineddata Remove cube components from traineddata and update version component May 10, 2018
frk.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
frm.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
fry.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
gla.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
gle.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
glg.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
grc.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
guj.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
hat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
heb.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
hin.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
hrv.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
hun.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
hye.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
iku.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ind.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
isl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ita.traineddata Remove cube components from traineddata and update version component May 10, 2018
ita_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
jav.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
jpn.traineddata Update traineddata LSTM model with best model converted to integer May 10, 2018
jpn_vert.traineddata Remove parameter textord_tabfind_vertical_horizontal_mix Mar 29, 2018
kan.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
kat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kat_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kaz.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
khm.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
kir.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kor.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kor_vert.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kur.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
kur_ara.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
lao.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
lat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
lav.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
lit.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ltz.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
mal.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
mar.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
mkd.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
mlt.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
mon.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
mri.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
msa.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
mya.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
nep.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
nld.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
nor.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
oci.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ori.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
osd.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
pan.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
pol.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
por.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
pus.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
que.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ron.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
rus.traineddata Remove cube components from traineddata and update version component May 10, 2018
san.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
sin.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
slk.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
slk_frak.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
slv.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
snd.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
spa.traineddata Remove cube components from traineddata and update version component May 10, 2018
spa_old.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
sqi.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
srp.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
srp_latn.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
sun.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
swa.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
swe.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
syr.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
tam.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
tat.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
tel.traineddata Only int best model for hin, san, mar, nep, tel and kan Mar 22, 2018
tgk.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
tgl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
tha.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
tir.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
ton.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
tur.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
uig.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
ukr.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
urd.traineddata remove legacy model from indic and arabic script languages Mar 22, 2018
uzb.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
uzb_cyrl.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
vie.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
yid.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018
yor.traineddata Update LSTM Models to integerized tessdata_best for files < 25mb Mar 22, 2018

README.md

tessdata

These language data files work only with Tesseract 4.0.0. They are based on the sources in tesseract-ocr/langdata on GitHub (still to be updated for 4.0.0 - 20180322).

These files contain models for the legacy Tesseract engine (--oem 0) as well as for the new LSTM neural-network-based engine (--oem 1).
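As a rough illustration of how the engine is selected, the sketch below builds a tesseract command line with the --oem value named above. The function name, its parameters, and the file names are hypothetical; -l, --oem and --tessdata-dir are command-line options of Tesseract 4. The function only builds the argument list, so it can be inspected without Tesseract installed.

```python
from typing import List, Optional

LEGACY_ENGINE = 0  # --oem 0: legacy Tesseract engine
LSTM_ENGINE = 1    # --oem 1: LSTM neural net based engine


def build_tesseract_command(
    image_path: str,
    output_base: str,
    lang: str = "eng",
    oem: int = LSTM_ENGINE,
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Return an argv list for the tesseract CLI (hypothetical helper)."""
    # This sketch accepts only the two engine modes mentioned in this README.
    if oem not in (LEGACY_ENGINE, LSTM_ENGINE):
        raise ValueError("this sketch only handles --oem 0 and --oem 1")
    cmd = ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]
    if tessdata_dir is not None:
        # Point tesseract at the directory holding the *.traineddata files.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


if __name__ == "__main__":
    # "page.png" and "out" are placeholder names.
    print(" ".join(build_tesseract_command("page.png", "out", lang="deu")))
```

The resulting list can be passed to subprocess.run once Tesseract and the language files are in place.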

The LSTM models (--oem 1) in these files have been updated to the integerized versions of tessdata_best on GitHub. So, they should be faster but probably a little less accurate than tessdata_best.

tessdata_fast on GitHub provides an alternate set of integerized LSTM models which have been built with a smaller network. tessdata_fast files are the ones packaged for Debian and Ubuntu.

The legacy tesseract models (--oem 0) have been removed for Indic and Arabic script language files.
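A caller that wants the legacy engine therefore needs a fallback for those languages. The sketch below is one possible way to handle this, not part of Tesseract: the first set is copied from the files in the listing above whose commit message reads "remove legacy model from indic and arabic script languages", and the second set (hin, san, mar, nep, tel, kan) is an assumption based on the "Only int best model" commit message. The function name and parameter are hypothetical.

```python
# Languages whose commit message in the file listing says the legacy model was removed.
LEGACY_REMOVED = frozenset({
    "ara", "asm", "ben", "bod", "dzo", "fas", "guj", "khm", "kur_ara", "lao",
    "mal", "mya", "ori", "pan", "pus", "sin", "snd", "syr", "tam", "tha",
    "uig", "urd",
})

# ASSUMPTION: "Only int best model for hin, san, mar, nep, tel and kan" is read
# here as meaning these files also carry no legacy model.
INT_BEST_ONLY = frozenset({"hin", "san", "mar", "nep", "tel", "kan"})

LSTM_ONLY = LEGACY_REMOVED | INT_BEST_ONLY


def pick_oem(lang: str, prefer_legacy: bool = False) -> int:
    """Return 0 (legacy) only when requested and a legacy model is expected to exist."""
    if prefer_legacy and lang not in LSTM_ONLY:
        return 0
    return 1


if __name__ == "__main__":
    for code in ("eng", "ara", "hin"):
        print(code, pick_oem(code, prefer_legacy=True))
```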

tessdata for 3.04 or 3.05

Get language data files for Tesseract 3.04 or 3.05 from the 3.04 tree.
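The version rule can be written down as a small lookup. This is only a sketch: the function name and the returned labels are hypothetical, and it assumes the installed version is available as text containing a number such as "4.0.0" or "3.05.01" (for example, the first line printed by tesseract --version).

```python
import re


def tessdata_source(version_text: str) -> str:
    """Map a Tesseract version string to the data set this README recommends."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) == (4, 0):
        return "tessdata (this repository)"
    if (major, minor) in ((3, 4), (3, 5)):
        return "tessdata 3.04 tree"
    # Other versions are not covered by this README, so the sketch refuses to guess.
    raise ValueError(f"version {major}.{minor} is not covered here")


if __name__ == "__main__":
    print(tessdata_source("tesseract 4.0.0"))
    print(tessdata_source("tesseract 3.05.01"))
```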

More information and a complete list of all languages are available in the Tesseract wiki.