Language directories: afr, amh, ara, asm, aze, bel, ben, bih, bod, bos, bul, cat, ceb, ces, chr, cym, dan, deu, div, dzo, ell, eng, enm, epo, est, eus, fas, fin, fra, frk, frm, gle, glg, grc, guj, hat, heb, hin, hrv, hun, iku, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kir, kmr, kor, lao, lat, lav, lit, mal, mar, mkd, mlt, mri, msa, mya, nep, nld, nor, ori, pan, pol, por, pus, ron, rus, san, sin, slk, slv, snd, spa, sqi, srp, swa, swe, syr, tam, tel, tgk, tgl, tha, tir, tur, tyv, uig, ukr, urd, uzb, vie, yid, zlm

langdata

Source training data for Tesseract for lots of languages

Want to retrain Tesseract for a specific language by modifying or augmenting the original training data? Then you have come to the right place!

If you are looking for a ready-made language data set to run Tesseract, see our tessdata repository instead.

To re-create the training of a single language, lang, you need the following:

  • All the data in the lang directory.
  • The corresponding unicharset/xheights files for the script(s) used by lang.
  • All the remaining non-lang-specific files in the top-level directory, such as font_properties.
  • The fonts needed to train the language. Some languages were trained with commercially available fonts, so to reproduce the training exactly you will need to buy them; otherwise, use substitutes.
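
The file-gathering part of the list above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function name `collect_training_inputs`, its parameters, and the assumption that script files sit in the top-level directory under names of the form `<Script>.unicharset` and `<Script>.xheights` are all hypothetical, so check them against your checkout. Fonts are not handled here; they still have to be obtained separately.

```python
from pathlib import Path
import shutil

# Assumed naming for per-script files in the top-level directory.
SCRIPT_SUFFIXES = (".unicharset", ".xheights")


def collect_training_inputs(langdata_root, lang, scripts, dest):
    """Copy the training inputs for one language into dest.

    langdata_root: path to a checkout of this repository
    lang:          language directory name, e.g. "eng"
    scripts:       names of the script(s) the language uses, e.g. {"Latin"}
    dest:          output directory (created if missing)
    Returns the sorted list of copied files, relative to langdata_root.
    """
    root = Path(langdata_root)
    dest = Path(dest)
    lang_dir = root / lang
    if not lang_dir.is_dir():
        raise FileNotFoundError(f"no directory for language {lang!r} in {root}")

    dest.mkdir(parents=True, exist_ok=True)
    copied = []

    # 1. All the data in the lang directory.
    shutil.copytree(lang_dir, dest / lang, dirs_exist_ok=True)  # Python 3.8+
    for path in lang_dir.rglob("*"):
        if path.is_file():
            copied.append(path.relative_to(root).as_posix())

    # 2 and 3. Top-level files: script files only for the requested scripts,
    # plus every other non-language-specific file (such as font_properties).
    for path in root.iterdir():
        if not path.is_file() or path.name.startswith("."):
            continue
        if path.suffix in SCRIPT_SUFFIXES and path.stem not in scripts:
            continue  # belongs to a script this language does not use
        shutil.copy2(path, dest / path.name)
        copied.append(path.name)

    return sorted(copied)
```

For example, `collect_training_inputs("langdata", "eng", {"Latin"}, "eng_inputs")` would copy the `eng` directory, any top-level files for the Latin script, and the remaining top-level files into `eng_inputs`, assuming the checkout follows the layout described above.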
