4.0 with LSTM

Shreeshrii edited this page Feb 13, 2017 · 27 revisions

4.0

The Tesseract 4.0 alpha source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. For now, it is known to work well on x86/Linux. Model data for 101 languages is available in the tessdata repository.

Documentation

4.0.0-alpha ppa

Unofficial Ubuntu PPAs for Tesseract 4.0 and Leptonica 1.74:

Leptonica 1.74.1 package for Debian:

4.0.0-alpha for Windows

Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha (Jan 30, 2017) are available from the following links:

Unofficial binaries of tesseract-ocr 4.0.0-alpha [as of commit 2f10be5] with a GUI are available for gImageReader from

Download the 4.0.0-alpha traineddata to use with the above from the master branch of tessdata. For example, for Hindi, download the following file:

https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata *
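Note that the link above is a GitHub "blob" page, which shows the file in the browser rather than serving it directly. For scripted downloads, a minimal sketch in Python could rewrite it to the corresponding "raw" URL first. The helper names `blob_to_raw_url` and `download_traineddata` are hypothetical (they are not part of Tesseract or tessdata), and the destination directory is an assumption you should adjust to wherever your installation looks for traineddata files.

```python
import os
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_raw_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' page URL into the matching 'raw' file URL.

    Hypothetical helper: it only swaps the 'blob' path segment for 'raw'.
    """
    parts = urlsplit(blob_url)
    # Expected path layout: /<owner>/<repo>/blob/<branch>/<file...>
    segments = parts.path.split("/")
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError("not a GitHub blob URL: " + blob_url)
    segments[3] = "raw"
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


def download_traineddata(blob_url: str, dest_dir: str = ".") -> str:
    """Download a traineddata file into dest_dir and return the local path.

    dest_dir is an assumption; point it at your own tessdata directory.
    """
    raw_url = blob_to_raw_url(blob_url)
    filename = raw_url.rsplit("/", 1)[-1]
    dest_path = os.path.join(dest_dir, filename)
    urlretrieve(raw_url, dest_path)  # performs the actual network download
    return dest_path
```

Calling `download_traineddata("https://github.com/tesseract-ocr/tessdata/blob/master/hin.traineddata")` would then save `hin.traineddata` in the current directory, assuming network access is available.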

3.05-dev for Windows

An unofficial installer for Tesseract 3.05-dev for Windows is available from Tesseract at UB Mannheim. This includes the training tools.

The 3.05 branch on GitHub can be used by those who want the bug fixes for the 3.04 release.

3.04.1

The current official release is 3.04.1.