4.0 with LSTM
Tesseract 4.0 alpha source code is available in the 'master' branch of the repository. It adds a new OCR engine based on LSTM neural networks. It initially works (well) on x86/Linux. Model data for 101 languages is available in the tessdata repository.
Training Tesseract LSTM engine
Box files in the 3.0 format can be converted for use with LSTM training by adding a tab character at the end of each line and a box with a space after each word.
The Mark EOL and Mark EOL Bulk functions under the Box Editor tab of the latest version of jTessBoxEditor (jTessBoxEditor-2.0-Beta) can be used to add the EOL tabs automatically. Insert mode can be used on the last letter of each word to add a box with a space. There is no automated way to do this.
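For those who prefer scripting the EOL step, the following is a minimal Python sketch of the idea, not the jTessBoxEditor implementation. It assumes that each 3.0 box line has the form `symbol left bottom right top page`, that an EOL marker is a box line whose symbol is a tab character, and that the text is left-to-right so a new text line starts whenever the next box lies to the left of the current one. The function names and the one-pixel-wide marker box placed at the right edge of the last character are illustrative choices, not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the five numeric fields are always found last.
    symbol, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def format_box(box: Box) -> str:
    return f"{box.symbol} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def add_eol_markers(lines: List[str]) -> List[str]:
    """Return the box lines with a tab box appended after each text line.

    Heuristic (an assumption): a text line ends at the last box, at a page
    change, or when the next box starts to the left of the current one.
    """
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    out: List[str] = []
    for i, box in enumerate(boxes):
        out.append(format_box(box))
        nxt = boxes[i + 1] if i + 1 < len(boxes) else None
        if nxt is None or nxt.page != box.page or nxt.left < box.left:
            # Hypothetical placement: a 1-pixel-wide box just after the last character.
            eol = Box("\t", box.right, box.bottom, box.right + 1, box.top, box.page)
            out.append(format_box(eol))
    return out
```

The sketch deliberately leaves the space boxes alone, since word boundaries cannot be recovered reliably from a 3.0 box file. The output should be checked in a box editor before it is used for training.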
Unofficial Ubuntu PPAs for Tesseract 4.00 & Leptonica 1.74:
Leptonica 1.74.1 package for Debian:
4.0.0-alpha for Windows
Unofficial experimental binaries of tesseract-ocr 4.0.0-alpha are available from the following links. Each one was built from a different commit on the master branch in early 2017. See the individual sites for more details:
- Windows Installer made with MinGW-w64 from UB Mannheim
- Zip file with CPPAN-generated .dll and .exe files. You have to install the VC2015 x86 redist from microsoft.com in order to run them.
- Win64 build of tesseract 4.0.0 alpha, leptonica 1.74.1, and charlesw/tesseract .Net wrapper - built using CPPAN for Visual Studio 2017.
4.0.0-alpha with GUI frontend
The Visual C++ Redistributable for Visual Studio 2015 runtime (vc_redist.x86.exe) is REQUIRED for VietOCR to run correctly.
VietOCR can be used to download appropriate 4.0.0alpha traineddata for additional languages.
Download 4.0.0alpha traineddata to use with the above from master branch of tessdata. e.g. for Hindi download the following file:
The [3.05 branch on GitHub](https://github.com/tesseract-ocr/tesseract/tree/3.05) can be used by those who want the bug fixes for the 3.05.01 release.
An unofficial installer for Tesseract 3.05-dev for Windows is available from Tesseract at UB Mannheim. This includes the training tools.
Current official release
The current official release is 3.05.01.