Tesseract Open Source OCR Engine (main repository)

For the latest online version of the README.md see:

https://github.com/tesseract-ocr/tesseract/blob/master/README.md

About

This package contains an OCR engine, libtesseract, and a command line program, tesseract.

The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors, see AUTHORS and GitHub's log of contributors.

Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box". It can be trained to recognize other languages. See Tesseract Training for more information.

Tesseract supports several output formats: plain text, hOCR (HTML), and PDF.

This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.

In many cases, getting better OCR results requires improving the quality of the image you give Tesseract.

The latest stable version is 3.04.01, released in February 2016.

Brief history

Tesseract was originally developed at Hewlett-Packard Laboratories Bristol and at Hewlett-Packard Co, Greeley, Colorado, between 1985 and 1994, with further changes made in 1996 to port it to Windows, and some C++izing in 1998.

HP open sourced Tesseract in 2005. Since 2006 it has been developed by Google.

Release Notes

For developers

Developers can use the libtesseract C or C++ API to build their own applications. If you need bindings to libtesseract for other programming languages, please see the wrapper section on the AddOns wiki page.

Documentation of Tesseract, generated from the source code by Doxygen, can be found on tesseract-ocr.github.io.

License

The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

NOTE: This software depends on other packages that may be licensed under different open source licenses.

Installing Tesseract

You can either install Tesseract from a pre-built binary package or build it from source.

Running Tesseract

Basic command line usage:

tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]

For more information about the various command line options, use tesseract --help or man tesseract.
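
If you call the command line program from a script, the synopsis above maps directly onto an argument list. The following is a minimal sketch in Python, not part of Tesseract itself: the helper names `build_tesseract_command` and `run_tesseract` are hypothetical, and the file names, the language code `eng` and the page segmentation mode `3` are illustrative assumptions. Only the options shown in the synopsis are used.

```python
import subprocess


def build_tesseract_command(image, output_base, lang=None, psm=None, configfiles=()):
    """Return the argument list for one tesseract invocation.

    Hypothetical helper that mirrors the synopsis:
    tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfiles...]
    """
    cmd = ["tesseract", str(image), str(output_base)]
    if lang is not None:
        cmd += ["-l", lang]          # optional language
    if psm is not None:
        cmd += ["-psm", str(psm)]    # optional page segmentation mode
    cmd += list(configfiles)         # optional config files come last
    return cmd


def run_tesseract(image, output_base, **options):
    """Run tesseract; raises CalledProcessError on a non-zero exit status."""
    cmd = build_tesseract_command(image, output_base, **options)
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Only prints the command; call run_tesseract() to actually execute it.
    print(" ".join(build_tesseract_command("page.png", "page", lang="eng", psm=3)))
    # prints: tesseract page.png page -l eng -psm 3
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces. To request one of the other output formats, pass the name of the corresponding config file in `configfiles`, for example `configfiles=["hocr"]`, assuming a config file of that name is installed with your Tesseract data.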

Support

Mailing-lists:

Please read the FAQ before asking a question on the mailing list or reporting an issue.