Shreeshrii edited this page May 1, 2018 · 66 revisions

A collection of frequently asked questions about Tesseract 4.0.0, with answers or pointers to them.

For the older version of the FAQ pertaining to Tesseract 2.0x, 3.0x and 4.00.00alpha, please see FAQ Old.


If your question is not answered by the FAQ, the Wiki pages, or existing Issues, please search the users mailing list/forum before posting it there.

If you think you have found a bug in Tesseract, please search the existing issues first. If you find a similar issue, add to it; otherwise, create a new issue.

Read the CONTRIBUTING guide before you report an issue in GitHub or ask a question in the forum.

(Please note, this page is currently being updated for 4.0.0).

Frequently Asked Questions

How do I get Tesseract 4.0.0?

See Tesseract Wiki Home page for details.

Which language models are available for Tesseract 4.0.0?

See the Tesseract man page for the list of languages and scripts supported by Tesseract 4.0.0.

See the Tesseract Wiki Data Files page for information regarding the language models available for Tesseract 4.0.0.
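To check which of those language models are actually installed on a given machine, the `tesseract --list-langs` option can be queried from a script. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the output consists of a header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumption: the output is a header line ending with ':' followed by
    one language code per line.
    """
    lines = [line.strip() for line in output.splitlines()]
    # Drop blank lines and the header line.
    return [line for line in lines if line and not line.endswith(":")]


def installed_languages(executable="tesseract"):
    """Run `tesseract --list-langs` and return the installed language codes."""
    result = subprocess.run(
        [executable, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case the list is printed there
        universal_newlines=True,
        check=True,
    )
    return parse_list_langs(result.stdout)
```

Calling `installed_languages()` requires the `tesseract` executable to be on the PATH; `parse_list_langs` can be used on captured output without it.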

How do I improve OCR results?

In many cases, getting better OCR results requires improving the quality of the input image you give Tesseract.
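As one illustration of image cleanup, a grayscale scan can be binarized with a global Otsu threshold before it is passed to Tesseract. This FAQ does not prescribe a particular preprocessing method, so the sketch below is only an example of the idea; it works on a plain list of 8-bit grayscale values, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a sequence of 8-bit grayscale values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to white (255) and the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

In practice the pixel values would come from an imaging library, and other steps such as rescaling, deskewing, or noise removal may matter more for a given scan.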

How do I train Tesseract 4.0.0 LSTM Engine?

Tesseract can be trained to recognize other languages or to fine-tune existing language models. See the Tesseract Wiki page Training Tesseract 4.00 for information on training the LSTM engine.

Please note that LSTM training is currently supported only with synthetic images, which are created by rendering a UTF-8 training text with Unicode fonts.
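The rendering step is typically done with the `text2image` training tool. The sketch below only builds the command line and checks that the training text decodes as UTF-8; it does not run the tool. The helper name and all file names, font names, and directories passed to it are hypothetical placeholders, and the Training Tesseract 4.00 page remains the reference for the full procedure.

```python
from pathlib import Path


def text2image_command(text_file, output_base, font, fonts_dir):
    """Build a text2image command for rendering a UTF-8 training text.

    Raises UnicodeDecodeError if the training text is not valid UTF-8.
    """
    # Reading with an explicit encoding validates the file before rendering.
    Path(text_file).read_text(encoding="utf-8")
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={output_base}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
    ]
```

The returned list can be passed to `subprocess.run` once the training tools are installed.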

My question isn't in here!

Try searching the forum (http://groups.google.com/group/tesseract-ocr) as well as the open and closed issues on GitHub (https://github.com/tesseract-ocr/tesseract/issues). Your question may have come up before even if it is not listed here.