Slower Performance in Latest Tesseract #1171

Closed
ibr123 opened this issue Oct 17, 2017 · 14 comments

Comments

@ibr123

ibr123 commented Oct 17, 2017

Hi,

I have installed Tesseract 4.00.00dev-690-g1b0379c with Leptonica 1.74.4, and it works fine for detection, but I have noticed that performance is slower than before (compared with the Tesseract build from 5 months ago, which used Leptonica 1.74.1).

In the past a run took around 4 or 5 seconds, but lately it takes almost double that. The command I'm using is the normal Tesseract detection command: **tesseract image results -l lang --tessdata-dir ./tessdata --oem 1**. Am I missing something, or is there a parameter that I should add after the updates to Tesseract or Leptonica? Or is there any other way to enhance the performance speed (for both the single-thread and multi-thread case)?

Thank you

@amitdo
Collaborator

amitdo commented Oct 17, 2017

> Slower Performance in Latest Tesseract

It's not clear if you're comparing a newer 4.00 to older 4.00 or 4.00 to 3.05.

@amitdo
Collaborator

amitdo commented Oct 17, 2017

Also, do you use the newest traineddata for 4.0?

@Shreeshrii
Collaborator

Shreeshrii commented Oct 18, 2017 via email

@amitdo
Collaborator

amitdo commented Oct 18, 2017

> or any other way to enhance the performance speed? (for both the single-thread and multi-thread case)

If you use multi-threading, try limiting OpenMP to a single thread:
OMP_THREAD_LIMIT=1 tesseract in.png out --oem 1
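
To check whether the thread limit actually helps on a given machine, the same command can be timed with and without it. The following is a minimal sketch, not part of the original reply: `run_timed` and `compare_thread_limits` are hypothetical helper names, and `in.png` / `out` are the placeholder file names from the command above.

```python
import os
import subprocess
import time


def run_timed(cmd, thread_limit=None):
    """Run cmd once and return (elapsed_seconds, exit_code).

    If thread_limit is given, OMP_THREAD_LIMIT is set for the child
    process only; otherwise the parent environment is passed through
    unchanged (including any OMP_THREAD_LIMIT already set there).
    """
    env = os.environ.copy()
    if thread_limit is not None:
        env["OMP_THREAD_LIMIT"] = str(thread_limit)
    start = time.perf_counter()
    result = subprocess.run(
        cmd, env=env, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def compare_thread_limits(cmd):
    """Time cmd with the inherited environment and with OMP_THREAD_LIMIT=1."""
    timings = {}
    for label, limit in (("inherited", None), ("single-thread", 1)):
        elapsed, code = run_timed(cmd, thread_limit=limit)
        timings[label] = elapsed
        print(f"{label}: {elapsed:.2f}s (exit code {code})")
    return timings
```

With Tesseract on the PATH, a call would look like `compare_thread_limits(["tesseract", "in.png", "out", "--oem", "1"])`. A single run is noisy, so repeating the measurement a few times gives a fairer picture.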

@ibr123
Author

ibr123 commented Oct 18, 2017

@amitdo Actually, I'm comparing the latest version (4.00.00dev-690-g1b0379c with Leptonica 1.74.4) with the older version (4.00.00dev-549-g2b854e3 with Leptonica 1.74.1).

@Shreeshrii "tessdata_fast" is news to me. I'm already using the official traineddata, but I don't know about this one. Can you please give me the link to it? Also, I have already created a tuned LSTM; can I combine it with the new tessdata_fast as well?

Thank you both

@stweil
Contributor

stweil commented Oct 18, 2017

The latest traineddata files are at https://github.com/tesseract-ocr/tessdata_best and https://github.com/tesseract-ocr/tessdata_fast. But if you want to compare the performance of an older Tesseract 4.00 with the latest version, you will have to use the same traineddata for both, usually from https://github.com/tesseract-ocr/tessdata. I'd disable multithreading for the test (set environment variable OMP_THREAD_LIMIT=1).
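
A minimal sketch of such a like-for-like comparison, assuming both Tesseract builds are installed side by side: each binary is run several times on the same image with the same tessdata directory and `OMP_THREAD_LIMIT=1`, and the median wall-clock time is reported. The function names and any binary paths are hypothetical; the command-line options are the ones used earlier in this thread.

```python
import os
import statistics
import subprocess
import time


def build_command(binary, image, out_base, tessdata_dir, lang):
    """Build the tesseract command line used earlier in this thread."""
    return [binary, image, out_base, "-l", lang,
            "--tessdata-dir", tessdata_dir, "--oem", "1"]


def median_runtime(cmd, runs=5):
    """Run cmd `runs` times single-threaded and return the median seconds."""
    env = dict(os.environ, OMP_THREAD_LIMIT="1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if a run fails, so a broken
        # run is never mistaken for a fast one.
        subprocess.run(cmd, env=env, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_builds(binaries, image, tessdata_dir, lang, runs=5):
    """Return {label: median seconds} for each binary in `binaries`.

    `binaries` maps a label to the path of a tesseract executable;
    every build is given the same image and the same tessdata directory.
    """
    results = {}
    for label, binary in binaries.items():
        cmd = build_command(binary, image, f"out_{label}", tessdata_dir, lang)
        results[label] = median_runtime(cmd, runs)
    return results
```

For example, `compare_builds({"old": "/path/to/old/tesseract", "new": "/path/to/new/tesseract"}, "image.png", "./tessdata", "eng")`, where both paths are placeholders for wherever the two builds live.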

@Shreeshrii
Collaborator

Please see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line

If you have the data for your fine-tuning, you can create the 'faster' integer type of traineddata by using convert_to_int with stop_training.
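
As a rough sketch of what that invocation might look like, the helper below assembles an `lstmtraining` command line. The flag names follow the training wiki page linked above and should be checked against `lstmtraining --help` for the installed build; the function name and all file paths are hypothetical placeholders.

```python
import shlex


def int_model_command(checkpoint, base_traineddata, output_traineddata):
    """Build an lstmtraining command that writes an integer traineddata.

    checkpoint:         checkpoint produced by the fine-tuning run
    base_traineddata:   traineddata file the fine-tuning started from
    output_traineddata: where the converted model should be written
    All three paths are supplied by the caller; nothing is run here.
    """
    return [
        "lstmtraining",
        "--stop_training",   # finalize the model instead of training further
        "--convert_to_int",  # write the smaller, faster integer model
        "--continue_from", checkpoint,
        "--traineddata", base_traineddata,
        "--model_output", output_traineddata,
    ]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(part) for part in cmd)
```

Printing `as_shell(int_model_command("my_checkpoint", "base.traineddata", "my_fast.traineddata"))` gives a single command that can be reviewed before it is run by hand.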

@ibr123
Author

ibr123 commented Oct 18, 2017

@Shreeshrii So I assume that if I fine-tuned an LSTM file (made with the older version's tools), it won't combine with the new traineddata (for example, a traineddata from https://github.com/tesseract-ocr/tessdata_best)?
Also, by "data for your fine tuning", do you mean the following?
1
And are the steps in the link that you shared meant to improve accuracy, detection speed, or both?

@stweil Is the difference between "tessdata_best" and "tessdata_fast" accuracy vs. speed? Meaning "tessdata_fast" will be faster in detection but won't be as accurate as "tessdata_best"?

Thanks for the answers

ibr123 closed this as completed Oct 18, 2017

@stweil
Contributor

stweil commented Oct 18, 2017

> the difference between "tessdata_best" and "tessdata_fast" is the accuracy vs speed? meaning "tessdata_fast" will be faster in detection but won't be as accurate as "tessdata_best"?

tessdata_fast is faster than tessdata_best, yes.
tessdata_best is generally better, but not always. I also noticed cases where tessdata_fast is better. And there are even cases where the old Tesseract gives the best recognition rates of all current tessdata.
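
Since neither model set wins in every case, one way to decide is to measure on your own images. The sketch below assumes you have a hand-checked transcription for each test image and computes a character error rate from the Levenshtein edit distance. This metric is an illustrative choice, not something prescribed in this thread, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # drop ca
                            curr[j - 1] + 1,      # add cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_error_rate(recognized, truth):
    """Edit distance from OCR output to ground truth, per truth character."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(recognized, truth) / len(truth)
```

Running the same images through each traineddata set and comparing the resulting error rates alongside the timings shows which trade-off suits a particular kind of document.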

@Shreeshrii
Collaborator

Shreeshrii commented Oct 18, 2017 via email

@ibr123
Author

ibr123 commented Oct 18, 2017

If I want to fine-tune using the "lstmtraining" tool with the latest Tesseract (4.00.00dev-690-g1b0379c), can I use .lstmf files (which are generated by tesstrain.sh) that were created by an older Tesseract version, such as 4.00.00dev-549-g2b854e3?
In other words, are .lstmf files compatible between Tesseract versions?

@Shreeshrii
Collaborator

Shreeshrii commented Oct 18, 2017 via email

@Shreeshrii
Collaborator

Shreeshrii commented Oct 18, 2017 via email

@ibr123
Author

ibr123 commented Oct 18, 2017

thanks
