App with more 2 or more threads with Tesseract will deadlock on Linux (but not on OSX). #2312

Closed
Sauraus opened this issue Mar 12, 2019 · 6 comments
@Sauraus Sauraus commented Mar 12, 2019

Environment

  • Tesseract Version: 4.0.0 (built from source)
  • Commit Number: GitHub tag 4.0.0
  • Platform: Ubuntu 18.04, AWS EC2 c5.xlarge / c5.2xlarge / c5.4xlarge

Current Behavior:

We have a Go app that runs multiple threads to analyze video frames; as part of that, we do some OCR using https://github.com/otiai10/gosseract.

If we run this code in a Docker container locally on a MacBook Pro, on ECS, or on a VM in EC2 with 16 cores, the application deadlocks whenever the Tesseract code is enabled.

NOTE: Interestingly, running the same code directly on a MacBook Pro (outside Docker) does not deadlock.

Expected Behavior:

A multi-threaded application should be able to use Tesseract without deadlocking.

Suggested Fix:

I built Tesseract with ./configure --disable-openmp, but I am not sure whether that affects the way the libraries are built or only the standalone app. I also tried the OMP_THREAD_LIMIT=1 environment variable. Neither option seems to help.

@Sauraus Sauraus changed the title from "2 or more threads dead-lock on 4/8/16 core VMs in EC2" to "App with more 2 or more threads with Tesseract will deadlock on Linux (but not on OSX)." Mar 12, 2019

Author

@Sauraus Sauraus commented Mar 12, 2019

https://github.com/tesseract-ocr/tesseract/wiki/FAQ#can-i-increase-speed-of-ocr

Thanks for the RTFM answer, very helpful.

In this case it's not about increasing the speed of OCR as such. I've got much more going on when I am analyzing a video frame, and I was hoping to have each frame analyzed in its own thread, so that the process would be self-contained and I could scale it up if I need to analyze more frames per second.

Furthermore, allowing Tesseract to run in each thread would save me from having to build a solution that consolidates the data for each frame from multiple sources.

@magnusja magnusja commented Jun 5, 2019

Same for me, using libtesseract from pyocr. Any solution?

Contributor

@zdenop zdenop commented Jun 5, 2019

The solution is to turn off OpenMP support, which is what causes this.

@magnusja magnusja commented Jun 5, 2019

Thanks, OMP_THREAD_LIMIT=1 works for me.

@zdenop zdenop closed this Jun 5, 2019
@stweil stweil added the bug label Jun 5, 2019

Contributor

@stweil stweil commented Jun 5, 2019

I am afraid that there is a potential misunderstanding.

The tesseract executable can use multithreading to speed up the OCR processing of a single page. The gain is not really large and it costs excessive CPU overhead, so the suggested solution is to disable it, either at compile time (--disable-openmp) or at run time (OMP_THREAD_LIMIT=1).
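For readers who want to apply the run-time option from a Go program, here is a minimal sketch of a launcher that starts another command with OMP_THREAD_LIMIT=1 in its environment. The helper name withOMPThreadLimit and the launcher itself are illustrative assumptions, not part of Tesseract or gosseract. Passing the variable in the environment of a freshly started process sidesteps the question of whether a change made later inside an already running process would still be seen by the OpenMP runtime.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withOMPThreadLimit (hypothetical helper) returns a copy of env in which
// OMP_THREAD_LIMIT is set to limit, replacing any existing entry.
func withOMPThreadLimit(env []string, limit int) []string {
	const key = "OMP_THREAD_LIMIT="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, key) {
			out = append(out, kv)
		}
	}
	return append(out, fmt.Sprintf("%s%d", key, limit))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: launcher <command> [args...]")
		return
	}
	// Run the given command (for example, the OCR application) with
	// OpenMP limited to a single thread.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withOMPThreadLimit(os.Environ(), 1)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "command failed:", err)
		os.Exit(1)
	}
}
```

In a container, the same effect can be had by setting the variable in the container's environment instead of using a launcher.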

Applications which use the Tesseract library can do the same, but in addition they can use more than one thread to process several pages in parallel. @Sauraus, I understand that you tried to do this and got a deadlock. Am I right? Then this is a bug which should be fixed. It would help to use a Tesseract library built with debug information, attach a debugger to the locked process, and get stack traces for all threads to see where they are hanging.
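The "several pages in parallel" usage described above might look roughly like the following Go sketch. Everything here is an assumption made for illustration: the recognizer interface and stubRecognizer stand in for a real OCR client (in the original application that would be a gosseract client), and giving each worker goroutine its own instance rather than sharing one is a conservative choice, not something this thread confirms as required.

```go
package main

import (
	"fmt"
	"sync"
)

// recognizer is a hypothetical stand-in for one OCR engine instance.
type recognizer interface {
	Text(frame []byte) (string, error)
	Close() error
}

// stubRecognizer lets the sketch run without Tesseract installed.
type stubRecognizer struct{ id int }

func (s *stubRecognizer) Text(frame []byte) (string, error) {
	return fmt.Sprintf("%d bytes", len(frame)), nil
}

func (s *stubRecognizer) Close() error { return nil }

type result struct {
	Index int
	Text  string
	Err   error
}

// processFrames runs OCR over frames with a fixed number of workers.
// Each worker creates its own recognizer and never shares it.
func processFrames(frames [][]byte, workers int, newRec func(id int) recognizer) []result {
	if workers < 1 {
		workers = 1
	}
	results := make([]result, len(frames))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			rec := newRec(id)
			defer rec.Close()
			for i := range jobs {
				text, err := rec.Text(frames[i])
				// Each index is written by exactly one worker.
				results[i] = result{Index: i, Text: text, Err: err}
			}
		}(w)
	}
	for i := range frames {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	frames := [][]byte{[]byte("a"), []byte("bb"), []byte("ccc")}
	newRec := func(id int) recognizer { return &stubRecognizer{id: id} }
	for _, r := range processFrames(frames, 2, newRec) {
		fmt.Println(r.Index, r.Text, r.Err)
	}
}
```

Results are stored by frame index, so the per-frame data stays together without a separate consolidation step.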

Author

@Sauraus Sauraus commented Jun 5, 2019

@stweil you are correct in your description of the behaviour observed when using Tesseract as a library.

However, the problem appears to have resolved itself: we are no longer seeing deadlocks, although I cannot say with certainty which Go dependency did the trick.

@stweil stweil added the wontfix label Jun 6, 2019