Convert pdf preprocessor #108
Thanks for the contribution! I verified that it builds locally and triggered new Docker image builds on Docker Hub (still processing).
Hi guys, great work! I tried this feature, but even for very small PDFs (e.g. 2 pages) I get "Unable to perform OCR decode. Error: Timeout waiting for RPC response". Any ideas why this happens? I use tesseract3 inside the containers.
Any logs on the containers? I'm guessing it failed with some sort of error that didn't get propagated back.
I have the same issue. Logs:
Can you get logs on the worker container? Or maybe there isn't one running, which would explain the timeout. What does
Worker container log is:
I have 4 containers running,
Line
of docker-compose.yml should be changed to:
if I am right?
Hello darmanovic,
I suspected that the stars were typos, but when I remove them, the container won't run at all. LINE:
LOG:
Same error as @darmanovic. Has anyone solved it?
Hi, all. Please have a look at #117 for a follow-up on this error.
The tesseract engine can now process PDF files. This is achieved by a new ConvertPdf preprocessor.
Usage:
and afterwards it can be tested with:
Internally we call gs (Ghostscript) to create a multi-page TIFF from the input. ImageMagick won't work for this purpose because it creates single-page image files, which tesseract can't handle.
e.g.
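A minimal sketch of what such a Ghostscript call might look like from Go. The `tiff24nc` output device and the 300 dpi resolution are assumptions, not values confirmed by this PR, and the function names `ghostscriptArgs` and `convertPdfToTiff` are hypothetical:

```go
package main

import (
	"fmt"
	"os/exec"
)

// ghostscriptArgs builds the argument list for a gs call that renders every
// page of inPdf into a single multi-page TIFF at outTiff.
// Assumption: the tiff24nc device and -r300 are illustrative choices only.
func ghostscriptArgs(inPdf, outTiff string) []string {
	return []string{
		"-dQUIET",   // suppress startup messages
		"-dBATCH",   // exit after processing the input
		"-dNOPAUSE", // do not prompt between pages
		"-sDEVICE=tiff24nc",
		"-r300",
		"-sOutputFile=" + outTiff,
		inPdf,
	}
}

// convertPdfToTiff runs gs and returns its combined output in the error,
// so that a failure is not silently lost.
func convertPdfToTiff(inPdf, outTiff string) error {
	out, err := exec.Command("gs", ghostscriptArgs(inPdf, outTiff)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("gs failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Only print the command here; call convertPdfToTiff to actually run gs.
	fmt.Println("gs", ghostscriptArgs("input.pdf", "output.tiff"))
}
```

Because the output file name contains no page placeholder such as `%d`, Ghostscript writes all pages into one TIFF file, which is the multi-page form tesseract needs.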
Regards!