Tesseract hangs #3960

Closed
vmato opened this issue Nov 9, 2022 · 13 comments

Comments

vmato commented Nov 9, 2022

Environment

  • Tesseract Version:

tesseract 5.2.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.4) : libpng 1.6.37+apng : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.4
Found OpenMP 201811
Found libarchive 3.6.1 zlib/1.2.12 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.2
Found libcurl/7.85.0 OpenSSL/1.1.1o zlib/1.2.12 libpsl/0.21.1 (+libidn2/2.3.3) libssh2/1.10.0 nghttp2/1.48.0

  • Platform:
    FreeBSD hostname.com 13.1-RELEASE-p3 FreeBSD 13.1-RELEASE-p3 GENERIC amd64

Current Behavior:

Tesseract hangs without producing any result. The hung process cannot even be killed.
Commands like tesseract anyimage.png stdout or tesseract -l eng+rus anyimage.png stdout all hang.

Expected Behavior:

Tesseract works (produces result and exits)

Contributor

stweil commented Nov 9, 2022

Did you install the Tesseract package provided by FreeBSD?

Can you run it with a debugger and get a stack trace when it is hanging? Or build it from source and debug the hanging process?

Author

vmato commented Nov 10, 2022

Yes, Tesseract is from binary packages (pkg install tesseract-5.2.0_1)

It's not possible to attach a debugger or dump the process with kill -QUIT.
Related discussion: https://forums.freebsd.org/threads/process-hangs-in-run-state-and-can-not-be-killed-or-debugged.87035/

Collaborator

amitdo commented Nov 10, 2022

> Commands like tesseract anyimage.png stdout or tesseract -l eng+rus anyimage.png stdout

Does this hang happen with any image (as the name of the image indicates)?

Does this hang happen with this command:

tesseract /path/to/myimage.png /path/to/output_filename_without_extension
?

Collaborator

amitdo commented Nov 10, 2022

You can also try to prepend this to the tesseract command:

OMP_THREAD_LIMIT=1

Contributor

stweil commented Nov 10, 2022

Also interesting: is this issue limited to one specific installation / computer, or does it occur with any installation of FreeBSD / on several or all computers?

Collaborator

amitdo commented Nov 10, 2022

Another thing I just noticed.

According to your tesseract -v output, tesseract does not detect any SIMD instructions, not even SSE4.1. How old is your machine? Do you run this in the cloud, or locally inside a VM? Did you somehow disable SIMD?
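
As a cross-check that does not depend on tesseract -v, the CPU feature lists that FreeBSD logged at boot can be inspected. This is only a minimal sketch: it assumes the boot messages are saved in /var/run/dmesg.boot and that features appear as comma-separated names inside angle brackets (for example Features2=0x...<...,SSE4.1,...>). The helper name has_cpu_feature is hypothetical and not part of any tool. Inside a VM, the list reflects what the hypervisor exposes to the guest.

```shell
#!/bin/sh
# has_cpu_feature FEATURE [FILE]
# Hypothetical helper: succeeds if FEATURE appears in a <...,...> feature
# list in FILE. FILE defaults to /var/run/dmesg.boot (assumed location of
# the saved boot messages on FreeBSD).
has_cpu_feature() {
    feature=$1
    file=${2:-/var/run/dmesg.boot}
    # Report "cannot tell" (status 2) if the file is missing or unreadable.
    [ -r "$file" ] || return 2
    # Fixed-string match so the dot in names like SSE4.1 is taken literally.
    grep -Fq -e "<$feature," -e ",$feature," -e ",$feature>" -e "<$feature>" "$file"
}

# Example use; does nothing on systems without the assumed file.
if [ -r /var/run/dmesg.boot ]; then
    for f in SSE4.1 AVX AVX2; do
        if has_cpu_feature "$f"; then
            echo "$f: listed"
        else
            echo "$f: not listed"
        fi
    done
fi
```

If SSE4.1 is listed here but missing from the tesseract -v output, the problem is more likely in the detection code than in the CPU or the VM configuration.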

Contributor

stweil commented Nov 10, 2022

That's indeed very strange because any amd64 platform should at least show support for SSE 4.1.

Collaborator

amitdo commented Nov 10, 2022

> any amd64 platform should at least show support for SSE 4.1

The first 64-bit CPU from AMD came out in 2003, a few years before SSE4.1 was introduced.

Still, in practice you are right (unless the machine is antique).

https://en.wikipedia.org/wiki/X86-64

https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level#background_of_the_x86_64_microarchitecture_levels

https://www.phoronix.com/news/GCC-11-x86-64-Feature-Levels

https://www.phoronix.com/news/LLVM-12-Clang-12-Feature-Over

Author

vmato commented Nov 11, 2022

> Does this hang happen with any image
We've tried several images. Always with the same result.

tesseract /home/user/test/test.png /home/user/test/test
Same freeze, but an empty output file test.txt was created.

OMP_THREAD_LIMIT=1
Didn't get any visible changes

> is this issue limited to one specific installation / computer, or does it occur with any installation of FreeBSD

Currently we have 2 FreeBSD servers.
On the second one, tesseract runs without this issue. The versions there are:

tesseract 5.2.0
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.3) : libpng 1.6.37+apng : libtiff 4.4.0 : zlib 1.2.11 : libwebp 1.2.4
Found OpenMP 201811
Found libarchive 3.6.1 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.2
Found libcurl/7.85.0 OpenSSL/1.1.1o zlib/1.2.12 libpsl/0.21.1 (+libidn2/2.3.2) libssh2/1.10.0 nghttp2/1.48.0

Nothing about SSE here either, but that doesn't stop it from running.

Running inside a VM provided by https://www.hetzner.com/cloud
(dmesg indicates:
CPU: AMD EPYC Processor (2445.48-MHz K8-class CPU)
...
Hypervisor: Origin = "KVMKVMKVM"
...
ACPI APIC Table:
...
da0: <QEMU QEMU HARDDISK 2.5+> Fixed Direct Access SPC-3 SCSI device
)

Author

vmato commented Nov 14, 2022

The following trace may give additional info:
https://forums.freebsd.org/attachments/log_tail-txt.15069/

Collaborator

amitdo commented Nov 14, 2022

The discussion in the FreeBSD forum indicates the issue is related to OpenMP.

> OMP_THREAD_LIMIT=1
> Didn't get any visible changes

It's not clear how you used this environment variable.

You need to do either of these:

OMP_THREAD_LIMIT=1 tesseract in.png output_name

Or

export OMP_THREAD_LIMIT=1
tesseract in.png output_name
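
To rule out a typo in how the variable is passed, it can help to confirm that it really reaches the child process before running Tesseract. This is a minimal sketch; run_single_threaded is a hypothetical helper name and not part of Tesseract or OpenMP.

```shell
#!/bin/sh
# Hypothetical helper: run any external command with OpenMP limited to one
# thread. The assignment prefix sets the variable in the environment of the
# command that is started.
run_single_threaded() {
    OMP_THREAD_LIMIT=1 "$@"
}

# Check that the variable actually reaches the child process.
run_single_threaded sh -c 'echo "OMP_THREAD_LIMIT=$OMP_THREAD_LIMIT"'
# prints: OMP_THREAD_LIMIT=1

# Then run Tesseract the same way (file names as in the commands above):
# run_single_threaded tesseract in.png output_name
```

Writing the variable on a separate line without export, or after the command name, would not pass it to tesseract.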

Contributor

stweil commented Nov 14, 2022

> According to your tesseract -v output, tesseract does not detect any SIMD instructions, not even SSE4.1.

That's a bug in Tesseract (fixed in commit adbefa8) which also caused bad performance for recognition and training on FreeBSD.

Author

vmato commented Nov 16, 2022

It looks very strange, but tesseract has now started working.

This happened before we applied any software or OS updates,
so I can only suppose there was an update of the provider's virtualization software or something like that.

amitdo closed this as completed Nov 16, 2022