Windows (x64) compiled executable (tesseract.exe) doesn't work on any other Windows machine (x64) #2674

Closed
jkang-eng opened this issue Sep 24, 2019 · 14 comments


@jkang-eng
Contributor

jkang-eng commented Sep 24, 2019

Environment

  • Tesseract Version: tesseract 5.0.0-alpha-375-g179c8
  • Platform: Windows Server 2008 R2, 64bit

Current Behavior:

I built the tesseract executable from source using the following steps:

  1. git clone https://github.com/tesseract-ocr/tesseract tesseract
  2. Modify the source code (to suit our application's purpose; I am certain this is NOT the root cause of the issue)
  3. cd tesseract
  4. cppan
  5. mkdir build && cd build
  6. cmake ..
  7. cmake --build . --config Release

After step 7, I see tesseract.exe under the "tesseract\build\bin\Release" directory, and it runs perfectly fine.

However, once I copy the entire \Release directory to another x64 Windows machine (Windows Server 2008, 2016, etc.), it doesn't run on any of them (see screenshots below).

I have installed "Microsoft Visual C++ 2015-2019 Redistributable (x64)", so missing runtime dependencies do not look like the issue here.

Screenshot 1 (Installed runtime libraries)

Screenshot 2 (Message box pop-up on a different Windows machine)

Screenshot 3 (Windows Event Viewer)

Screenshot 4 (Windows Event Viewer) - FYI, the disk is not a network drive; the files are under C:

Can anyone suggest a fix? Is there anything I'm missing in the compilation procedure?

Suggested Fix:

@stweil
Contributor

stweil commented Sep 24, 2019

CMake builds are currently optimized for the build host architecture. So if you build on a PC with AVX2 hardware, the resulting binaries won't work (= will crash) on a PC without AVX2.

Either fix cmake/OptimizeForArchitecture.cmake or remove it from CMakeLists.txt.

@stweil
Contributor

stweil commented Sep 24, 2019

Builds using autoconf will work.

@jkang-eng
Contributor Author

jkang-eng commented Sep 24, 2019

@stweil Thanks for your suggestions.
Following your recommendation about CMakeLists.txt, I'm now compiling without the following section of CMakeLists.txt, and the resulting binary seems to work on other Windows machines:

# auto optimize
# include(OptimizeForArchitecture)
# AutodetectHostArchitecture()
# OptimizeForArchitecture()
# foreach(flag ${Vc_ARCHITECTURE_FLAGS})
#     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${flag}")
# endforeach()
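Instead of commenting the block out, the host-specific tuning could be put behind a CMake option so that each build decides whether to use it. The sketch below is illustrative only: it reuses the module, function, and variable names from the snippet above and borrows the AUTO_OPTIMIZE name from the -DAUTO_OPTIMIZE=OFF switch reported in this thread, but it is not necessarily how upstream implemented that switch.

```cmake
# Illustrative sketch: make host-specific CPU tuning switchable per build.
# ON (default) keeps the faster binary tuned for the build host's CPU.
# OFF skips the host-specific flags so the binaries also run on CPUs
# lacking features the build host has (e.g. AVX2).
option(AUTO_OPTIMIZE "Tune compiler flags for the build host CPU" ON)

if(AUTO_OPTIMIZE)
  include(OptimizeForArchitecture)
  AutodetectHostArchitecture()
  OptimizeForArchitecture()
  # Append the detected architecture flags to the global C++ flags.
  foreach(flag ${Vc_ARCHITECTURE_FLAGS})
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${flag}")
  endforeach()
endif()
```

With a guard like this, configuring with cmake .. -DAUTO_OPTIMIZE=OFF would produce slower but portable binaries, while local builds keep the host-tuned default.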

You mentioned builds using autoconf; can you explain how to do this? I couldn't find autoconf usage on the HowToBuild (Windows) instruction page.

@zdenop
Contributor

zdenop commented Sep 25, 2019

With optimization disabled, the OCR process will be slow.
Autotools are common on Linux, and using them on Windows is very difficult.
Another option is to use a cross-compiled Windows build, but IMO it is not suitable for using the tesseract API (library) on Windows (Visual Studio).

@jkang-eng
Contributor Author

jkang-eng commented Sep 25, 2019


Do you have a rough estimate of the performance degradation without optimization? (I will need to test performance anyway, but I was curious whether you could make a rough guess.)

Using a cross-compiled Windows build would be a problem for me, because I need to change the source code.

@zdenop
Contributor

zdenop commented Sep 26, 2019

With the image from issue #263 I got these results (tesseract 5.0.0-alpha-447-g52cf; Intel Core i7-6600U 2.60 GHz, 2.8 GHz; 8 GB RAM; Windows 64-bit), 5 runs:

With auto-optimization: 26.177767169865813
Without auto-optimization: 50.47452997387488
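The unit of the two timings is not stated, but their ratio does not depend on it:

```latex
\frac{t_{\text{without auto-optimization}}}{t_{\text{with auto-optimization}}}
  = \frac{50.47452997387488}{26.177767169865813} \approx 1.93
```

So on this machine, the run without auto-optimization took roughly twice as long.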

Update: I posted more details in #263

@zdenop
Contributor

zdenop commented Sep 28, 2019

Now you can turn off CMake auto-optimization with:
cmake .. -DAUTO_OPTIMIZE=OFF

@stweil
Contributor

stweil commented Sep 29, 2019

@zdenop, did you specify CMAKE_BUILD_TYPE for your builds?

If not, please try the latest Git master. It should be significantly faster.

@zdenop
Contributor

zdenop commented Sep 29, 2019

@stweil: I used CMake + Ninja + Clang, and this combination requires specifying CMAKE_BUILD_TYPE. So yes, I specified it (Release).

@jkang-eng
Contributor Author

@zdenop Hi, thanks for testing the performance with optimization on/off.

Update: I posted more details in #263

In your post, you mentioned

  • tessdata_best
  • tessdata_fast
  • tessdata

Does this mean that the OCR engine will perform differently depending on which repository's eng.traineddata I use? In other words, if I swap the existing eng.traineddata for the one from "tessdata_fast", would it increase performance?

After reading some docs, it looks like eng.traineddata matters when using the LSTM engine. I don't explicitly specify the engine mode when running tesseract. Is LSTM used by default?

@zdenop
Contributor

zdenop commented Sep 30, 2019

if I swap existing eng.traineddata with the one from "tessdata_fast", would it increase performance?

I am not sure how you measure performance, but the tessdata_fast model is a faster version of tessdata_best. It has some limitations (e.g. you can only train starting from the best model, and the best model provides better OCR results, at least in some cases).

is LSTM used by default?

>tesseract --help-oem
OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

"Based on what is available" means: tessdata_best and tessdata_fast contain only LSTM models (if you use --oem 0 you will get the error: Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract), while tessdata also contains the legacy model.

@jkang-eng
Contributor Author

jkang-eng commented Oct 7, 2019

@zdenop Thanks, that makes sense.

A quick question regarding compilation (I will create a new ticket if needed):

I'm using CMake to compile (shown below), which generates a 64-bit executable. Is there any way to build a 32-bit executable?

:: Compile Tesseract
cd build\tesseract_src
cppan

cd ..
cmake tesseract_src
cmake --build . --config Release

Can I achieve the same result using msbuild (Visual Studio compilation)?

@zdenop
Contributor

zdenop commented Oct 7, 2019

  1. IMO the original topic is solved, so I will close this issue.
  2. The issue tracker is for bugs, not for support. Please use the forum, or better, read the cppan documentation; it is an external project.

@zdenop zdenop closed this as completed Oct 7, 2019
@jkang-eng
Contributor Author

Got it. Thanks for the help.
