
RFC: Best Practices re OPENMP - for training, evaluation and recognition #3744

Shreeshrii opened this issue Feb 6, 2022 · 21 comments

@Shreeshrii (Collaborator):
For Tesseract 5, what are the best practices regarding OPENMP?

Is the following still true?

  1. OPENMP is needed for training, so build tesseract and the training tools with --enable-openmp.
  2. For lstmeval (built with --enable-openmp), use OMP_THREAD_LIMIT=1.
  3. For recognition with tesseract (built with --enable-openmp), use OMP_THREAD_LIMIT=1.
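
For reference, items 2 and 3 correspond to invocations like the following (a minimal sketch; the traineddata name, image.png and out are placeholder names, not from this thread):

# Item 2: evaluation with an --enable-openmp build, limited to one thread
OMP_THREAD_LIMIT=1 lstmeval --model eng.traineddata --eval_listfile list.eval
# Item 3: recognition with an --enable-openmp build, limited to one thread
OMP_THREAD_LIMIT=1 tesseract image.png out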
Shreeshrii changed the title from "Best Practices re OPENMP - for training, evaluation and recognition" to "RFC: Best Practices re OPENMP - for training, evaluation and recognition" on Feb 6, 2022.
@stweil (Contributor) commented Feb 6, 2022:

OPENMP is not needed for training. It even makes things worse for me. Timing results for lstm_squashed_test on AMD EPYC 7502 show that no OPENMP (--disable-openmp) is best, followed by OPENMP compiled in but disabled at runtime (OMP_THREAD_LIMIT=1). Enabled OPENMP comes last and burns a lot of CPU performance for nothing:

# --disable-openmp
real 28.41
user 28.33
sys 0.08
# --enable-openmp
real 33.16
user 129.41
sys 1.46
# --enable-openmp, OMP_THREAD_LIMIT=1
real 32.89
user 32.61
sys 0.28
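
These figures can be reproduced with something like the following sketch, assuming an autotools build tree in which the unit tests have been built (e.g. via make check); exact paths and targets may differ:

# Build variant 1: no OpenMP compiled in
./configure --disable-openmp && make check
time -p ./unittest/lstm_squashed_test

# Build variant 2: OpenMP compiled in, measured with and without a runtime limit
./configure --enable-openmp && make check
time -p ./unittest/lstm_squashed_test
export OMP_THREAD_LIMIT=1
time -p ./unittest/lstm_squashed_test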

@amitdo (Collaborator) commented Feb 6, 2022:

The plan is to disable it by default in 5.1.0.

@stweil (Contributor) commented Feb 6, 2022:

... in autoconf builds. cmake already disables it by default.

@stweil (Contributor) commented Feb 6, 2022:

Note that even without OPENMP, training uses up to two CPU threads: one for training, which runs until training is finished, and one for evaluation, which runs from time to time during the training process.
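
One way to verify this while a training run is active (a sketch, assuming a Linux system with procps; lstmtraining is Tesseract's LSTM training tool):

# Print the thread count (nlwp) of the newest lstmtraining process
ps -o nlwp= -p "$(pgrep -n lstmtraining)"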

@amitdo (Collaborator) commented Feb 6, 2022:

The reason for disabling OpenMP is that Tesseract currently uses it inefficiently.

For text recognition, the speed benefit of using OpenMP with fast (best->int) traineddata is too small, while it consumes too many CPU resources.

For training, the OpenMP code is even more problematic than the code used for text recognition. I'm not sure how much speed will be lost here.

@Shreeshrii (Collaborator, Author) commented:

Thank you!

> no OPENMP is best, followed by disabled OPENMP

Does no OPENMP mean building with --disable-openmp as part of autotools configure?

@stweil (Contributor) commented Feb 6, 2022:

Yes, currently it is necessary to use configure --disable-openmp. As Amit has written above, that should become the default, but I still have no simple code to achieve that.

I updated my comment to be clearer.
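
For completeness, the full autotools sequence for an OpenMP-free build looks like this (a sketch of the standard Tesseract build steps):

./autogen.sh
./configure --disable-openmp
make
sudo make install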

@amitdo (Collaborator) commented Feb 6, 2022:

--disable-openmp disables OpenMP at compile time, while OMP_THREAD_LIMIT=1 disables it at runtime. The first method is more efficient, while the second method is more flexible.
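
One way to tell which kind of binary you have (an assumption on my part: builds with OpenMP compiled in report a "Found OpenMP ..." line in the version output, alongside the library list shown later in this thread):

# No output here suggests an OpenMP-free build, so OMP_THREAD_LIMIT has no effect
tesseract --version | grep -i openmp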

@amitdo (Collaborator) commented Feb 6, 2022:

Stefan, for 5.1.0, do you want to keep a way to enable OpenMP with --enable-openmp?

@Shreeshrii (Collaborator, Author) commented Feb 6, 2022:

> OPENMP is not needed for training. It even makes things worse for me. Timing results for lstm_squashed_test on AMD EPYC 7502 show that no OPENMP (--disable-openmp) is best, followed by OPENMP compiled in but disabled at runtime (OMP_THREAD_LIMIT=1). Enabled OPENMP comes last and burns a lot of CPU performance for nothing:

Ok. I will try the training-from-fonts scenarios in my tess5train-fonts repo to see if they give similar results.

@Shreeshrii (Collaborator, Author) commented Feb 6, 2022:

lstmeval

Which time figures (real, user, sys) are important? Which scenario is preferable?

no OPENMP (--disable-openmp)

tesseract 5.0.1-19-g44ddde
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found NEON
 Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
 Found libcurl/7.58.0 NSS/3.35 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.631000_121_600.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.631000_121_600.eval.log
real 805.37
user 805.34
sys 0.03
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.028000_156_2000.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.028000_156_2000.eval.log
real 806.56
user 806.49
sys 0.07
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.558000_125_700.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.558000_125_700.eval.log
real 806.10
user 806.04
sys 0.07

Enabled OPENMP

an older tesseract 5.0.1 build with --enable-openmp
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.645000_119_600.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.645000_119_600.eval.log
real 331.53
user 1041.90
sys 9.02
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.119000_156_1500.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.119000_156_1500.eval.log
real 331.30
user 1042.38
sys 8.55
time -p lstmeval  \
	--verbosity=0 \
	--model data/engFineTuned/tessdata_fast/engFineTuned_0.014000_165_2500.traineddata \
	--eval_listfile data/engFineTuned/list.eval 2>&1 | grep "^BCER eval" > data/engFineTuned/tessdata_fast/engFineTuned_0.014000_165_2500.eval.log
real 331.70
user 1042.77
sys 8.97

@Shreeshrii (Collaborator, Author) commented:

lstmeval - engImpact

No OPENMP

time -p lstmeval  \
	--verbosity=0 \
	--model data/engImpact/tessdata_fast/engImpact_0.489000_152_900.traineddata \
	--eval_listfile data/engImpact/list.eval 2>&1 | grep "^BCER eval" > data/engImpact/tessdata_fast/engImpact_0.489000_152_900.eval.log
real 19.85
user 19.82
sys 0.04

OPENMP

time -p lstmeval  \
	--verbosity=0 \
	--model data/engImpact/tessdata_fast/engImpact_0.489000_152_900.traineddata \
	--eval_listfile data/engImpact/list.eval 2>&1 | grep "^BCER eval" > data/engImpact/tessdata_fast/engImpact_0.489000_152_900.eval.log
real 8.25
user 25.87
sys 0.27

@stweil (Contributor) commented Feb 6, 2022:

> Which time figures (real, user, sys) are important? Which scenario is preferable?

"real" is the time spent from program start to termination.
"user" and "sys" is the accumulated time used by all CPUs in user space / system space.
For single threaded applications like Tesseract without OPENMP "real" is normally equal to the sum of "user" and "sys". "real" can also be much larger if the execution is delayed, for example by other applications running simultaneously.

In your test scenario lstmeval was much faster with OPENMP enabled ("real" is 331 s instead of 805 s), so you'd prefer that to get a result fast. The CPU resources where slightly more with OPENMP ("user" 1042 s and "sys" 9 s instead of 805 s / 0.05 s), so the faster execution costs some (acceptable) overhead in this case.
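
In numbers: the wall-clock speedup from OPENMP here is 805 s / 331 s ≈ 2.4×, while total CPU time grows from about 805 s to 1042 s + 9 s ≈ 1051 s, a factor of roughly 1.3. So about 30 % more CPU work buys a roughly 2.4× faster result on this machine.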

> for 5.1.0, do you want to keep a way to enable OpenMP with --enable-openmp?

Yes, I think that's necessary for compatibility, and also because it can be useful, as in @Shreeshrii's test case on ARM.

@stweil (Contributor) commented Feb 6, 2022:

Running Tesseract with several threads seems to work better on ARM than on Intel architectures. I noticed that with Apple M1 (AARCH64), too.

@Shreeshrii (Collaborator, Author) commented:

> Running Tesseract with several threads seems to work better on ARM than on Intel architectures. I noticed that with Apple M1 (AARCH64), too.

I am running this on AARCH64.

@zdenop (Contributor) commented Feb 6, 2022:

Also, my tests show that enabled OPENMP can make sense in some cases (e.g. for the best data model on Windows with MSVC 2019 and an Intel processor). It would be great if we found somebody familiar with OpenMP, at least to review how tesseract uses it...

@tdhintz (Contributor) commented Feb 16, 2022:

My timings for OpenMP on Windows MSVC are at the end of issue #3044.

@Shreeshrii (Collaborator, Author) commented:

Thanks, @tdhintz

It would be good to know whether those results still hold. If possible, please rerun the tests with the released tesseract 5 version or the latest GitHub version, since there have been many changes since 2020.

@tdhintz (Contributor) commented Feb 23, 2022:

@Shreeshrii I'll add that task to our plan for late March. We build with very specific settings to get the best results, and I'm sure the build process has changed again, so this will be a heavy lift.

@tdhintz (Contributor) commented Mar 25, 2022:

Looks like someone did this already: OpenMP benchmark

@Shreeshrii (Collaborator, Author) commented:

> Looks like someone did this already: OpenMP benchmark

That test by @zdenop uses one image 15 times. Your tests use many more combinations.

> We ran a comparison between a pre-release of 4.0 and the current 5.0 on AVX2 and SSE hardware on Windows that I'll share just for grins. The 4.0 was built with floating point set to fast, COMDAT folding, OpenMP and was PGO optimized. The 5.0 build also used floating point 'fast' and COMDAT folding, but without OpenMP and without PGO optimization.
> 2,880 combinations of settings and images were tested for each AVX2 and SSE platform. The tests are by no means comprehensive of all possible combinations. For example, only Eng traindata was used, although the Fast, Best and Blended data were all used.

> this will be a heavy lift.

I understand.
If possible, the results could be added to tessdoc for easy reference. Thanks.
