segmentation fault with --psm 0 #821

pritamdodeja · 2017-04-13T05:13:01Z

The command below results in a segmentation fault:
tesseract a.jpg stdout --oem 1 --psm 0 -l eng

Environment details:
Operating system: Ubuntu 16.10 (Yakkety Yak) on x86_64
Tesseract version/commit: v4.00.00-alpha (top of Changelog says 2017-03-24)
Build: compiled from source

The command above works when --psm 3 is used instead.

Pritam Dodeja

amitdo · 2017-04-13T06:48:36Z

Try tesseract a.jpg stdout --oem 0 --psm 0 -l eng

amitdo · 2017-04-13T07:01:12Z

Also try with this image:
https://github.com/tesseract-ocr/tesseract/raw/master/testing/phototest.tif

pritamdodeja · 2017-04-13T19:18:40Z

Results are below.

tesseract phototest.tif stdout --oem 0 --psm 0 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 15.98
Script: Latin
Script confidence: 460.00

tesseract phototest.tif stdout --oem 1 --psm 0 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
Segmentation fault (core dumped)

tesseract phototest.tif stdout --oem 1 --psm 3 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

Pritam

amitdo · 2017-04-13T20:34:48Z

The warnings are ugly but seem harmless.

With --oem 0 and --psm 0 Tesseract works as expected.

With --oem 1 and --psm 0 Tesseract segfaults.
The reason: the new LSTM engine currently has no OSD (orientation and script detection) feature; only the older engine has it.

For now, the solution is to always use --oem 0 when using --psm 0.

amitdo · 2017-04-13T20:43:51Z

BTW, you should use osd instead of eng with --psm 0.

Using eng will result in always detecting Latin as the script, even if the text is written in another script.
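
The two pieces of advice above (use --oem 0 and -l osd whenever --psm 0 is requested) can be sketched as a small wrapper that builds the command line. This is only an illustration of the workaround described in this thread for the 4.00 alpha; the function name `build_tesseract_args` and its defaults are hypothetical, not part of Tesseract.

```python
from typing import List, Optional

PSM_OSD_ONLY = 0  # --psm 0: orientation and script detection only
OEM_LEGACY = 0    # --oem 0: the older engine, which has the OSD feature


def build_tesseract_args(image: str,
                         output: str = "stdout",
                         psm: int = 3,
                         oem: Optional[int] = None,
                         lang: str = "eng") -> List[str]:
    """Build a tesseract command line (hypothetical helper).

    When --psm 0 is requested, force --oem 0 and -l osd, following the
    workaround suggested in this thread.
    """
    if psm == PSM_OSD_ONLY:
        oem = OEM_LEGACY
        lang = "osd"
    args = ["tesseract", image, output]
    if oem is not None:
        args += ["--oem", str(oem)]
    args += ["--psm", str(psm), "-l", lang]
    return args


# The crashing combination from the report is rewritten to the safe one.
print(" ".join(build_tesseract_args("phototest.tif", psm=0, oem=1)))
```

The resulting list can be passed to `subprocess.run` as is. Other page segmentation modes are left untouched, so --oem 1 --psm 3 still goes through unchanged.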

pritamdodeja · 2017-04-21T03:07:28Z

From what I have read, tesseract v4 greatly improves ocr due to LSTM. If I know that my text is going to be of a certain orientation and script (top to bottom and English), how do I take advantage of the newer engine? Thanks for the help and sorry for the delay in my response.

amitdo · 2017-04-21T07:04:42Z

The 4.00 version is in alpha stage. It's not yet considered ready to replace the stable 3.05 version.
There is a plan to add an OSD feature to the LSTM engine.

pritamdodeja · 2017-06-29T18:50:53Z

Is there any update on this? Let me know if you want me to do any testing etc. Thanks!

amitdo · 2017-06-29T19:59:31Z

I'm sorry, but there is no update on this issue.

Shreeshrii · 2017-06-30T03:42:20Z

> From what I have read, tesseract v4 greatly improves ocr due to LSTM. If I know that my text is going to be of a certain orientation and script (top to bottom and English), how do I take advantage of the newer engine?

If you want to OCR English text, run the program (the latest version built from the master branch on GitHub) with the default options, or specify the language as English.

tesseract ./testing/eurotext.tif eurotext

or
tesseract ./testing/eurotext.tif eurotext --oem 1 --psm 6 -l eng

ivanzz1001 · 2017-09-14T07:11:46Z

With Tesseract 4.0.0 alpha, I ran the following command:

[root@localhost workspace]# /opt/tesseract4.0/bin/tesseract pic/tesseract-chinese-1.png stdout --psm 0 
Warning. Invalid resolution 0 dpi. Using 70 instead.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
Estimating resolution as 219
Segmentation fault (core dumped)

But osd.traineddata is in the right place:

[root@localhost workspace]# ls /opt/tesseract4.0/share/tessdata/
chi_sim.traineddata       chi_tra.traineddata       configs/                  ori.traineddata           pdf.ttf                   
chi_sim_vert.traineddata  chi_tra_vert.traineddata  eng.traineddata           osd.traineddata           tessconfigs/

When I add the "--oem 0" option, it prints the following:

[root@localhost workspace]# /opt/tesseract4.0/bin/tesseract pic/tesseract-chinese-1.png stdout --psm 0 --oem 0
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Then I tried Tesseract 3.05.1. It seems to always detect the script as "Latin", which is not what I expected.

amitdo · 2017-09-14T09:28:22Z

Try this:

tesseract in.png out -l osd --psm 0 --oem 0
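
If the OSD result is needed in a script, the "key: value" lines shown earlier in this thread (Page number, Orientation in degrees, Rotate, and so on) can be picked out of the text. The sketch below assumes the output has exactly that format, as in the phototest.tif run above; `parse_osd` is a hypothetical helper, and the key names are taken from that transcript rather than from any documented interface.

```python
from typing import Dict

# Keys as they appear in the --psm 0 output quoted earlier in this thread.
OSD_KEYS = {
    "Page number",
    "Orientation in degrees",
    "Rotate",
    "Orientation confidence",
    "Script",
    "Script confidence",
}


def parse_osd(text: str) -> Dict[str, str]:
    """Extract the known OSD fields from tesseract --psm 0 output.

    Lines that are not one of the known "key: value" pairs (warnings,
    "Page 1", blank lines) are ignored.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in OSD_KEYS:
            fields[key] = value.strip()
    return fields


sample = "\n".join([
    "Page 1",
    "Page number: 0",
    "Orientation in degrees: 0",
    "Rotate: 0",
    "Orientation confidence: 15.98",
    "Script: Latin",
    "Script confidence: 460.00",
])
print(parse_osd(sample)["Script"])
```

Values are kept as strings so that the caller decides how to convert the confidences.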

stweil · 2018-09-20T20:04:24Z

I think this was fixed in commit 27ce472, so this issue could be closed.

zdenop closed this as completed Sep 20, 2018

amitdo added the OSD Orientation and Script Detection label May 14, 2020
