segmentation fault with --psm 0 #821

Closed
pritamdodeja opened this issue Apr 13, 2017 · 13 comments
Labels
OSD Orientation and Script Detection

Comments

@pritamdodeja

The command below results in a segmentation fault:
tesseract a.jpg stdout --oem 1 --psm 0 -l eng

Environment details:
Which operating system - Ubuntu 16.10 Yakkety Yak on x86_64
Which version/commit of tesseract - top of Changelog says 2017-03-24 - v4.00.00-alpha
How was tesseract built - I compiled it from source

The command above works when --psm 3 is used instead.

Pritam Dodeja

@amitdo
Collaborator

amitdo commented Apr 13, 2017

Try tesseract a.jpg stdout --oem 0 --psm 0 -l eng

@amitdo
Collaborator

amitdo commented Apr 13, 2017

@pritamdodeja
Author

Find the output below:

tesseract phototest.tif stdout --oem 0 --psm 0 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 15.98
Script: Latin
Script confidence: 460.00

tesseract phototest.tif stdout --oem 1 --psm 0 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
Segmentation fault (core dumped)

tesseract phototest.tif stdout --oem 1 --psm 3 -l eng
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
TIFFFetchNormalTag: Warning, ASCII value for tag "Photoshop" does not end in null byte. Forcing it to be null.
Page 1
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.

The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.

Pritam

@amitdo
Collaborator

amitdo commented Apr 13, 2017

The warnings are ugly but seem harmless.

With --oem 0 and --psm 0 Tesseract works as expected.

With --oem 1 and --psm 0 Tesseract segfaults.
The reason: the new LSTM engine currently has no OSD feature; only the older engine has it.

For now, the solution is to always use --oem 0 when using --psm 0.
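If you drive the tesseract CLI from a script, the workaround above can be enforced in one place. This is a minimal Python sketch, assuming you build the command line yourself; `build_tesseract_args` is a hypothetical helper, not part of tesseract. It only encodes the rule from this comment: whenever `--psm 0` (OSD only) is requested, the engine is forced to `--oem 0` (legacy), because the 4.00.00-alpha LSTM engine has no OSD support.

```python
from typing import List


def build_tesseract_args(image: str, outbase: str = "stdout",
                         psm: int = 3, oem: int = 1,
                         lang: str = "eng") -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    Workaround from this thread: in 4.00.00-alpha only the legacy
    engine (--oem 0) implements OSD, so --psm 0 is always paired
    with --oem 0 to avoid the segfault seen with --oem 1.
    """
    if psm == 0:
        oem = 0  # OSD requires the legacy engine in this version
    return ["tesseract", image, outbase,
            "--oem", str(oem), "--psm", str(psm), "-l", lang]


print(build_tesseract_args("a.jpg", psm=0))
# ['tesseract', 'a.jpg', 'stdout', '--oem', '0', '--psm', '0', '-l', 'eng']
```

The resulting list can be passed to `subprocess.run(...)` unchanged. Keeping the rule in a single helper means callers that request OSD cannot accidentally reach the crashing `--oem 1 --psm 0` combination.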

@amitdo
Collaborator

amitdo commented Apr 13, 2017

BTW, you should use osd instead of eng with --psm 0.

Using eng will result in always detecting Latin as the script, even if the text is written in another script.
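To check which script was actually detected, you can read the `Script:` field from the OSD output. This is a minimal Python sketch, assuming the output format shown for phototest.tif earlier in this thread (`Key: value` lines such as `Orientation in degrees: 0` and `Script: Latin`). `parse_osd` and `OSD_FIELDS` are hypothetical names, and other tesseract versions may print different or additional fields. Only the known keys are parsed, so lines such as the TIFF warnings, which also contain colons, are ignored.

```python
from typing import Callable, Dict, Union

# Field names copied from the --psm 0 output shown earlier in this thread
# (assumption: other versions may differ).
OSD_FIELDS: Dict[str, Callable[[str], Union[int, float, str]]] = {
    "Page number": int,
    "Orientation in degrees": int,
    "Rotate": int,
    "Orientation confidence": float,
    "Script": str,
    "Script confidence": float,
}


def parse_osd(text: str) -> Dict[str, Union[int, float, str]]:
    """Parse the 'Key: value' lines printed by tesseract with --psm 0."""
    result: Dict[str, Union[int, float, str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip "Page 1", TIFF warnings, and any unknown keys.
        if sep and key in OSD_FIELDS:
            result[key] = OSD_FIELDS[key](value.strip())
    return result


sample = """Page 1
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 15.98
Script: Latin
Script confidence: 460.00
"""

info = parse_osd(sample)
print(info["Script"], info["Orientation confidence"])  # Latin 15.98
```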

@pritamdodeja
Author

From what I have read, tesseract v4 greatly improves ocr due to LSTM. If I know that my text is going to be of a certain orientation and script (top to bottom and English), how do I take advantage of the newer engine? Thanks for the help and sorry for the delay in my response.

@amitdo
Collaborator

amitdo commented Apr 21, 2017

The 4.00 version is in the alpha stage. It's not yet considered ready to replace the stable 3.05 version.
There is a plan to add an OSD feature to the LSTM engine.

@pritamdodeja
Author

Is there any update on this? Let me know if you want me to do any testing etc. Thanks!

@amitdo
Collaborator

amitdo commented Jun 29, 2017

I'm sorry, but there is no update on this issue.

@Shreeshrii
Collaborator

Quoting the question above: "From what I have read, tesseract v4 greatly improves ocr due to LSTM. If I know that my text is going to be of a certain orientation and script (top to bottom and English), how do I take advantage of the newer engine?"

If you want to OCR English text, use the program (the latest version built from the master branch on GitHub) with the default options, or specify English as the language:

tesseract ./testing/eurotext.tif eurotext

or
tesseract ./testing/eurotext.tif eurotext --oem 1 --psm 6 -l eng
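For scripted use, the second command can be wrapped as below. This is a minimal Python sketch, assuming tesseract is on `PATH` and that passing `stdout` as the output base prints the recognized text to standard output, as in the commands earlier in this thread. `lstm_english_args` and `ocr_english` are hypothetical helper names. `--psm 6` treats the page as a single uniform block of text; no OSD is involved, so the LSTM engine can be used safely.

```python
import subprocess
from typing import List


def lstm_english_args(image: str, outbase: str = "stdout") -> List[str]:
    """Mirror the command above: LSTM engine, single text block, English."""
    return ["tesseract", image, outbase,
            "--oem", "1",   # LSTM engine
            "--psm", "6",   # assume a single uniform block of text
            "-l", "eng"]    # English traineddata


def ocr_english(image: str) -> str:
    """Run tesseract and return the recognized text (needs tesseract installed)."""
    proc = subprocess.run(lstm_english_args(image),
                          capture_output=True, text=True, check=True)
    # Warnings (e.g. the TIFF tag messages above) go to stderr, not stdout.
    return proc.stdout
```

Usage, on a machine with tesseract installed: `print(ocr_english("./testing/eurotext.tif"))`.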

@ivanzz1001
Contributor

ivanzz1001 commented Sep 14, 2017

With tesseract 4.0.0 alpha, I executed the following command:

[root@localhost workspace]# /opt/tesseract4.0/bin/tesseract pic/tesseract-chinese-1.png stdout --psm 0 
Warning. Invalid resolution 0 dpi. Using 70 instead.
Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load
Estimating resolution as 219
Segmentation fault (core dumped)

But osd.traineddata is in the right place:

[root@localhost workspace]# ls /opt/tesseract4.0/share/tessdata/
chi_sim.traineddata       chi_tra.traineddata       configs/                  ori.traineddata           pdf.ttf                   
chi_sim_vert.traineddata  chi_tra_vert.traineddata  eng.traineddata           osd.traineddata           tessconfigs/ 

Then I used the "--oem 0" option, and it printed the following:

[root@localhost workspace]# /opt/tesseract4.0/bin/tesseract pic/tesseract-chinese-1.png stdout --psm 0 --oem 0
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Then I used tesseract 3.05.1. It seems that tesseract always detects the script as "Latin", which is not what I expected.

@amitdo
Collaborator

amitdo commented Sep 14, 2017

Try this:

tesseract in.png out -l osd --psm 0 --oem 0

@stweil
Contributor

stweil commented Sep 20, 2018

I think this was fixed in commit 27ce472, so this issue could be closed.

@zdenop zdenop closed this as completed Sep 20, 2018
@amitdo amitdo added the OSD Orientation and Script Detection label May 14, 2020

6 participants