pixGenHalftoneMask error #292

Closed
scottb89 opened this issue Mar 24, 2016 · 6 comments · Fixed by #340

@scottb89

test

(I'm using head: 3.05.00dev)

When OCRing a small one- or two-line image (with the default settings), I get the following error:
Error in pixGenHalftoneMask: pix too small: w = 610, h = 71

But the ocr is successful.

In textord/imagefind.cpp, FindImages is calling pixGenHalftoneMask (a leptonica function) which is throwing the error and returning null.

FindImages then executes its backup plan, which works fine (such a small image doesn't need multiple image regions). Its backup plan is to call pixCreate and return the result.

The minimum size for pixGenHalftoneMask to work is 100x100 pixels. Perhaps FindImages should just go with plan 'b' if the image is too small. This would still work fine and would avoid triggering the error. I don't know if there is a way for tesseract to know what the minimum size is (in case it changes).

@scottb89
Author

The min size is defined like this at the top of src/pageseg.c in leptonica:

/* These functions are not intended to work on very low-res images */

static const l_int32 MinWidth = 100;
static const l_int32 MinHeight = 100;

It doesn't seem that they are likely to change.

Here's my patch:

index 05047ca..0084ac8 100644
--- a/textord/imagefind.cpp
+++ b/textord/imagefind.cpp
@@ -72,6 +72,12 @@ Pix* ImageFind::FindImages(Pix* pix) {
   pixDisplayWrite(pixr, textord_tabfind_show_images);
 
   // Get the halftone mask directly from Leptonica.
+  if (pixGetWidth(pixr) < 100 || pixGetHeight(pixr) < 100)
+  {
+    // leptonica will throw an error and return null
+    pixDestroy(&pixr);
+    return pixCreate(pixGetWidth(pix), pixGetHeight(pix), 1);
+  }
   l_int32 ht_found = 0;
   Pix *pixht2 = pixGenHalftoneMask(pixr, NULL, &ht_found,
                                    textord_tabfind_show_images);
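
One possible refinement, sketched here rather than taken from the thread: the size test could go through named constants instead of a bare 100, so the coupling to Leptonica's limits is visible in one place. The names kMinHalftoneWidth, kMinHalftoneHeight and TooSmallForHalftoneMask are hypothetical; they exist in neither Tesseract nor Leptonica, and the values simply mirror the MinWidth/MinHeight statics quoted above.

```cpp
// Sketch only. These constants mirror the static MinWidth/MinHeight values
// quoted from Leptonica's src/pageseg.c; Leptonica does not export them, so
// they would have to be kept in sync by hand. All names here are hypothetical.
constexpr int kMinHalftoneWidth = 100;
constexpr int kMinHalftoneHeight = 100;

// Returns true when the image is below either limit, i.e. when
// pixGenHalftoneMask would report "pix too small" and return null.
// A caller could then go straight to the empty-mask fallback.
constexpr bool TooSmallForHalftoneMask(int width, int height) {
  return width < kMinHalftoneWidth || height < kMinHalftoneHeight;
}
```

In the patch above, the condition would then read TooSmallForHalftoneMask(pixGetWidth(pixr), pixGetHeight(pixr)). For the image in this report (w = 610, h = 71) the check is true because the height is under the limit.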

@jbreiden
Contributor

An alternative is for Leptonica to not spam stderr. I'm honestly not sure which approach is better.

@scottb89
Author

Good point. It's returning null, and the null is being handled properly. Automatically emitting the reason for the error should be directable/configurable.

I just dug into it and found NO_CONSOLE_IO referenced from environ.h.
Maybe I needed to turn that on when I built Leptonica. Still a questionable approach.

@amitdo
Collaborator

amitdo commented May 30, 2016

@DanBloomberg, any comment?

@DanBloomberg

Scott's patch seems reasonable to me.

Testing the size in advance prevents leptonica from returning null,
and from saying that an error occurred.

Slightly simpler just to
return pixCopy(pixr, pix);
which works whether or not pixr exists.

re: musings about whether or not leptonica should "spam stderr":

Just a reminder that leptonica provides a significant amount of control over messages to stderr, both at compile time and at run time, and broken down to the granularity of errors, warnings and information. The question for the leptonica client (tesseract, here) is what level of messages they want to receive on stderr, and in this case it's up to the maintainers. If the error messages (at least) are not on, a user will have no idea why something failed if it was due to bad input to a leptonica function, for example. On the other hand, when failures occur but are caught and handled by the client, the user may be unnecessarily concerned, when things are actually going ok.

FWIW, when I am programming and testing, I always use the default MINIMUM_SEVERITY of L_SEVERITY_INFO, which turns all messages on.
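
To make the trade-off concrete, here is a minimal, self-contained sketch of one client-side option: lower the verbosity only around a call whose failure the client already handles, then restore it. This is a model, not Leptonica code. Severity, MessageLog and ScopedSeverity are hypothetical stand-ins invented for illustration, and their names and values are not Leptonica's definitions.

```cpp
#include <string>
#include <vector>

// Hypothetical severity levels, ordered from most to least verbose.
// Illustrative only; these are not Leptonica's enum values.
enum class Severity { kInfo = 0, kWarning = 1, kError = 2, kNone = 3 };

// Hypothetical stand-in for a library's message channel: a message is
// recorded only if its severity is at or above the current threshold.
class MessageLog {
 public:
  Severity threshold() const { return threshold_; }
  void set_threshold(Severity s) { threshold_ = s; }

  void Report(Severity s, const std::string& msg) {
    if (s >= threshold_) messages_.push_back(msg);
  }

  const std::vector<std::string>& messages() const { return messages_; }

 private:
  Severity threshold_ = Severity::kInfo;  // everything on by default
  std::vector<std::string> messages_;
};

// RAII guard: applies a temporary threshold for the enclosing scope and
// restores the previous one on exit, so only the handled call is silenced.
class ScopedSeverity {
 public:
  ScopedSeverity(MessageLog* log, Severity temporary)
      : log_(log), saved_(log->threshold()) {
    log_->set_threshold(temporary);
  }
  ~ScopedSeverity() { log_->set_threshold(saved_); }

  ScopedSeverity(const ScopedSeverity&) = delete;
  ScopedSeverity& operator=(const ScopedSeverity&) = delete;

 private:
  MessageLog* log_;
  Severity saved_;
};
```

With this shape, an error reported inside the guarded scope is dropped, while the same error reported after the scope ends is still recorded. The cost is the one described above: anything silenced this way gives the user no hint if the fallback itself misbehaves.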

@amitdo
Collaborator

amitdo commented May 30, 2016

OK. Thanks for your reply, Dan.

Just a quote:

To spam, or not to spam, that is the question.

;-)

@amitdo amitdo added the bug label May 31, 2016
@amitdo amitdo self-assigned this Jun 2, 2016
zvezdochiot pushed a commit to ImageProcessing-ElectronicPublications/tesseract that referenced this issue Mar 28, 2021