How to question: OCR only part of scanned image #34

Open
LostAccount opened this issue Jan 1, 2021 · 0 comments

Hello

I have only just started using tesseract with ocrmypdf.

I run a command like this: ocrmypdf input_pdf_or_image output_pdf

This is not an ocrmypdf question.

Question
Is there any way to mask a region of a scanned image, or draw a bounding box around it, so that tesseract interprets it as an area to ignore? Some of my scanned images contain graphics or tables that get OCR'd. Although the effort is appreciated, I would rather exclude these regions, because the resulting PDF then contains selectable text there, which is unwanted.

Any ideas or potential solutions would be very much appreciated, as I have been trying to find a workaround for days now.
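
To make the kind of masking I mean concrete, here is a minimal sketch of one possible preprocessing workaround: paint over the unwanted rectangles with white before the image reaches the OCR step, so there is nothing left to recognize there. This is an illustration under assumptions, not a tesseract or ocrmypdf feature. The function name `mask_regions`, the box format (left, top, right, bottom in pixels, right and bottom exclusive), and the raw 8-bit grayscale buffer are all hypothetical; loading and saving real scans would need an image library, which is left out here.

```python
def mask_regions(pixels, width, height, boxes, fill=255):
    """Return a copy of a row-major 8-bit grayscale buffer with each box painted over.

    pixels: bytes-like object of length width * height.
    boxes:  iterable of (left, top, right, bottom); right and bottom are exclusive.
    fill:   gray value to paint with (255 = white, i.e. blank paper).
    """
    if len(pixels) != width * height:
        raise ValueError("pixel buffer does not match width * height")
    out = bytearray(pixels)  # work on a copy, leave the input untouched
    for left, top, right, bottom in boxes:
        # Clamp the box to the image so out-of-range coordinates are harmless.
        left, right = max(0, left), min(width, right)
        top, bottom = max(0, top), min(height, bottom)
        if left >= right or top >= bottom:
            continue  # empty or fully outside the image
        span = right - left
        for y in range(top, bottom):
            start = y * width + left
            out[start:start + span] = bytes([fill]) * span
    return out


# Tiny 6x4 all-black "page"; blank out a 3x2 block in the middle.
page = bytearray(6 * 4)
masked = mask_regions(page, 6, 4, [(1, 1, 4, 3)])
```

The masked copy would be the one handed to OCR. Whether the masked or the original image ends up visible in the final PDF depends on how the rest of the pipeline is set up, which I have not worked out.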

➜ ~ tesseract --version
tesseract 4.1.1
leptonica-1.80.0
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.2.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
Found AVX
Found SSE

Kind regards
—Alex
