Unable to process pdfs - Windows #43

fraserpage · 2016-04-20T20:35:34Z

I'm seeing the following on Windows 10. Your assistance would be greatly appreciated.

Syntax Warning: Bad annotation destination
Syntax Warning: Bad annotation destination

I see about 30 lines of the above when trying to process a pdf with pypdfocr filename.pdf.
I see the below with any usage.

WARNING: Could not execute identify to calculate DPI (try installing imagemagick?), so defaulting to 300dpi
Traceback (most recent call last):
File "", line 495, in
File "", line 492, in main
File "", line 474, in go
File "", line 480, in _convert_and_file_email
File "", line 359, in run_conversion
File "C:\Users\Virantha Ekanayake\dev\pypdfocr\build\pypdfocr_windows\out00-PYZ.pyz\pypdfocr_tesseract", line 130, in make_hocr_from_pnms
File "C:\Users\Virantha Ekanayake\dev\pypdfocr\build\pypdfocr_windows\out00-PYZ.pyz\pypdfocr_tesseract", line 96, in _is_version_uptodate
ValueError: invalid literal for int() with base 10: '00dev'

All dependencies are installed.

flothesof · 2016-06-13T12:42:51Z

Hi @fraserpage

I've had the same issue. In my case, the code that parses the tesseract version string does not work as the author intended. My version string was 3.05.00dev, which caused the same error you saw when the script tried to parse it and check whether the installed version is up to date.
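
The failure can be reproduced in isolation. This is a minimal sketch, assuming the version check splits the string on dots and converts each piece with int(), which is consistent with the traceback but is not copied from the pypdfocr source; `parse_version_naive` is a hypothetical name.

```python
def parse_version_naive(ver_str):
    # Hypothetical stand-in for the failing check (an assumption, not the
    # actual pypdfocr code): convert every dot-separated piece with int().
    return [int(part) for part in ver_str.split('.')]


try:
    parse_version_naive('3.05.00dev')
    error_message = None
except ValueError as exc:
    # The last piece, '00dev', is not a valid integer literal.
    error_message = str(exc)
```

A plain release string such as 3.04.01 goes through this kind of check without trouble, which would explain why only dev builds of tesseract trigger the error.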

As a workaround, you can add the last two lines below (the ver_str.endswith('dev') check) to the file pypdfocr_tesseract.py found in python27\Lib\site-packages\pypdfocr:

for line in ret_output.splitlines():
    if 'tesseract' in line:
        ver_str = line.split(' ')[1]
        # Added: strip a trailing 'dev' so the version pieces parse as integers
        if ver_str.endswith('dev'):
            ver_str = ver_str[:-3]

Hope this helps,

Florian

fraserpage · 2016-06-14T19:35:19Z

Thanks very much @flothesof! That got it working for me.

I'm still seeing the warning about imagemagick. Any clues on that one?
WARNING: Could not execute identify to calculate DPI (try installing imagemagick?), so defaulting to 300dpi

flothesof · 2016-06-14T21:58:24Z

Hey there!

The warning is normal; the program is just telling us it would like to do some additional checks before adding OCR. I've found no problems with the default resolution while using it.

Best regards
Florian

fraserpage · 2016-06-15T14:45:55Z

Got it. Thanks for your help!

virantha · 2016-06-23T18:10:25Z

Going to reopen and fix this in source for next release. Thanks for pointing this out, folks!

dwmcqueen · 2016-08-11T16:54:41Z

Hi - can the exe be fixed with this same patch?

rasa · 2017-01-08T01:37:13Z

@flothesof: The warning message is actually not normal, but is reporting an error on Windows. This has been fixed in #54

qi55wyqu · 2017-11-08T01:56:13Z

I'm still running into this problem in version 0.9.1 when the tesseract version string ends in alpha.
My added fix for this (based on flothesof's answer):

checkFileEndings = ['dev', 'alpha']
for line in ret_output.splitlines():
    if 'tesseract' in line:
        ver_str = line.split(' ')[1]
        for fileEnding in checkFileEndings:
            if ver_str.endswith(fileEnding):
                ver_str = ver_str[:-len(fileEnding)]
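
The suffix list above has to grow with every new tag (beta, rc, and so on). An alternative, sketched here under assumptions, is to pull out only the dotted numeric part and ignore whatever surrounds it. The function name `parse_tesseract_version` is hypothetical and not part of pypdfocr, and the sketch assumes the version token always contains at least one digit group.

```python
import re


def parse_tesseract_version(ver_str):
    """Return the first dotted-numeric run in ver_str as a tuple of ints.

    Hypothetical helper, not part of pypdfocr. Suffixes such as 'dev' or
    'alpha' (and any non-digit prefix) are ignored instead of being listed.
    """
    match = re.search(r'\d+(?:\.\d+)*', ver_str)
    if match is None:
        raise ValueError('no numeric version found in %r' % ver_str)
    return tuple(int(part) for part in match.group(0).split('.'))
```

Tuples compare element by element, so a minimum-version check could then be written as parse_tesseract_version(ver_str) >= (3, 2, 1), where the (3, 2, 1) threshold is only a placeholder.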

fraserpage closed this as completed Jun 15, 2016

virantha reopened this Jun 23, 2016

virantha added this to the 0.9.1 milestone Jun 23, 2016

virantha self-assigned this Jun 23, 2016

dwmcqueen mentioned this issue Aug 11, 2016

Having Problem with pypdfocr on Windows 2008 R2 #49

Closed

virantha closed this as completed Oct 11, 2016

rasa mentioned this issue Jan 8, 2017

Fixes: WARNING: Could not execute identify to calculate DPI #54

Open
