hangs on moderately complex .pdf #7

Closed
rbhughes opened this Issue Jul 14, 2010 · 14 comments

I have a few .pdf files that cause pdf-reader to get stuck in a buffering loop. In one case, it appeared that the parser encountered a double nil, and I was able to get around it by adding "@io.pos=@pos+1" in buffer.rb's prepare_literal_token. The parser then makes it about twice as far into the file before hanging again (100% CPU), but I can't find the cause this time. I would be happy to send you the .pdf file and welcome any pointers. Thanks!
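A hang like this usually means a read loop keeps running without the stream position moving forward. As a rough sketch of how to catch that (this is not pdf-reader code; `each_step_with_progress` is a hypothetical helper name), a guard can raise as soon as one iteration consumes nothing:

```ruby
require "stringio"

# Hypothetical guard, not part of pdf-reader: calls the block repeatedly
# until io reaches EOF, and raises if an iteration leaves io.pos unchanged.
def each_step_with_progress(io)
  until io.eof?
    before = io.pos
    yield io
    # A step that consumed nothing would repeat forever, so fail loudly.
    raise "no progress at byte #{before}" if io.pos == before
  end
end

# Usage: each step reads one character, so the loop ends at EOF.
chars = []
each_step_with_progress(StringIO.new("abc")) { |io| chars << io.getc }
```

Raising with the byte offset points at the spot in the file to inspect, which is easier to work from than a process pinned at 100% CPU.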

I think I tracked this issue down to lines 239..241.

If you duplicate the check and change it a bit, the PDF file gets completely parsed.

See: http://gist.github.com/478422 for my bugfix on the buffer.rb file (def prepare_literal_token)

The bugfix worked for one problematic .pdf, but unfortunately I have another .pdf causing the same-ish problem. The buffer can't seem to find a ")" to terminate the string. There's only one "%%EOF" and it's where it ought to be. Any tips will be greatly appreciated; I can send you the .pdf too if you like.
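For context, a PDF literal string ends at the ")" that balances its opening "(", and a backslash escapes the character after it. Below is a minimal sketch of such a scan with an explicit end-of-input check. It is an illustration only, not the code in buffer.rb; `read_literal_string` is a hypothetical name, and it assumes the opening "(" has already been consumed.

```ruby
require "stringio"

# Hypothetical helper, not part of pdf-reader: reads the body of a PDF
# literal string from io, assuming the opening "(" was already consumed.
# Returns the raw characters up to the matching ")" and raises at EOF.
def read_literal_string(io)
  depth = 1
  out = String.new
  until depth.zero?
    ch = io.getc
    # getc returns nil at EOF; without this check the loop would never end.
    raise EOFError, "unterminated literal string" if ch.nil?
    case ch
    when "\\"
      # Keep the escape and the escaped character together, unprocessed.
      nxt = io.getc
      raise EOFError, "unterminated literal string" if nxt.nil?
      out << ch << nxt
    when "("
      depth += 1
      out << ch
    when ")"
      depth -= 1
      out << ch unless depth.zero?
    else
      out << ch
    end
  end
  out
end
```

If the closing ")" is never found, this raises instead of spinning, so an unterminated string shows up as an error rather than a hang.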

Please do send me the PDF file, maybe you could just link the PDF here, so Yob can see it too :-)

I'm testing against about 60 random .pdfs and the hang happens on two, both from the same source. Here is one of them: http://dl.dropbox.com/u/9209734/Petrel2007_1PreliminarySystemRequirementDecUpdate.pdf
Thanks!
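When checking a batch of files like this, it can help to put a time limit on each one so a hang is reported instead of blocking the run. A small sketch using Ruby's standard `Timeout` module; `find_hanging_files` is a hypothetical helper, and the block stands in for whatever parsing call is being tested:

```ruby
require "timeout"

# Hypothetical helper: yields each path to the block and returns the paths
# for which the block did not finish within limit_seconds.
def find_hanging_files(paths, limit_seconds)
  paths.select do |path|
    begin
      Timeout.timeout(limit_seconds) { yield path }
      false
    rescue Timeout::Error
      true
    end
  end
end

# Usage, with a stand-in block instead of a real parser call:
slow = find_hanging_files(["fast.pdf", "slow.pdf"], 0.2) do |path|
  sleep 2 if path == "slow.pdf"
end
```

Errors other than the timeout are not rescued here, so a file that fails for a different reason still stops the run and stays distinguishable from a hang.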

I've parsed this PDF with success... No problems, just takes a while (38 secs) but no hang... Have you implemented my patch?

Just to be sure, I uninstalled/reinstalled the gem (0.8.5), applied the buffer.rb patch and re-tested the file I sent. I let it run for about 8 minutes before killing it. If it matters, this is running on Ruby 1.8.7 (2010-01-10 patchlevel 249) [i386-mingw32] on Windows XP. I could try OS X, but it needs to run on Windows, unfortunately.

Owner

yob commented Nov 13, 2010

Hi guys,

I've finally got some spare time to catch up on PDF::Reader bugs. Sorry about the massive lag.

David - I've implemented the fix in your gist, thanks for doing the research.

rbhughes - I'm happy to look into your failing PDF if you still have it and can email it to me.

If you find a fix, please note it here. I'm looking forward to it!

Owner

yob commented Nov 13, 2010

David - do you still have any files that trigger hangs?

yob - I'll look for them. I do have some, and I'll post them as I find them, OK?

Owner

yob commented Nov 15, 2010

sure, thanks.

Just 1 or 2 will probably do it. Send them to my email address - james@yob.id.au

sci-phi commented Jun 7, 2011

I was trying the version example and it appears to hang on some PDFs. Specifically, I tested with the PDF 1.4 spec from Adobe ("http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf") using pdf-reader v0.9.2.

Other trivial PDFs work fine, but I have much, much bigger PDFs that I want to get metadata from, so I need to know whether this issue is size-related or caused by something else.

Owner

yob commented Jul 8, 2011

@sci-phi - can you try out the latest released version? It has a fix that I think might help

Owner

yob commented Apr 7, 2013

The internals of pdf-reader have changed a lot since this was opened, so I'll close this issue. Please open a new issue if related problems occur on a more recent release.

@yob yob closed this Apr 7, 2013
