hangs on moderately complex .pdf #7

rbhughes opened this Issue Jul 14, 2010 · 14 comments


4 participants

I have a few .pdf files that cause pdf-reader to get stuck in a buffering loop. In one case, it appeared that the parser encountered a double nil, and I was able to get around it by adding "@io.pos=@pos+1" in buffer.rb's prepare_literal_token. The parser then makes it about twice as far into the file before hanging again (100% CPU), but I can't find the cause this time. I would be happy to send you the .pdf file and welcome any pointers. Thanks!

I think I tracked this issue down to lines 239..241.

If you duplicate the check and change it a bit, the PDF file gets completely parsed.

See: http://gist.github.com/478422 for my bugfix on the buffer.rb file (def prepare_literal_token)

The bugfix worked for one problematic .pdf, but unfortunately I have another .pdf causing the same-ish problem. The buffer can't seem to find a ")" to terminate the string. There's only one "%%EOF" and it's where it ought to be. Any tips will be greatly appreciated; I can send you the .pdf too if you like.
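For anyone following along, here is a minimal sketch of why a missing ")" can turn into a hang. A PDF literal string is delimited by parentheses, may contain balanced nested parentheses, and uses a backslash for escapes, so a scanner that only stops on the closing ")" has to treat end-of-input as an error. This is not the code in buffer.rb; `read_literal_string` is a hypothetical helper, and it assumes the opening "(" has already been consumed.

```ruby
require "stringio"

# Hypothetical helper, NOT pdf-reader's implementation.
# Reads the body of a PDF literal string from io, assuming the opening "("
# was already consumed. Returns the raw body without the closing ")".
def read_literal_string(io)
  depth = 1
  out = +""
  until depth.zero?
    ch = io.getc
    # getc returns nil at end of input; without this check the loop would
    # keep asking for characters that never arrive.
    raise EOFError, "unterminated literal string" if ch.nil?

    case ch
    when "\\"
      # Keep the escape sequence as-is so an escaped ")" is not counted.
      escaped = io.getc
      raise EOFError, "unterminated literal string" if escaped.nil?
      out << ch << escaped
    when "("
      depth += 1
      out << ch
    when ")"
      depth -= 1
      out << ch unless depth.zero?
    else
      out << ch
    end
  end
  out
end
```

With this guard, input that never closes the string raises `EOFError` instead of spinning at 100% CPU, which at least points at the offending token.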

Please do send me the PDF file, maybe you could just link the PDF here, so Yob can see it too :-)

I'm testing against about 60 random .pdfs and the hang happens on two, both from the same source. Here is one of them: http://dl.dropbox.com/u/9209734/Petrel2007_1PreliminarySystemRequirementDecUpdate.pdf

I've parsed this PDF with success... No problems, just takes a while (38 secs) but no hang... Have you implemented my patch?

Just to be sure, I uninstalled/reinstalled the gem (0.8.5), applied the buffer.rb patch and re-tested the file I sent. I let it run for about 8 minutes before killing it. If it matters, this is running on Ruby 1.8.7 (2010-01-10 patchlevel 249) [i386-mingw32] on Windows XP. I could try OS X, but it needs to run on Windows, unfortunately.
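A time limit makes this kind of batch test easier to repeat than killing the process by hand. The following is only a sketch using Ruby's standard `Timeout` module; `classify_files` is a hypothetical helper, and the block stands in for whatever pdf-reader call is being exercised.

```ruby
require "timeout"

# Hypothetical helper: yields each path to the block and records whether the
# block finished within `limit` seconds, timed out, or raised another error.
def classify_files(paths, limit)
  paths.each_with_object({}) do |path, results|
    begin
      Timeout.timeout(limit) { yield path }
      results[path] = :ok
    rescue Timeout::Error
      # The block was still running when the limit expired.
      results[path] = :timeout
    rescue StandardError => e
      # Any other failure is recorded by exception class name.
      results[path] = e.class.name
    end
  end
end
```

Running this over the whole set of test files gives a list of the ones that hang without having to watch the run.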


yob commented Nov 13, 2010

Hi guys,

I've finally got some spare time to catch up on PDF::Reader bugs. Sorry about the massive lag.

David - I've implemented the fix in your gist, thanks for doing the research.

rbhughes - I'm happy to look into your failing PDF if you still have it and can email it to me.

If you find a fix, please note it here. I'm looking forward to it!


yob commented Nov 13, 2010

David - do you still have any files that trigger hangs?

yob - I'll look for them, I do have some, but I will post them as I find them, ok?


yob commented Nov 15, 2010

sure, thanks.

Just 1 or 2 will probably do it. Send them to my email address - james@yob.id.au

sci-phi commented Jun 7, 2011

I was trying to use the version example, and it apparently hangs on some PDFs. Specifically, I tested with the PDF 1.4 spec PDF from Adobe ("http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf") using pdf-reader v0.9.2.

Other trivial PDFs work fine, but I have much, much bigger PDFs that I want to get metadata from, so I need to know whether this issue is size-related or caused by something else.
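As a stopgap for very large files, the version can also be read straight from the file header, since a PDF starts with a line of the form "%PDF-n.m". This is a different technique from the pdf-reader version example and does nothing to diagnose the hang. `header_pdf_version` is a hypothetical helper, and the 1024-byte window is an assumption. Note that from PDF 1.4 onward a /Version entry in the document catalog can override the header, and this sketch ignores that.

```ruby
require "stringio"

# Hypothetical helper, not part of pdf-reader.
# Looks for the "%PDF-n.m" header in the first 1024 bytes of io and returns
# the version as a string, or nil if no header is found.
def header_pdf_version(io)
  head = io.read(1024) || "" # read returns nil when io is already at EOF
  match = head.match(/%PDF-(\d+\.\d+)/)
  match && match[1]
end
```

For a file on disk, the helper can be called as `File.open(path, "rb") { |f| header_pdf_version(f) }`, which reads only the start of the file regardless of its size.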


yob commented Jul 8, 2011

@sci-phi - can you try out the latest released version? It has a fix that I think might help


yob commented Apr 7, 2013

The internals of pdf-reader have changed a lot since this was opened, so I'll close this issue. Please open a new issue if related problems occur on a more recent release.

yob closed this Apr 7, 2013
