page_count undefined method `[]' for nil:NilClass #76

Open
sandsfish opened this Issue Jan 18, 2013 · 9 comments

In v1.3, v1.2, and v1.0, when I run this code to iterate through all pages:

pdf_textfile = File.open('aero_text.txt', 'w')
reader.pages.each do |page|
  pdf_textfile << page.text
end
pdf_textfile.close

I get the output:

gems/pdf-reader-1.0.0/lib/pdf/reader.rb:138:in `page_count': undefined method `[]' for nil:NilClass (NoMethodError)
    from /Users/sands/.rvm/gems/ruby-1.9.2-p320@newdev/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:224:in `pages'
    from pdf2text.rb:11:in `<main>'

This means the pages hash is nil, for some reason, when page_count indexes it. The method in reader.rb:

def page_count
  pages = @objects.deref(root[:Pages])
  @page_count ||= pages[:Count]
end

The reader initializes on the PDF file correctly, because I can call reader.version and it reports back fine. Getting to the page level (on OS X 10.8.2) simply doesn't work for this PDF, and the error message gives no clue as to why.
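For illustration only, here is one way a nil like this could be surfaced with a clearer message: check the dereferenced value before indexing it. This is a self-contained sketch that does not use pdf-reader. `PageTreeError`, `safe_page_count`, and the plain Hash standing in for the object table are hypothetical names, not part of the gem.

```ruby
# Hypothetical error class, not part of pdf-reader.
class PageTreeError < StandardError; end

# objects: stand-in for the object table; here a plain Hash mapping a
#          reference to the object it points at (an assumption for this sketch).
# root:    the document catalog, expected to carry a :Pages reference.
def safe_page_count(objects, root)
  pages = objects[root[:Pages]]
  unless pages.is_a?(Hash)
    raise PageTreeError,
          "could not resolve the /Pages entry (got #{pages.inspect})"
  end
  pages[:Count]
end

good_objects = { "ref-1" => { :Type => :Pages, :Count => 3 } }
puts safe_page_count(good_objects, { :Pages => "ref-1" })

begin
  # The reference points at nothing, so the lookup returns nil.
  safe_page_count({}, { :Pages => "ref-1" })
rescue PageTreeError => e
  puts "error: #{e.message}"
end
```

A check like this would not fix the underlying file problem, but it would turn the bare NoMethodError into a message that says which lookup failed.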

Cheers,

Sands Fish

Owner

yob commented Jan 18, 2013

Thanks for the report.

To understand the cause I'd really have to see the problem PDF. Are you
able to share it with me via email (james@yob.I'd.au)?

James, does your email address have a single-quote character in it? Gmail doesn't accept it. I'll send the PDF once I can.

-S

Owner

yob commented Jan 20, 2013

Damn you autocorrect. My address is james@yob.id.au

Owner

yob commented Jan 20, 2013

Thanks for the file. If we can discover the underlying issue I'll manually create a new file for a test case and delete your sample.

When I use the pdf_text binary to try and trigger the same issue you're getting, I see a different exception.

⚡ pdf_text foo.pdf
/home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/filter/flate.rb:34:in `rescue in filter': Error occured while inflating a compressed stream (Zlib::DataError: invalid distance too far back) (PDF::Reader::MalformedPDFError)
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/filter/flate.rb:17:in `filter'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:63:in `block in unfiltered_data'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `each'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `each_with_index'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `unfiltered_data'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_stream.rb:11:in `initialize'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:86:in `new'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:86:in `[]'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:97:in `object'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader.rb:138:in `page_count'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader.rb:225:in `pages'
    from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/bin/pdf_text:11:in `<top (required)>'
    from /home/jh/.gem/ruby/1.9.1/bin/pdf_text:23:in `load'
    from /home/jh/.gem/ruby/1.9.1/bin/pdf_text:23:in `<main>'

Do you get anything like that or always the nil exception? What version of ruby are you running?
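The trace above shows a low-level Zlib::DataError being rescued and re-raised as a library-level error. As a rough sketch of that pattern using only Ruby's standard zlib library: `MalformedStreamError` and `inflate_stream` are hypothetical names chosen for this example and are not pdf-reader's own code.

```ruby
require 'zlib'

# Hypothetical wrapper error for this sketch, standing in for a
# library-specific "malformed file" exception.
class MalformedStreamError < StandardError; end

# Inflate a zlib-compressed string, converting low-level Zlib errors
# into one error type that still names the underlying cause.
def inflate_stream(data)
  Zlib::Inflate.inflate(data)
rescue Zlib::Error => e
  raise MalformedStreamError,
        "error while inflating a compressed stream (#{e.class}: #{e.message})"
end

# A valid stream round-trips.
compressed = Zlib::Deflate.deflate("page tree data")
puts inflate_stream(compressed)

begin
  # Arbitrary bytes are not a valid zlib stream, so inflating them fails.
  inflate_stream("this is not a zlib stream")
rescue MalformedStreamError => e
  puts e.message
end
```

If a caller swallowed this error and carried on with nil instead, a later lookup could fail with a NoMethodError on nil, which is one possible way the two reports differ. That is a guess, not something the thread confirms.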

James, always this one. Version info below...

sands$ ruby pdf2text.rb aeronautics-gravity-reducing-propulsion.pdf
PDF Version : 1.6
/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:138:in `page_count': undefined method `[]' for nil:NilClass (NoMethodError)
    from /Users/sands/.rvm/gems/ruby-1.9.2-p320@newdev/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:224:in `pages'
    from pdf2text.rb:11:in `<main>'

sands$ ruby -v
ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-darwin12.2.0]

sands$ gem list |grep pdf
pdf-reader (1.0.0)


Owner

yob commented Jan 22, 2013

Can you paste the contents of pdf2text.rb?

# note that i'm not clear on how to access the page content for the
# aggregation i'm attempting, but it errors out before it gets there, so it's
# moot for now

require 'pdf-reader'

reader = PDF::Reader.new("aeronautics-gravity-reducing-propulsion.pdf")

puts "PDF Version : #{reader.pdf_version}"

pdf_textfile = File.open('aero_text.txt', 'w')

reader.pages.each do |page|
  pdf_textfile << page.text   # or page.raw_content ?
end

pdf_textfile.close
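One side note on the script: if an exception is raised inside the loop, the explicit close is never reached. A minimal sketch of the block form of File.open, which closes the file either way, follows. `dump_pages`, `StubPage`, and `StubReader` are hypothetical names for this example; the stubs stand in for a real reader so the sketch runs without a PDF.

```ruby
require 'tmpdir'

# reader: any object whose #pages yields objects responding to #text
#         (the shape the script above relies on).
# path:   where to write the extracted text.
def dump_pages(reader, path)
  # The block form closes the file even if a page raises part-way through.
  File.open(path, 'w') do |out|
    reader.pages.each do |page|
      out << page.text
    end
  end
end

# Stand-ins so the sketch runs without a PDF file.
StubPage   = Struct.new(:text)
StubReader = Struct.new(:pages)

Dir.mktmpdir do |dir|
  path = File.join(dir, 'aero_text.txt')
  reader = StubReader.new([StubPage.new("one\n"), StubPage.new("two\n")])
  dump_pages(reader, path)
  puts File.read(path)
end
```

This does not change where the reported error occurs; it only keeps the output file from being left open when it does.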


Owner

yob commented Feb 25, 2013

Unfortunately I can't reproduce this error on my system, so I can't fix it. I'll leave the ticket open in case I have a flash of inspiration.

Sorry!

Ah, that's too bad. Maybe I can find another system to attempt it on and
rule out a part of the stack that might be at fault.
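When comparing two systems, it may help to print the same set of version details on each. The sketch below gathers a few that seem relevant to this thread (Ruby, platform, the zlib library, and the installed gem versions). `environment_report` is a hypothetical helper name, and the choice of fields is an assumption about what might matter.

```ruby
require 'rbconfig'
require 'zlib'

# Collect the parts of the stack that could differ between machines.
# gem_name defaults to the gem discussed in this thread.
def environment_report(gem_name = 'pdf-reader')
  versions = Gem::Specification.find_all_by_name(gem_name).map { |s| s.version.to_s }
  {
    :ruby     => RUBY_VERSION,
    :platform => RUBY_PLATFORM,
    :host_os  => RbConfig::CONFIG['host_os'],
    :zlib     => Zlib.zlib_version,
    gem_name  => versions.empty? ? 'not installed' : versions.join(', ')
  }
end

environment_report.each { |key, value| puts "#{key}: #{value}" }
```

Running it on both machines and comparing the output line by line would show which layer differs.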
