page.text failing with: Undefined method `[]' #91

I have loaded a PDF (produced by our invoice system) into pdf-reader, but any attempt to extract page text is met with the following type of error:

1.9.3-p392 :006 > reader = PDF::Reader.new('test.pdf')
 => #<PDF::Reader:0x838ccac @cache=<PDF::Reader::ObjectCache size: 0>, @objects=<PDF::Reader::ObjectHash size: 67>>
1.9.3-p392 :007 > reader.pages.first.text
NoMethodError: undefined method `[]' for #<PDF::Reader::Reference:0x83cd194 @id=3, @gen=0>
from /home/duncan/.rvm/gems/ruby-1.9.3-p392/gems/pdf-reader-1.3.2/lib/pdf/reader/page_layout.rb:17:in `initialize'
from /home/duncan/.rvm/gems/ruby-1.9.3-p392/gems/pdf-reader-1.3.2/lib/pdf/reader/page_text_receiver.rb:49:in `new'
from /home/duncan/.rvm/gems/ruby-1.9.3-p392/gems/pdf-reader-1.3.2/lib/pdf/reader/page_text_receiver.rb:49:in `content'
from /home/duncan/.rvm/gems/ruby-1.9.3-p392/gems/pdf-reader-1.3.2/lib/pdf/reader/page.rb:76:in `text'
from (irb):7
from /home/duncan/.rvm/rubies/ruby-1.9.3-p392/bin/irb:16:in `<main>'

System details:

  • ruby-1.9.3-p392 [ i686 ] via RVM
  • pdf-reader 1.3.2

The problem doesn't seem to occur with other PDFs.

yob commented Mar 14, 2013

Thanks for the report. Are you able to email me a sample PDF that triggers this exception? My address is


I'm already working on that ... catch is the PDF that's going bang is an actual customer invoice with commercially sensitive data on it.

I've asked folks for a de-identified PDF, and if that reproduces the error, I'll email it to you immediately.

yob added a commit that closed this issue May 12, 2013:
the MediaBox might be an indirect object
* fixes #91
yob closed this in 7871128 May 12, 2013
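
For readers wondering what "the MediaBox might be an indirect object" means in practice: a page's MediaBox can be stored inline as an array, or as a reference to an object defined elsewhere in the file. Indexing the reference directly produces the `undefined method '[]'` error from the report. The sketch below is only an illustration of that idea, not the contents of the commit; `Reference`, `ObjectTable`, and `page_size` are hypothetical stand-ins rather than pdf-reader's own classes.

```ruby
# Hypothetical stand-ins for illustration; these are NOT pdf-reader's classes.
Reference = Struct.new(:id, :gen)

class ObjectTable
  def initialize(objects)
    @objects = objects
  end

  # Follow references until a direct (non-reference) object is reached.
  def deref(value)
    value = @objects.fetch(value) while value.is_a?(Reference)
    value
  end
end

# Resolve the MediaBox before indexing into it, so both the inline form
# and the indirect form yield a plain array.
def page_size(attrs, objects)
  x0, y0, x1, y1 = objects.deref(attrs[:MediaBox])
  [x1 - x0, y1 - y0]
end

objects = ObjectTable.new(Reference.new(3, 0) => [0, 0, 612, 792])

puts page_size({ :MediaBox => [0, 0, 612, 792] }, objects).inspect
puts page_size({ :MediaBox => Reference.new(3, 0) }, objects).inspect
```

Both calls go through the same `deref` step, so the caller never needs to know which form the PDF used.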

I never managed to get my hands on a de-identified PDF that reproduced the issue. Sorry :-/

endymion added a commit to endymion/pdf-reader-issue-91-demonstration that referenced this issue Dec 6, 2013:
Demonstrates the problem. 0ee2f2f

endymion commented Dec 6, 2013

Hi, I have a PDF that isn't too sensitive and demonstrates the problem. I was tinkering with the gem and ran into the problem, so I packed up what I was working on and pushed it to GitHub:

demo project:


I'm moving on to some other solution so this is not a problem for me. Just hoping to contribute...


Hey, I'm having a similar issue with some PDFs; see below for the errors. It's not that pdf-reader can't read the PDF: I can walk the tree correctly, and I get the text out of it correctly by walking. I also get a seemingly related error with other PDFs that 'walk' fine, but where the built-in text extraction doesn't work either.

error 1
/Users/upnxt/.rvm/gems/ruby-2.0.0-p247/gems/pdf-reader-1.3.3/lib/pdf/reader/page_layout.rb:17:in `initialize': undefined method `[]' for #<PDF::Reader::Reference:0x007faeba36a180 @id=75, @gen=0> (NoMethodError)

error 2
/Users/upnxt/.rvm/gems/ruby-2.0.0-p247/gems/pdf-reader-1.3.3/lib/pdf/reader/width_calculator/built_in.rb:93:in `glyph_width': Unknown glyph width for 160 Helvetica-Bold (ArgumentError)

I made a very simple walker for extracting the text (but it doesn't handle whitespace well in many cases):

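    class TextWalker
      # Accumulate every string the page shows into @gold.
      def mine(text)
        (@gold ||= "") << text
      end

      def show_text(text)
        mine text
      end

      # Keep the even-indexed items of the array (assumed to be the strings).
      def show_text_with_positioning(text)
        extracted = text.select.with_index { |v, i| i.even? }.join
        mine extracted
      end
    end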
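
One possible reason a walker like this loses whitespace: in the array passed to `show_text_with_positioning`, the numbers between the strings are horizontal adjustments, and some PDFs use a large negative adjustment instead of a literal space character. The sketch below is one way to approximate that, not anything pdf-reader does itself; `join_positioned_text` and the `SPACE_THRESHOLD` value are assumptions for illustration, and the threshold would need tuning per document.

```ruby
# Assumed cut-off, in thousandths of a text space unit; tune per document.
SPACE_THRESHOLD = -200

# Join the strings of a positioned-text array, inserting a single space
# wherever an adjustment is at or below the threshold (a wide gap).
def join_positioned_text(items, threshold = SPACE_THRESHOLD)
  items.each_with_object(String.new) do |item, out|
    if item.is_a?(String)
      out << item
    elsif item.is_a?(Numeric) && item <= threshold
      out << " " unless out.end_with?(" ")
    end
  end
end

puts join_positioned_text(["Hel", -20, "lo", -300, "World"])
```

Selecting by type rather than by even index also avoids dropping text when two strings happen to sit next to each other in the array.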
