Unknown font F1 (PDF::Reader::MalformedPDFError) bugfix #17

Closed
wants to merge 2 commits into from


@posgen posgen commented Jul 22, 2011

The alpha branch bugfix didn't work for me. I have fixed it; please check it out.

Just today I released an alpha version of PDF::Reader (0.11.0.alpha)
with a new improved API and I'm wondering if the new API has the same
issue. If you have time to give it a go, check out the README in the
alpha gem for a guide on getting started.
http://groups.google.com/group/pdf-reader/browse_thread/thread/7c778a1b0b63a846/c4fad15552af8397?lnk=gst&q=Unknown+font#c4fad15552af8397

@yob
Owner

yob commented Jan 4, 2012

I'm closing this old issue for now. Please re-open it if it's still applicable to the latest rc release.

@yob yob closed this Jan 4, 2012
@Pumbus

Pumbus commented Mar 8, 2012

I just ran into this issue in the latest 1.0.0 version, so I believe it is still worth looking at.

@yob
Owner

yob commented Mar 9, 2012

@Pumbus can you provide a simple code snippet and sample PDF that trigger this in 1.0?

@Pumbus

Pumbus commented Mar 12, 2012

I will try, but the file contains a client's personal info, which could be an issue. In the meantime, is there anything apart from a sample PDF I can do to help with fixing it? Maybe some debug info I can provide?

@yob
Owner

yob commented Mar 12, 2012

Unfortunately, it's diabolically difficult to debug these kinds of issues without the target PDF. Can you start by posting just some sample code that triggers the exception?

@Pumbus

Pumbus commented Mar 12, 2012

Roger, I will do my best to get permission to post the file here. As for the code, it's pretty simple:

pdf = File.new(self.file).read
receiver = PageTextReceiver.new
PDF::Reader.string(pdf, receiver)
receiver.content

@yob
Owner

yob commented Mar 12, 2012

Right. That's the old deprecated API that I'm not keen on supporting. Can you try this code instead?

reader = PDF::Reader.new("filename.pdf")
# Iterate over every page in the document and print its extracted text
reader.pages.each do |page|
  puts page.text
end

@Pumbus

Pumbus commented Mar 13, 2012

Heh, that's done the trick! Thank you!

I'm now able to parse the PDF files that were raising MalformedPDFError. However, some others that worked fine with pdf-reader 0.10.0 are broken now:

reader = PDF::Reader.new('file.pdf')
=> #<PDF::Reader:0x00000003cd66f0 @objects=<PDF::Reader::ObjectHash size: 105>>
ruby-1.9.2-p180 :019 > reader.pages.map(&:text)
TypeError: can't convert nil into String
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/buffer.rb:214:in `prepare_inline_token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/buffer.rb:158:in `block in prepare_tokens'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/buffer.rb:153:in `times'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/buffer.rb:153:in `prepare_tokens'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/buffer.rb:107:in `token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/parser.rb:71:in `parse_token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/page.rb:128:in `content_stream'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/page.rb:95:in `walk'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.0.0/lib/pdf/reader/page.rb:65:in `text'
from (irb):19:in `map'
from (irb):19
from /home/roman/.rvm/rubies/ruby-1.9.2-p180/bin/irb:16:in `<main>'
ruby-1.9.2-p180 :020 >

I'm trying to figure out why, but please help if you can. Also, it might be useful to mark the old API as deprecated. I didn't know it was deprecated until you told me, so it could confuse others as well.
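
When the PDF itself cannot be shared, one way to narrow a report like this down is to extract text one page at a time and record which pages raise. The following is only a rough sketch: `text_by_page` is a hypothetical helper, not part of pdf-reader, and it assumes nothing beyond an object that responds to `pages`, each of which responds to `text`, as in the session above.

```ruby
# Hypothetical helper, not part of pdf-reader: extract text page by page and
# record which pages raise, so one bad page does not hide the rest.
def text_by_page(reader)
  results = { :texts => {}, :failures => {} }
  reader.pages.each_with_index do |page, index|
    number = index + 1 # report 1-based page numbers
    begin
      results[:texts][number] = page.text
    rescue StandardError => e
      # Keep the exception class and message for the bug report.
      results[:failures][number] = "#{e.class}: #{e.message}"
    end
  end
  results
end
```

Calling `text_by_page(PDF::Reader.new('file.pdf'))` should then give the failing page numbers under `:failures`, which may be enough to cut a large document down to a single shareable page.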

@yob
Owner

yob commented Mar 13, 2012

The prepare_inline_token bug is fixed in master. Can you add pdf-reader to your Gemfile as a :git entry? I'm planning to release 1.1 before too long, but using a :git entry will let you continue working in the interim.
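
For reference, a :git entry might look roughly like the fragment below. The repository URL is an assumption (it is not stated in this thread), so check it against the project page before using it.

```ruby
# Gemfile
source "https://rubygems.org"

# Assumed repository location; tracks the default branch until 1.1 is released.
gem "pdf-reader", :git => "https://github.com/yob/pdf-reader.git"
```

After editing the Gemfile, running `bundle install` fetches the gem from git instead of from a released version.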

The deprecation is only soft for now. The old API should still work as it used to; it's just not getting any active development, and all documentation and examples use the new API. In a future release I'll start printing a deprecation warning.
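
To illustrate what printing a deprecation warning could look like in general, here is a minimal sketch. It is not pdf-reader's actual code: the `SoftDeprecation` module, the `warn_once` method, and the message format are all made up for this example.

```ruby
# Hypothetical sketch of a soft deprecation warning; not pdf-reader's code.
module SoftDeprecation
  @warned = {}

  # Prints one warning per deprecated name to the given IO, then stays quiet.
  # Returns true if a warning was printed, false if it was already shown.
  def self.warn_once(name, replacement, io = $stderr)
    return false if @warned[name]
    @warned[name] = true
    io.puts "DEPRECATION: #{name} is deprecated, use #{replacement} instead"
    true
  end
end
```

A deprecated entry point such as `PDF::Reader.string` could call `SoftDeprecation.warn_once("PDF::Reader.string", "PDF::Reader.new")` before doing its usual work, so existing code keeps running while callers are nudged toward the new API.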

@Pumbus

Pumbus commented May 15, 2012

Hey James!

I just got back to this issue. Sorry it didn't happen earlier :)
I updated to pdf-reader 1.1.1 just now and started testing it with my code, but ran into the same prepare_inline_token bug as before:

reader = PDF::Reader.new('file.pdf')
=> #<PDF::Reader:0x00000001ac8180 @objects=<PDF::Reader::ObjectHash size: 113>>
ruby-1.9.2-p180 :003 > reader.pages.map(&:text)
TypeError: can't convert nil into String
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/buffer.rb:219:in `prepare_inline_token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/buffer.rb:159:in `block in prepare_tokens'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/buffer.rb:154:in `times'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/buffer.rb:154:in `prepare_tokens'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/buffer.rb:108:in `token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/parser.rb:71:in `parse_token'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/page.rb:128:in `content_stream'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/page.rb:95:in `walk'
from /home/roman/.rvm/gems/ruby-1.9.2-p180/gems/pdf-reader-1.1.1/lib/pdf/reader/page.rb:65:in `text'
from (irb):3:in `map'
from (irb):3

Is that supposed to be fixed in 1.1.1, or has something changed there?

@yob
Owner

yob commented May 20, 2012

I'll need a copy of a file that is triggering this exception.

Can you please open a new ticket for it and email me a sample file?

@Pumbus

Pumbus commented May 22, 2012

Hi James,

Ryan Stawarz should be contacting you any minute now about this issue,
as he found what caused it in the gem's code.

Thank you,
-Alex



@rstawarz
Contributor

Added issue #55
