improve text extraction of built in fonts
* if the font includes a difference table (allowing it to use non-ASCII
  chars) then we need to do some extra work to extract the glyph width
  from the relevant AFM file
yob committed Nov 26, 2012
1 parent a661d3d commit f9c4635
Showing 2 changed files with 31 additions and 0 deletions.
17 changes: 17 additions & 0 deletions lib/pdf/reader/encoding.rb
@@ -111,6 +111,23 @@ def int_to_utf8_string(glyph_code)
      @string_cache[glyph_code] ||= internal_int_to_utf8_string(glyph_code)
    end


    # convert an integer glyph code into an Adobe glyph name.
    #
    #     int_to_name(65)
    #     => :A
    #
    # TODO: this needs to be expanded to return the appropriate name for standard
    #       glyph codes in the encoding. 65 to :A, etc. At the moment it only
    #       handles glyphs in the difference table
    #
    def int_to_name(glyph_code)
      if @enc_name == "Identity-H" || @enc_name == "Identity-V"
        nil
      else
        @differences[glyph_code]
      end
    end

    private

    def internal_int_to_utf8_string(glyph_code)
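
The `int_to_name` method added above resolves a glyph code through the encoding's difference table. The following is a minimal standalone sketch of that lookup, not the pdf-reader implementation: `SketchEncoding` and `build_differences` are hypothetical names, and the input is assumed to follow the PDF `Differences` array layout, where an integer sets the starting code and each following name is assigned to consecutive codes.

```ruby
# Hypothetical stand-in for an encoding object; this is NOT PDF::Reader::Encoding.
class SketchEncoding
  def initialize(enc_name, differences_array = [])
    @enc_name = enc_name
    @differences = build_differences(differences_array)
  end

  # Returns the glyph name (a Symbol) for a code, or nil when the code is
  # not in the difference table or the encoding is an Identity CMap.
  def int_to_name(glyph_code)
    return nil if @enc_name == "Identity-H" || @enc_name == "Identity-V"

    @differences[glyph_code]
  end

  private

  # Expand [code, name, name, code, name, ...] into { code => name }.
  # Names that appear before any integer are ignored (an assumption).
  def build_differences(array)
    table = {}
    code = nil
    array.each do |item|
      if item.is_a?(Integer)
        code = item
      elsif code
        table[code] = item.to_sym
        code += 1
      end
    end
    table
  end
end

enc = SketchEncoding.new("WinAnsiEncoding", [128, :Adieresis, :Aring, 200, :Egrave])
p enc.int_to_name(129) # prints :Aring
p enc.int_to_name(65)  # prints nil, since 65 is not in the difference table
```

As the TODO in the diff notes, codes outside the difference table return `nil`, so a caller has to handle that case itself.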
14 changes: 14 additions & 0 deletions lib/pdf/reader/width_calculator/built_in.rb
@@ -3,6 +3,16 @@
require 'afm'
require 'pdf/reader/synchronized_cache'


module AFM
  # this is a monkey patch for the AFM gem. hopefully my patch will be accepted
  # upstream and I can drop this
  class Font
    def metrics_for_name(name)
      @char_metrics[name.to_s]
    end
  end
end

class PDF::Reader
  module WidthCalculator


@@ -28,6 +38,10 @@ def glyph_width(code_point)
        return 0 if code_point.nil? || code_point < 0

        m = @metrics.metrics_for(code_point)
        if m.nil?
          name = @font.encoding.int_to_name(code_point)
          m = @metrics.metrics_for_name(name)
        end
        m[:wx]
      end
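
The `glyph_width` change above tries the code point first and falls back to a lookup by glyph name. Here is a self-contained sketch of that two-step lookup. `SketchMetrics` and `sketch_glyph_width` are hypothetical stand-ins (the real `AFM::Font` is not loaded), the widths are illustrative values rather than real AFM data, and returning 0 when neither lookup succeeds is an assumption, not behaviour taken from the commit.

```ruby
# Hypothetical stand-in for AFM font metrics; the real AFM::Font is not used.
class SketchMetrics
  def initialize(by_code, by_name)
    @by_code = by_code # { Integer code => { wx: width } }
    @by_name = by_name # { String glyph name => { wx: width } }
  end

  def metrics_for(code_point)
    @by_code[code_point]
  end

  # Mirrors the shape of the monkey patch in the diff: look up by name.
  def metrics_for_name(name)
    @by_name[name.to_s]
  end
end

# Look up by code first, then by glyph name. `names` plays the role of
# the encoding's difference table ({ code => glyph name }).
def sketch_glyph_width(metrics, names, code_point)
  return 0 if code_point.nil? || code_point < 0

  m = metrics.metrics_for(code_point)
  if m.nil?
    name = names[code_point]
    m = metrics.metrics_for_name(name) unless name.nil?
  end
  # Assumption: treat an unknown glyph as zero width instead of raising.
  m.nil? ? 0 : m[:wx]
end

metrics = SketchMetrics.new({ 65 => { wx: 600 } }, { "Adieresis" => { wx: 700 } })
names = { 128 => :Adieresis }
p sketch_glyph_width(metrics, names, 65)  # prints 600 (found by code)
p sketch_glyph_width(metrics, names, 128) # prints 700 (found by name)
p sketch_glyph_width(metrics, names, 300) # prints 0 (found by neither)
```

In the diff as shown, `m[:wx]` is called unconditionally, so a code point that resolves by neither code nor name would raise `NoMethodError` on `nil`. The zero-width fallback in the sketch is one possible way to handle that case.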

