experiment with caching the tokenised content of Form XObjects
* Form XObjects are designed to save disk space and parsing time by
  reusing content across pages
* It doesn't make sense to re-parse the same XObject content every time
  it is used
* This is a silly global-variable cache, just to test the theory and see
  what happens to memory usage and CPU time
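The caching idea in the bullets above can be sketched outside pdf-reader as a content-addressed cache: identical stream bytes hash to the same MD5 key, so each distinct Form XObject body is tokenised only once. This is a minimal sketch under stated assumptions, not the commit's code: `TokenCache` and `tokenise` are hypothetical names, whitespace splitting stands in for pdf-reader's `Parser`, and the cache lives in an object rather than a global variable. `Digest::MD5.hexdigest` is the standard library's one-call equivalent of the commit's `md5` helper.

```ruby
require 'digest/md5'

# Hypothetical stand-in for pdf-reader's content-stream tokeniser: it just
# splits on whitespace so the example runs without the pdf-reader gem.
def tokenise(instructions)
  instructions.split
end

# Hypothetical content-addressed cache: identical Form XObject streams share
# one entry, keyed by the MD5 digest of their raw bytes.
class TokenCache
  attr_reader :misses

  def initialize
    @entries = {}
    @misses = 0
  end

  def fetch(instructions)
    key = Digest::MD5.hexdigest(instructions)
    # The begin block only runs when no entry exists for this key yet.
    @entries[key] ||= begin
      @misses += 1
      tokenise(instructions).freeze
    end
  end

  def size
    @entries.size
  end
end

cache = TokenCache.new
form = "q 1 0 0 1 10 10 cm /Im1 Do Q"
first  = cache.fetch(form)   # tokenised on the first call
second = cache.fetch(form)   # served from the cache
puts first.equal?(second)    # true: the very same array object
puts cache.misses            # 1
```

Freezing the cached array is one way to make any accidental in-place consumption of a shared entry fail loudly instead of silently emptying it.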
yob committed Jun 10, 2012
1 parent a7adfdd commit 0c3f55b
Showing 1 changed file with 23 additions and 4 deletions.
lib/pdf/reader/form_xobject.rb: 27 changes (23 additions, 4 deletions)
@@ -65,12 +65,31 @@ def callback(receivers, name, params=[])
         end
       end
 
+      def md5(data)
+        digest = Digest::MD5.new
+        digest << data
+        digest.hexdigest
+      end
+
+      def get_tokens(instructions)
+        cache_key = md5(instructions)
+        $global_cache ||= {}
+        $global_cache[cache_key] ||= begin
+          tokens = []
+          buffer = Buffer.new(StringIO.new(instructions), :content_stream => true)
+          parser = Parser.new(buffer, @objects)
+          while (token = parser.parse_token(PagesStrategy::OPERATORS))
+            tokens << token
+          end
+          tokens
+        end
+      end
+
       def content_stream(receivers, instructions)
-        buffer = Buffer.new(StringIO.new(instructions), :content_stream => true)
-        parser = Parser.new(buffer, @objects)
-        params = []
+        params = []
+        tokens = get_tokens(instructions)
 
-        while (token = parser.parse_token(PagesStrategy::OPERATORS))
+        while token = tokens.shift
           if token.kind_of?(Token) and PagesStrategy::OPERATORS.has_key?(token)
             callback(receivers, PagesStrategy::OPERATORS[token], params)
             params.clear

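Because `get_tokens` returns the cached Array object itself, `while token = tokens.shift` in `content_stream` removes elements from the shared cache entry, so the next Form XObject with identical content would receive an already-empty array. A small illustration, assuming plain string arrays in place of pdf-reader tokens (`drain_with_shift` and `read_with_each` are hypothetical helpers, not part of pdf-reader):

```ruby
# Hypothetical helper: consumes tokens the way the diff's loop does.
# Array#shift removes each element from the array it is called on.
def drain_with_shift(tokens)
  seen = []
  while (token = tokens.shift)
    seen << token
  end
  seen
end

# Hypothetical helper: reads tokens with Array#each, which leaves the
# array (and therefore any cache entry holding it) unchanged.
def read_with_each(tokens)
  seen = []
  tokens.each { |token| seen << token }
  seen
end

cached = %w[1 0 0 1 cm]            # stands in for one cached token array

p read_with_each(cached).length    # 5
p read_with_each(cached).length    # 5: the entry is still intact

p drain_with_shift(cached).length  # 5
p drain_with_shift(cached).length  # 0: the first pass emptied the entry
```

Iterating without mutation (or handing each caller a copy via `dup`) keeps one cached array reusable across every page that draws the same Form XObject.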