experiment with caching the tokenised content of Form XObjects

* Form XObjects are designed to save disk space and parsing time by
  reusing content across pages
* it doesn't make sense to parse them over and over
* This is a silly global variable cache to just test the theory and see
  what happens to memory usage and CPU time
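The idea in the bullets above can be sketched outside of pdf-reader as a small content-keyed cache. This is only an illustration under stated assumptions: `CountingTokenizer` and `TokenCache` are hypothetical names, and the whitespace split stands in for the real content stream parser.

```ruby
require "digest/md5"

# Hypothetical stand-in for the real parser: splits on whitespace and
# counts how many times it actually runs.
class CountingTokenizer
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def tokenize(stream)
    @runs += 1
    stream.split
  end
end

# Hypothetical cache keyed by an MD5 digest of the raw stream, so that
# identical Form XObject content is only tokenised once.
class TokenCache
  def initialize(tokenizer)
    @tokenizer = tokenizer
    @store = {}
  end

  def fetch(stream)
    key = Digest::MD5.hexdigest(stream)
    @store[key] ||= @tokenizer.tokenize(stream)
  end
end

tokenizer = CountingTokenizer.new
cache = TokenCache.new(tokenizer)
stream = "q 1 0 0 1 50 50 cm /Im1 Do Q"

first = cache.fetch(stream)   # tokenised on first use
second = cache.fetch(stream)  # served from the cache
```

Holding the cache in an object rather than a global makes it possible to drop it when the document is closed, which matters when measuring memory usage.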
1 parent a7adfdd commit 0c3f55b43bea8ed910fcb89bf50808a489af5d10 @yob committed
Showing with 23 additions and 4 deletions.
  1. +23 −4 lib/pdf/reader/form_xobject.rb
27 lib/pdf/reader/form_xobject.rb
@@ -65,12 +65,31 @@ def callback(receivers, name, params=[])
         end
       end
+      def md5(data)
+        digest = Digest::MD5.new
+        digest << data
+        digest.hexdigest
+      end
+
+      def get_tokens(instructions)
+        cache_key = md5(instructions)
+        $global_cache ||= {}
+        $global_cache[cache_key] ||= begin
+          tokens = []
+          buffer = Buffer.new(StringIO.new(instructions), :content_stream => true)
+          parser = Parser.new(buffer, @objects)
+          while (token = parser.parse_token(PagesStrategy::OPERATORS))
+            tokens << token
+          end
+          tokens
+        end
+      end
+
       def content_stream(receivers, instructions)
-        buffer = Buffer.new(StringIO.new(instructions), :content_stream => true)
-        parser = Parser.new(buffer, @objects)
-        params = []
+        params = []
+        tokens = get_tokens(instructions).dup # copy, so shift below does not drain the cached array
-        while (token = parser.parse_token(PagesStrategy::OPERATORS))
+        while token = tokens.shift
           if token.kind_of?(Token) and PagesStrategy::OPERATORS.has_key?(token)
             callback(receivers, PagesStrategy::OPERATORS[token], params)
             params.clear
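
One thing to watch with a cache like this: `Array#shift` is destructive, so a `while token = tokens.shift` loop has to consume a per-call copy of the cached array rather than the cache entry itself. The sketch below shows the difference with plain arrays; `drain` is a hypothetical helper that mimics the loop and is not part of pdf-reader.

```ruby
# Consumes tokens destructively, the way a `while token = tokens.shift` loop does.
def drain(tokens)
  seen = []
  while (token = tokens.shift)
    seen << token
  end
  seen
end

cached = %w[q /Fm1 Do Q]

# Shifting from the shared array itself empties it for every later caller.
shared = cached.dup             # stands in for a cache entry handed out directly
first_direct  = drain(shared)
second_direct = drain(shared)   # nothing left to replay

# Shifting from a per-call copy leaves the cache entry intact.
first_copy  = drain(cached.dup)
second_copy = drain(cached.dup)
```

Iterating with `each` instead of `shift` would avoid the copy entirely, at the cost of restructuring the loop.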
