Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

Merged
merged 1 commit into from

2 participants

@nirvdrum

The execution savings on this one weren't massive, but with this change, Regexp#=== and String#=== moved further down the hotspot list. Without #92 applied, Regexp#=== was the # 1 hotspot. With it applied, it moved down to # 3. With this change applied on top of that, it moves down to # 6. String#=== moves from # 11 to # 37.

I was testing this change out with two competing YAML files. One makes heavy use of serialized ruby/object and the other makes heavy use of ruby/struct. I gave ruby/struct preference, but it was rather arbitrary. I broke the tie by giving preference to VCR, which serializes a lot of structs, and seems to be a common use of Psych. I could go either way with it though.

Another interesting idea would be to implement the matches as a self-organizing list, which would allow the engine to adapt to the underlying document structure. Each match would have to be order-independent, but I think in this case they are (accounted for by moving Rational and Complex under the general ruby/object match).

@tenderlove
Owner

Rather than doing this, why don't we combine all of the when to 1 regexp? Then we could delegate to a method based on a hash lookup. e.g. this:

        when /^!ruby\/array:(.*)$/
          # ...
        when /^!ruby\/struct:?(.*)?$/
          # ...

could become this:

        when /^(!ruby\/(?:array|struct)):(.*)$/
          send REVIVE[$1], $2

I think most of the when clauses could be collapsed in that way. The resulting regexp would probably be ridiculous, but it would eventually result in one call to Regexp#===. wdyt?

@nirvdrum

I'm a fan of that idea. It's similar in spirit to the self-organizing list without a lot of the complexity. You just need to ensure that the matches delegate properly. E.g., '!ruby/object:Complex', '!ruby/object:Rational', and !ruby/object:Pathname all match the same general pattern but need very different bodies. The first two are easily looked up in a hash, the third one not so much. So, you'd need to special-case that or dynamically add closures for each custom class.

As you note, this does come at the cost of clarity. Documentation here would help a lot.

@tenderlove
Owner

I think we can handle the ruby/object case like so:

def complex match
  p :complex => match
end

def rational match
  p :rational => match
end

def object match
  p :object => [match, (match.string.split(':', 2)[1] || "Object")]
end

METHODS = {
  '!ruby/object:Complex'  => :complex,
  '!ruby/object:Rational' => :rational,
  '!ruby/object'          => :object,
}

re = %r{
  ^!ruby/object:(?:Complex|Rational)$ |
  ^!ruby/object
}x

[
  '!ruby/object:Complex',
  '!ruby/object:Rational',
  '!ruby/object:MyObject',
  '!ruby/object',
].each do |str|
  match = re.match(str)
  send METHODS[match.to_s], match
end

Do you want to try this and see what your benchmarks yield?

@nirvdrum

So, the number of Regexp#=== calls drops off, but it's equally offset by the number of hash code calculations and hash lookups. It's almost an even trade-off. I'll play with a couple more ideas in this space, but as of now the two approaches seem to be a toss-up so it's just a matter of whichever is more aesthetically pleasing.

@tenderlove
Owner

Wow, really? I'm surprised that hash calculation on a string would be so expensive. Do you have the diffs (and a benchmark) around so I can poke at it?

@nirvdrum

I pushed what I had up here (still very much in-progress): https://github.com/nirvdrum/psych/tree/new_mapping_approach

I'm currently working with a YAML file the author of #84 emailed me. I attached the test script, YAML, and generated profile output here: https://gist.github.com/25b6c191482276f5624b

Note in the new.profile how far Kernel#hash and Hash#empty climbed.

@tenderlove
Owner

Ugh. I've come to about the same conclusion. I added a branch here with the hash lookups and delegate. I've also updated the benchmark with an iteration per second test, and also a null test here. So the fastest we can possibly get is the null parsing speed. Otherwise, we'll need to start looking at improving libyaml ourselves.

Anyway, I'm happy to merge this PR, or branch. Which do you like better?

@tenderlove
Owner

bump?

@nirvdrum

Sorry, I'm going to look at this tonight. With both JRuby 1.7.1 and MRI 1.9.3-p327 augmenting their hash implementations, I'd like to see if the hash variant performs any better now.

@tenderlove
Owner

@nirvdrum no problem! Thanks for taking a look. I appreciate it.

@nirvdrum

I couldn't find any material difference even with newer rubies. So, I guess it's just a stylistic thing. I find the current regexp match a bit easier to follow than a map to method names. Thus, my vote would be to merge this pull request over the other branch. But, I don't think the other branch is so hard to read that I'd oppose merging that either.

@tenderlove
Owner

I'll pull in yours because I think I like it better.

@tenderlove tenderlove merged commit 2ceb7b6 into from
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Oct 13, 2012
  1. @nirvdrum
This page is out of date. Refresh to see the latest.
Showing with 37 additions and 35 deletions.
  1. +37 −35 lib/psych/visitors/to_ruby.rb
View
72 lib/psych/visitors/to_ruby.rb
@@ -141,28 +141,6 @@ def visit_Psych_Nodes_Mapping o
return revive_hash({}, o) unless o.tag
case o.tag
- when /^!(?:str|ruby\/string)(?::(.*))?/, 'tag:yaml.org,2002:str'
- klass = resolve_class($1)
- members = Hash[*o.children.map { |c| accept c }]
- string = members.delete 'str'
-
- if klass
- string = klass.allocate.replace string
- register(o, string)
- end
-
- init_with(string, members.map { |k,v| [k.to_s.sub(/^@/, ''),v] }, o)
- when /^!ruby\/array:(.*)$/
- klass = resolve_class($1)
- list = register(o, klass.allocate)
-
- members = Hash[o.children.map { |c| accept c }.each_slice(2).to_a]
- list.replace members['internal']
-
- members['ivars'].each do |ivar, v|
- list.instance_variable_set ivar, v
- end
- list
when /^!ruby\/struct:?(.*)?$/
klass = resolve_class($1)
@@ -187,6 +165,43 @@ def visit_Psych_Nodes_Mapping o
Struct.new(*h.map { |k,v| k.to_sym }).new(*h.map { |k,v| v })
end
+ when /^!ruby\/object:?(.*)?$/
+ name = $1 || 'Object'
+
+ if name == 'Complex'
+ h = Hash[*o.children.map { |c| accept c }]
+ register o, Complex(h['real'], h['image'])
+ elsif name == 'Rational'
+ h = Hash[*o.children.map { |c| accept c }]
+ register o, Rational(h['numerator'], h['denominator'])
+ else
+ obj = revive((resolve_class(name) || Object), o)
+ obj
+ end
+
+ when /^!(?:str|ruby\/string)(?::(.*))?/, 'tag:yaml.org,2002:str'
+ klass = resolve_class($1)
+ members = Hash[*o.children.map { |c| accept c }]
+ string = members.delete 'str'
+
+ if klass
+ string = klass.allocate.replace string
+ register(o, string)
+ end
+
+ init_with(string, members.map { |k,v| [k.to_s.sub(/^@/, ''),v] }, o)
+ when /^!ruby\/array:(.*)$/
+ klass = resolve_class($1)
+ list = register(o, klass.allocate)
+
+ members = Hash[o.children.map { |c| accept c }.each_slice(2).to_a]
+ list.replace members['internal']
+
+ members['ivars'].each do |ivar, v|
+ list.instance_variable_set ivar, v
+ end
+ list
+
when '!ruby/range'
h = Hash[*o.children.map { |c| accept c }]
register o, Range.new(h['begin'], h['end'], h['excl'])
@@ -206,19 +221,6 @@ def visit_Psych_Nodes_Mapping o
end
set
- when '!ruby/object:Complex'
- h = Hash[*o.children.map { |c| accept c }]
- register o, Complex(h['real'], h['image'])
-
- when '!ruby/object:Rational'
- h = Hash[*o.children.map { |c| accept c }]
- register o, Rational(h['numerator'], h['denominator'])
-
- when /^!ruby\/object:?(.*)?$/
- name = $1 || 'Object'
- obj = revive((resolve_class(name) || Object), o)
- obj
-
when /^!map:(.*)$/, /^!ruby\/hash:(.*)$/
revive_hash resolve_class($1).new, o
Something went wrong with that request. Please try again.