Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

nirvdrum · 2012-10-13T03:41:25Z

The execution savings on this one weren't massive, but with this change, Regexp#=== and String#=== moved further down the hotspot list. Without #92 applied, Regexp#=== was the # 1 hotspot. With it applied, it moved down to # 3. With this change applied on top of that, it moves down to # 6. String#=== moves from # 11 to # 37.

I was testing this change out with two competing YAML files. One makes heavy use of serialized ruby/object and the other makes heavy use of ruby/struct. I gave ruby/struct preference, but it was rather arbitrary. I broke the tie by giving preference to VCR, which serializes a lot of structs, and seems to be a common use of Psych. I could go either way with it though.

Another interesting idea would be to implement the matches as a self-organizing list, which would allow the engine to adapt to the underlying document structure. Each match would have to be order-independent, but I think in this case they are (accounted for by moving Rational and Complex under the general ruby/object match).

… unnecessary comparisons.

tenderlove · 2012-10-13T19:13:35Z

Rather than doing this, why don't we combine all of the when to 1 regexp? Then we could delegate to a method based on a hash lookup. e.g. this:

        when /^!ruby\/array:(.*)$/
          # ...
        when /^!ruby\/struct:?(.*)?$/
          # ...

could become this:

        when /^(!ruby\/(?:array|struct)):(.*)$/
          send REVIVE[$1], $2

I think most of the when clauses could be collapsed in that way. The resulting regexp would probably be ridiculous, but it would eventually result in one call to Regexp#===. wdyt?

nirvdrum · 2012-10-13T19:35:19Z

I'm a fan of that idea. It's similar in spirit to the self-organizing list without a lot of the complexity. You just need to ensure that the matches delegate properly. E.g., '!ruby/object:Complex', '!ruby/object:Rational', and !ruby/object:Pathname all match the same general pattern but need very different bodies. The first two are easily looked up in a hash, the third one not so much. So, you'd need to special-case that or dynamically add closures for each custom class.

As you note, this does come at the cost of clarity. Documentation here would help a lot.

tenderlove · 2012-10-13T22:18:51Z

I think we can handle the ruby/object case like so:

def complex match
  p :complex => match
end

def rational match
  p :rational => match
end

def object match
  p :object => [match, (match.string.split(':', 2)[1] || "Object")]
end

METHODS = {
  '!ruby/object:Complex'  => :complex,
  '!ruby/object:Rational' => :rational,
  '!ruby/object'          => :object,
}

re = %r{
  ^!ruby/object:(?:Complex|Rational)$ |
  ^!ruby/object
}x

[
  '!ruby/object:Complex',
  '!ruby/object:Rational',
  '!ruby/object:MyObject',
  '!ruby/object',
].each do |str|
  match = re.match(str)
  send METHODS[match.to_s], match
end

Do you want to try this and see what your benchmarks yield?

nirvdrum · 2012-10-15T22:43:40Z

So, the number of Regexp#=== calls drops off, but it's equally offset by the number of hash code calculations and hash lookups. It's almost an even trade-off. I'll play with a couple more ideas in this space, but as of now the two approaches seem to be a toss-up so it's just a matter of whichever is more aesthetically pleasing.

tenderlove · 2012-10-16T00:15:03Z

Wow, really? I'm surprised that hash calculation on a string would be so expensive. Do you have the diffs (and a benchmark) around so I can poke at it?

nirvdrum · 2012-10-16T00:36:25Z

I pushed what I had up here (still very much in-progress): https://github.com/nirvdrum/psych/tree/new_mapping_approach

I'm currently working with a YAML file the author of #84 emailed me. I attached the test script, YAML, and generated profile output here: https://gist.github.com/25b6c191482276f5624b

Note in the new.profile how far Kernel#hash and Hash#empty climbed.

tenderlove · 2012-10-17T03:35:14Z

Ugh. I've come to about the same conclusion. I added a branch here with the hash lookups and delegate. I've also updated the benchmark with an iteration per second test, and also a null test here. So the fastest we can possibly get is the null parsing speed. Otherwise, we'll need to start looking at improving libyaml ourselves.

Anyway, I'm happy to merge this PR, or branch. Which do you like better?

tenderlove · 2012-11-17T03:16:54Z

bump?

nirvdrum · 2012-11-19T13:27:30Z

Sorry, I'm going to look at this tonight. With both JRuby 1.7.1 and MRI 1.9.3-p327 augmenting their hash implementations, I'd like to see if the hash variant performs any better now.

tenderlove · 2012-11-19T16:57:55Z

@nirvdrum no problem! Thanks for taking a look. I appreciate it.

nirvdrum · 2012-12-17T11:57:19Z

I couldn't find any material difference even with newer rubies. So, I guess it's just a stylistic thing. I find the current regexp match a bit easier to follow than a map to method names. Thus, my vote would be to merge this pull request over the other branch. But, I don't think the other branch is so hard to read that I'd oppose merging that either.

tenderlove · 2012-12-17T17:58:34Z

I'll pull in yours because I think I like it better.

Restructured node mapping so common cases are evaluated first, saving…

2ceb7b6

… unnecessary comparisons.

tenderlove merged commit 2ceb7b6 into ruby:master Dec 17, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

nirvdrum commented Oct 13, 2012

tenderlove commented Oct 13, 2012

nirvdrum commented Oct 13, 2012

tenderlove commented Oct 13, 2012

nirvdrum commented Oct 15, 2012

tenderlove commented Oct 16, 2012

nirvdrum commented Oct 16, 2012

tenderlove commented Oct 17, 2012

tenderlove commented Nov 17, 2012

nirvdrum commented Nov 19, 2012

tenderlove commented Nov 19, 2012

nirvdrum commented Dec 17, 2012

tenderlove commented Dec 17, 2012

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

Conversation

nirvdrum commented Oct 13, 2012

tenderlove commented Oct 13, 2012

nirvdrum commented Oct 13, 2012

tenderlove commented Oct 13, 2012

nirvdrum commented Oct 15, 2012

tenderlove commented Oct 16, 2012

nirvdrum commented Oct 16, 2012

tenderlove commented Oct 17, 2012

tenderlove commented Nov 17, 2012

nirvdrum commented Nov 19, 2012

tenderlove commented Nov 19, 2012

nirvdrum commented Dec 17, 2012

tenderlove commented Dec 17, 2012