Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

Merged
merged 1 commit into from Dec 17, 2012

Conversation

nirvdrum
Copy link
Contributor

The execution savings on this one weren't massive, but with this change, Regexp#=== and String#=== moved further down the hotspot list. Without #92 applied, Regexp#=== was the # 1 hotspot. With it applied, it moved down to # 3. With this change applied on top of that, it moves down to # 6. String#=== moves from # 11 to # 37.

I was testing this change out with two competing YAML files. One makes heavy use of serialized ruby/object and the other makes heavy use of ruby/struct. I gave ruby/struct preference, but it was rather arbitrary. I broke the tie by giving preference to VCR, which serializes a lot of structs, and seems to be a common use of Psych. I could go either way with it though.

Another interesting idea would be to implement the matches as a self-organizing list, which would allow the engine to adapt to the underlying document structure. Each match would have to be order-independent, but I think in this case they are (accounted for by moving Rational and Complex under the general ruby/object match).

@tenderlove
Copy link
Member

Rather than doing this, why don't we combine all of the when to 1 regexp? Then we could delegate to a method based on a hash lookup. e.g. this:

        when /^!ruby\/array:(.*)$/
          # ...
        when /^!ruby\/struct:?(.*)?$/
          # ...

could become this:

        when /^(!ruby\/(?:array|struct)):(.*)$/
          send REVIVE[$1], $2

I think most of the when clauses could be collapsed in that way. The resulting regexp would probably be ridiculous, but it would eventually result in one call to Regexp#===. wdyt?

@nirvdrum
Copy link
Contributor Author

I'm a fan of that idea. It's similar in spirit to the self-organizing list without a lot of the complexity. You just need to ensure that the matches delegate properly. E.g., '!ruby/object:Complex', '!ruby/object:Rational', and !ruby/object:Pathname all match the same general pattern but need very different bodies. The first two are easily looked up in a hash, the third one not so much. So, you'd need to special-case that or dynamically add closures for each custom class.

As you note, this does come at the cost of clarity. Documentation here would help a lot.

@tenderlove
Copy link
Member

I think we can handle the ruby/object case like so:

def complex match
  p :complex => match
end

def rational match
  p :rational => match
end

def object match
  p :object => [match, (match.string.split(':', 2)[1] || "Object")]
end

METHODS = {
  '!ruby/object:Complex'  => :complex,
  '!ruby/object:Rational' => :rational,
  '!ruby/object'          => :object,
}

re = %r{
  ^!ruby/object:(?:Complex|Rational)$ |
  ^!ruby/object
}x

[
  '!ruby/object:Complex',
  '!ruby/object:Rational',
  '!ruby/object:MyObject',
  '!ruby/object',
].each do |str|
  match = re.match(str)
  send METHODS[match.to_s], match
end

Do you want to try this and see what your benchmarks yield?

@nirvdrum
Copy link
Contributor Author

So, the number of Regexp#=== calls drops off, but it's equally offset by the number of hash code calculations and hash lookups. It's almost an even trade-off. I'll play with a couple more ideas in this space, but as of now the two approaches seem to be a toss-up so it's just a matter of whichever is more aesthetically pleasing.

@tenderlove
Copy link
Member

Wow, really? I'm surprised that hash calculation on a string would be so expensive. Do you have the diffs (and a benchmark) around so I can poke at it?

@nirvdrum
Copy link
Contributor Author

I pushed what I had up here (still very much in-progress): https://github.com/nirvdrum/psych/tree/new_mapping_approach

I'm currently working with a YAML file the author of #84 emailed me. I attached the test script, YAML, and generated profile output here: https://gist.github.com/25b6c191482276f5624b

Note in the new.profile how far Kernel#hash and Hash#empty climbed.

@tenderlove
Copy link
Member

Ugh. I've come to about the same conclusion. I added a branch here with the hash lookups and delegate. I've also updated the benchmark with an iteration per second test, and also a null test here. So the fastest we can possibly get is the null parsing speed. Otherwise, we'll need to start looking at improving libyaml ourselves.

Anyway, I'm happy to merge this PR, or branch. Which do you like better?

@tenderlove
Copy link
Member

bump?

@nirvdrum
Copy link
Contributor Author

Sorry, I'm going to look at this tonight. With both JRuby 1.7.1 and MRI 1.9.3-p327 augmenting their hash implementations, I'd like to see if the hash variant performs any better now.

@tenderlove
Copy link
Member

@nirvdrum no problem! Thanks for taking a look. I appreciate it.

@nirvdrum
Copy link
Contributor Author

I couldn't find any material difference even with newer rubies. So, I guess it's just a stylistic thing. I find the current regexp match a bit easier to follow than a map to method names. Thus, my vote would be to merge this pull request over the other branch. But, I don't think the other branch is so hard to read that I'd oppose merging that either.

@tenderlove
Copy link
Member

I'll pull in yours because I think I like it better.

@tenderlove tenderlove merged commit 2ceb7b6 into ruby:master Dec 17, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants