New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93
Conversation
… unnecessary comparisons.
Rather than doing this, why don't we combine all of the when /^!ruby\/array:(.*)$/
# ...
when /^!ruby\/struct:?(.*)?$/
# ... could become this: when /^(!ruby\/(?:array|struct)):(.*)$/
send REVIVE[$1], $2 I think most of the |
I'm a fan of that idea. It's similar in spirit to the self-organizing list without a lot of the complexity. You just need to ensure that the matches delegate properly. E.g., As you note, this does come at the cost of clarity. Documentation here would help a lot. |
I think we can handle the def complex match
p :complex => match
end
def rational match
p :rational => match
end
def object match
p :object => [match, (match.string.split(':', 2)[1] || "Object")]
end
METHODS = {
'!ruby/object:Complex' => :complex,
'!ruby/object:Rational' => :rational,
'!ruby/object' => :object,
}
re = %r{
^!ruby/object:(?:Complex|Rational)$ |
^!ruby/object
}x
[
'!ruby/object:Complex',
'!ruby/object:Rational',
'!ruby/object:MyObject',
'!ruby/object',
].each do |str|
match = re.match(str)
send METHODS[match.to_s], match
end Do you want to try this and see what your benchmarks yield? |
So, the number of Regexp#=== calls drops off, but it's equally offset by the number of hash code calculations and hash lookups. It's almost an even trade-off. I'll play with a couple more ideas in this space, but as of now the two approaches seem to be a toss-up so it's just a matter of whichever is more aesthetically pleasing. |
Wow, really? I'm surprised that hash calculation on a string would be so expensive. Do you have the diffs (and a benchmark) around so I can poke at it? |
I pushed what I had up here (still very much in-progress): https://github.com/nirvdrum/psych/tree/new_mapping_approach I'm currently working with a YAML file the author of #84 emailed me. I attached the test script, YAML, and generated profile output here: https://gist.github.com/25b6c191482276f5624b Note in the new.profile how far Kernel#hash and Hash#empty climbed. |
Ugh. I've come to about the same conclusion. I added a branch here with the hash lookups and delegate. I've also updated the benchmark with an iteration per second test, and also a null test here. So the fastest we can possibly get is the null parsing speed. Otherwise, we'll need to start looking at improving libyaml ourselves. Anyway, I'm happy to merge this PR, or branch. Which do you like better? |
bump? |
Sorry, I'm going to look at this tonight. With both JRuby 1.7.1 and MRI 1.9.3-p327 augmenting their hash implementations, I'd like to see if the hash variant performs any better now. |
@nirvdrum no problem! Thanks for taking a look. I appreciate it. |
I couldn't find any material difference even with newer rubies. So, I guess it's just a stylistic thing. I find the current regexp match a bit easier to follow than a map to method names. Thus, my vote would be to merge this pull request over the other branch. But, I don't think the other branch is so hard to read that I'd oppose merging that either. |
I'll pull in yours because I think I like it better. |
The execution savings on this one weren't massive, but with this change, Regexp#=== and String#=== moved further down the hotspot list. Without #92 applied, Regexp#=== was the # 1 hotspot. With it applied, it moved down to # 3. With this change applied on top of that, it moves down to # 6. String#=== moves from # 11 to # 37.
I was testing this change out with two competing YAML files. One makes heavy use of serialized ruby/object and the other makes heavy use of ruby/struct. I gave ruby/struct preference, but it was rather arbitrary. I broke the tie by giving preference to VCR, which serializes a lot of structs, and seems to be a common use of Psych. I could go either way with it though.
Another interesting idea would be to implement the matches as a self-organizing list, which would allow the engine to adapt to the underlying document structure. Each match would have to be order-independent, but I think in this case they are (accounted for by moving Rational and Complex under the general ruby/object match).