TreeRewriter optimizations and default diagnostics engine #721

Draft
wants to merge 5 commits into
base: master

marcandre
Contributor

Using memory_profiler on RuboCop revealed a lot of allocations from TreeRewriter.

Before trying to reduce the number of TreeRewriter instances allocated, I thought it best to look at the class itself first. Clearly, I had not considered object allocations when writing it. Most of them exist to handle clobbering exceptions, which should be rare occurrences, and so should be avoided.

This PR fixes all these issues and introduces TreeRewriter.default_diagnostics, which can be useful for consumers and also avoids allocating a lambda each time.
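A minimal sketch of the idea, assuming a class-level default engine that is copied only on demand. `EngineSketch` and `RewriterSketch` are hypothetical stand-ins, not the real Parser classes, and the method bodies are a guess based on the fragments quoted in this review:

```ruby
# Hypothetical stand-in for the diagnostics engine; not the real Parser class.
class EngineSketch
  attr_accessor :consumer

  def initialize(consumer = nil)
    @consumer = consumer
  end

  def process(diagnostic)
    consumer&.call(diagnostic)
  end
end

# Hypothetical stand-in for TreeRewriter, reduced to its diagnostics handling.
class RewriterSketch
  # One engine and one lambda for the whole class, built on first use.
  def self.default_diagnostics
    @default_diagnostics ||= EngineSketch.new(->(diag) { warn diag })
  end

  # Copy the shared engine only when a caller asks for one to customize.
  def diagnostics
    @diagnostics ||= self.class.default_diagnostics.dup
  end

  def report(diag)
    # No per-instance engine exists unless #diagnostics was called earlier.
    engine = @diagnostics || self.class.default_diagnostics
    engine.process(diag)
  end
end
```

With this shape, an instance that never touches `diagnostics` allocates neither an engine nor a lambda.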

Let me know if multiple PRs would be preferable.

Below are memory allocations for RuboCop running on 67 files. Lines marked with '*' disappear with this PR; the line marked 'ok' remains, as it allocates the root action, which will typically be necessary.

allocated memory by location
  19.11 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node_pattern.rb:776
  12.56 MB  /Users/mal/.rvm/gems/ruby-2.7.1/gems/rubocop-rspec-1.39.0/lib/rubocop/cop/rspec/cop.rb:69
   5.19 MB  /Users/mal/rubocop/lib/rubocop/cop/base.rb:193
*  4.44 MB  /Users/mal/parser/lib/parser/source/tree_rewriter.rb:110
   4.04 MB  /Users/mal/rubocop/lib/rubocop/cop/team.rb:47
   3.89 MB  /Users/mal/rubocop/lib/rubocop/cop/commissioner.rb:86
   2.84 MB  /Users/mal/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/set.rb:94
   2.80 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node.rb:569
   2.79 MB  /Users/mal/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/find.rb:49
   2.64 MB  /Users/mal/parser/lib/parser/source/buffer.rb:197
ok 2.32 MB  /Users/mal/parser/lib/parser/source/tree_rewriter.rb:117
   2.32 MB  /Users/mal/rubocop/lib/rubocop/cop/base.rb:288
   2.22 MB  /Users/mal/rubocop/lib/rubocop/cop/metrics/utils/code_length_calculator.rb:73
   2.18 MB  /Users/mal/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/find.rb:51
   2.11 MB  /Users/mal/rubocop/lib/rubocop/cop/commissioner.rb:104
*  2.11 MB  /Users/mal/parser/lib/parser/source/tree_rewriter.rb:103
*  2.11 MB  /Users/mal/parser/lib/parser/source/tree_rewriter.rb:366
   2.11 MB  /Users/mal/rubocop/lib/rubocop/cop/corrector.rb:26
*  1.90 MB  /Users/mal/parser/lib/parser/source/tree_rewriter.rb:113
   1.90 MB  /Users/mal/rubocop/lib/rubocop/cop/base.rb:293
   1.86 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node/mixin/method_dispatch_node.rb:34
   1.78 MB  /Users/mal/parser/lib/parser.rb:67
   1.58 MB  /Users/mal/.rvm/rubies/ruby-2.7.1/lib/ruby/2.7.0/psych/tree_builder.rb:97
   1.54 MB  /Users/mal/parser/lib/parser/source/range.rb:133
   1.54 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node.rb:62
   1.43 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/builder.rb:73
   1.36 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node_pattern.rb:772
   1.32 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node_pattern.rb:737
   1.32 MB  /Users/mal/parser/lib/parser/ruby24.rb:1798
   1.31 MB  (eval):3
   1.31 MB  /Users/mal/rubocop-ast/lib/rubocop/ast/node.rb:151
   1.31 MB  /Users/mal/parser/lib/parser/ruby24.rb:878
   1.26 MB  /Users/mal/rubocop/lib/rubocop/cop/naming/predicate_name.rb:72
   1.24 MB  /Users/mal/rubocop/lib/rubocop/core_ext/string.rb:20
   1.21 MB  /Users/mal/parser/lib/parser/lexer.rb:23613
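
For a quick sanity check without memory_profiler, object allocations can be counted with core Ruby alone. This is only a rough sketch, assuming CRuby, where `GC.stat(:total_allocated_objects)` is available. The helper name `allocated_objects` is made up for illustration, and it gives a total count rather than the per-line breakdown shown above:

```ruby
# Count objects allocated while the block runs (CRuby only).
# Much coarser than memory_profiler: no per-file or per-line attribution.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

count = allocated_objects { 1_000.times { Object.new } }
puts "objects allocated: #{count}"
```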

Collaborator

@iliabylich iliabylich left a comment

Sorry for the delay. I fully support the idea, thanks a lot. I've left a few design comments.

# Provides access to a diagnostic engine.
# By default: self.class.default_diagnostics
#
def diagnostics
Collaborator

I guess it still could be set in the constructor. Having initialization in a single place (like it was before) seems to be more readable.

Contributor Author

You're right, it would be in the constructor if it didn't need to be dup'ed, but it does...

Collaborator

Yes, but even if you need .dup you can still do it in the constructor. Initialization of this field doesn't have to be lazy.
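
A rough sketch of the constructor-based alternative, assuming a shared default that is copied eagerly. `EngineStandIn`, `EagerRewriterSketch`, and `DEFAULT_DIAGNOSTICS` are hypothetical names used only for illustration:

```ruby
# Hypothetical stand-in for the diagnostics engine; only #consumer matters here.
EngineStandIn = Struct.new(:consumer)

class EagerRewriterSketch
  # Shared default, built once when the class is loaded.
  DEFAULT_DIAGNOSTICS = EngineStandIn.new(->(diag) { warn diag }).freeze

  attr_reader :diagnostics

  def initialize
    # All state is set up in one place, at the cost of one copy per instance.
    # #dup returns an unfrozen copy, so callers can still replace the consumer.
    @diagnostics = DEFAULT_DIAGNOSTICS.dup
  end
end
```

The trade-off is readability against one extra allocation per instance, which is the cost the lazy version tries to avoid.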

def check_policy_validity
invalid = @policy.values - ACTIONS
raise ArgumentError, "Invalid policy: #{invalid.join(', ')}" unless invalid.empty?
ACTIONS = %i[accept warn raise].to_set.freeze
Collaborator

Is it actually better to use sets for 3 symbols? IIRC sets in Ruby are hashes, and small hashes are arrays. Do you get anything from this to_set? I thought it would make it slower, because computing .hash and comparing it is slower than comparing symbols, which are numbers internally. Not a blocker at all.

Contributor Author

Yeah, good point. I'm not sure I can explain why, but even for a three-element set, a non-matching include? check is faster on a Set than on an Array, and so is a matching check on the 2nd or 3rd element. Only a matching check on the first element seems faster on an Array.
Beyond performance, I feel that an object to call include? on should be a Set, and Set should optimize that call...
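
One way to check this claim locally is a small timing loop like the one below. It is only a sketch: the constant names and the `time_include` helper are made up for illustration, and the results will vary with the Ruby version and machine, so no expected timings are given:

```ruby
require 'set'

ACTIONS_ARRAY = %i[accept warn raise].freeze
ACTIONS_SET = ACTIONS_ARRAY.to_set.freeze

# Time `iterations` calls of collection.include?(value), in seconds.
def time_include(collection, value, iterations = 200_000)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { collection.include?(value) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# First element, last element, and a non-matching value.
%i[accept raise bogus].each do |value|
  array_time = time_include(ACTIONS_ARRAY, value)
  set_time = time_include(ACTIONS_SET, value)
  puts format('%-7s array=%.4fs set=%.4fs', value, array_time, set_time)
end
```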

def policy(event)
return :raise if event == :crossing_insertions

instance_variable_get(EVENT_TO_POLICY.fetch(event))
Collaborator

Let's rewrite it as a case. The difference in terms of speed is too small, and it makes the code harder to follow. Also, this way you can combine it with the return :raise above.

Contributor Author

Yup. You're absolutely right.
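
The suggested rewrite could look roughly like this. It is a sketch, not the code that ended up in the PR: `PolicySketch` is a hypothetical stand-in, the instance variable names are assumed, and the event names follow TreeRewriter's policy options:

```ruby
# Hypothetical stand-in holding one policy per clobbering event.
class PolicySketch
  def initialize(crossing_deletions: :accept,
                 different_replacements: :accept,
                 swallowed_insertions: :accept)
    @crossing_deletions = crossing_deletions
    @different_replacements = different_replacements
    @swallowed_insertions = swallowed_insertions
  end

  # A plain case replaces the hash lookup plus instance_variable_get,
  # and folds in the special case for crossing insertions.
  def policy(event)
    case event
    when :crossing_insertions    then :raise
    when :crossing_deletions     then @crossing_deletions
    when :different_replacements then @different_replacements
    when :swallowed_insertions   then @swallowed_insertions
    else raise KeyError, "unknown event: #{event.inspect}" # mirrors Hash#fetch
    end
  end
end
```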

# We need a range that would be judged as containing all other ranges,
# including 0...0 and size...size:
all_encompassing_range = @source_buffer.source_range.adjust(begin_pos: -1, end_pos: +1)
@action_root = TreeRewriter::Action.new(all_encompassing_range, @enforcer)
@action_root = TreeRewriter::Action.new(all_encompassing_range, self)
Collaborator

Passing self as an argument is usually a sign of a bad composition (just like passing a private instance method 😄 )

Also, the argument on the TreeRewriter::Action is not an enforcer anymore, it's a tree_rewriter, and so it introduces a bi-directional dependency.

Contributor Author

Very legitimate comment. I didn't document it, but I thought that asking for an object responding to call or for an object responding to enforce_policy was pretty similar. That's also the idea behind not renaming it; it's an enforcer and needs to respond to a single call, enforce_policy... It just so happens that TreeRewriter responds to enforce_policy ;-)

Do you have a suggestion?

Collaborator

I don't really like either the original or the new implementation from this POV. Bi-directional relationships between objects cause trouble from time to time, so I'd try to avoid them.

Previously only a single private method was shared with Action, and so it was a smaller violation of encapsulation (but it still was a violation). I'd personally keep the original version only for that reason.

At the same time, your implementation looks better to me in terms of types and interfaces. Both implementations have issues; the question is which one has more downsides, and I don't see a single right answer here. Up to you to decide.
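
To make the two options concrete, here is a stripped-down sketch of both. All class and method names except `enforce_policy` are hypothetical, and the bodies are placeholders rather than the real TreeRewriter logic:

```ruby
# Variant A (original): the action receives any object responding to #call.
class CallableAction
  def initialize(range, enforcer)
    @range = range
    @enforcer = enforcer
  end

  def clobber(event)
    @enforcer.call(event)
  end
end

# Variant B (this PR): the action receives an object responding to #enforce_policy.
class DuckTypedAction
  def initialize(range, enforcer)
    @range = range
    @enforcer = enforcer
  end

  def clobber(event)
    @enforcer.enforce_policy(event)
  end
end

class RewriterA
  attr_reader :action_root

  def initialize
    # Allocates a Method object per rewriter, but shares only one private method.
    @action_root = CallableAction.new(0...0, method(:enforce_policy))
  end

  private

  def enforce_policy(event)
    "enforced #{event}"
  end
end

class RewriterB
  attr_reader :action_root

  def initialize
    # No extra allocation, but the action now holds a reference to the rewriter.
    @action_root = DuckTypedAction.new(0...0, self)
  end

  # Has to be public so the action can call it.
  def enforce_policy(event)
    "enforced #{event}"
  end
end
```

Variant A keeps `enforce_policy` private at the cost of one `Method` allocation per instance; Variant B avoids that allocation but widens the public surface and makes the dependency bi-directional.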

diag = Parser::Diagnostic.new(POLICY_TO_LEVEL[action], event, arguments, range)
@diagnostics.process(diag)
engine = @diagnostics || self.class.default_diagnostics
Collaborator

Could be just diagnostics (or @diagnostics if you move initialization back to the constructor).

@marcandre
Contributor Author

As usual, one of the most insightful code reviews I ever get 🙇‍♂️
I'll amend the PR tomorrow; let me know what you suggest about the callback to enforce policy, though.

@marcandre marcandre marked this pull request as draft July 19, 2020 04:30
@gregmolnar

@marcandre do you still plan to get this merged?

@marcandre
Contributor Author

@gregmolnar thanks for the ping, I completely forgot about this. Let me check it again over the weekend...

@marcandre
Contributor Author

Haven't had a second this weekend, still on my "todo list"
