Newrelic RPM Performance Regression #59

Closed
mnelson opened this issue Oct 6, 2014 · 24 comments

@mnelson
Contributor

mnelson commented Oct 6, 2014

Issues #45 and #51 both report performance problems when using Makara alongside certain versions of newrelic/rpm. In an attempt to be as transparent as possible and to help isolate the issue, the following table holds the information needed to check your stack against other community reports.

New Relic has released a fix in newrelic_rpm 3.11.2.

The range of newrelic_rpm releases affected is 3.7.2 <= X < 3.11.2

Makara Version | AR Version | RPM Version | Regression
0.0.x          | 3.2        | 3.6.9       | no
0.2.x          | 4.x        | 3.6.5.130   | no
0.2.x          | 4.x        | 3.6.6.147   | no
0.3.x          | 4.x        | 3.6.6.147   | no
--- BROKE ---
0.2.x          | 4.x        | 3.7.2       | yes
0.2.x          | 4.x        | 3.9.4.245   | yes
0.3.x          | 4.x        | 3.8.1.221   | yes
0.3.x          | 4.x        | 3.9.5.251   | yes
0.3.x          | 4.x        | 3.9.6.257   | yes
0.3.x          | 4.x        | 3.9.7.266   | yes
--- FIXED ---
0.x.x          | x.x        | 3.11.2      | no
@mnelson mnelson self-assigned this Oct 6, 2014
@mcansky
Contributor

mcansky commented Oct 6, 2014

👍 thanks @mnelson !

@matkam

matkam commented Nov 18, 2014

Trying makara v0.3.0rc3 for the first time with AR 4.1.8 and newrelic_rpm 3.9.7.266, I can confirm performance problems

@briantobin

Another data point for you:

Makara 0.3.x + AR 4.x + RPM 3.9.6.257 = regression yes

@jasonrclark

Hey there! I'm a developer on newrelic_rpm and got linked to these perf issues today. Major bummer! I'd like to help get to the bottom of it if we can.

I haven't used makara in earnest before, and plugging it into my basic Rails 4 test app didn't show me much perf difference yet. I'm sure I'm missing something in my configuration, though.

Does anyone have a relatively simple duplication case that I could use to dig in? Ideal would be a repo with a Rails app that I could toggle between the versions and see the perf differences, but any indicator where to head would be appreciated.

@mnelson
Contributor Author

mnelson commented Mar 3, 2015

Here you go! https://github.com/taskrabbit/makara_newrelic_test

makara_newrelic_test [master] ~> be rake bm_nr
-- create_table("users", {:force=>true})
   -> 0.0580s
-- initialize_schema_migrations_table()
   -> 0.0065s
Rehearsal ------------------------------------------
insert   8.660000   0.200000   8.860000 (  9.134924)
select   2.600000   0.060000   2.660000 (  2.768211)
-------------------------------- total: 11.520000sec

             user     system      total        real
insert   8.670000   0.150000   8.820000 (  9.086117)
select   2.520000   0.040000   2.560000 (  2.667939)
makara_newrelic_test [master] ~> be rake bm
-- create_table("users", {:force=>true})
   -> 0.0358s
-- initialize_schema_migrations_table()
   -> 0.0014s
Rehearsal ------------------------------------------
insert   0.260000   0.030000   0.290000 (  0.418320)
select   0.080000   0.010000   0.090000 (  0.157242)
--------------------------------- total: 0.380000sec

             user     system      total        real
insert   0.280000   0.020000   0.300000 (  0.419354)
select   0.090000   0.010000   0.100000 (  0.167273)

Let me know if there's anything I can do to help.

@mnelson
Contributor Author

mnelson commented Mar 3, 2015

#71 will likely be of interest as well.

@mnelson
Contributor Author

mnelson commented Mar 3, 2015

Based on the rubygems releases, I believe the issue was introduced between 3.6.8.168 and 3.6.9.171: 3.6.8.168 runs fine, while 3.6.9.171 takes ~40x longer. newrelic/newrelic-ruby-agent@3.6.8.168...3.6.9.171

@mnelson
Contributor Author

mnelson commented Mar 3, 2015

#75

I'm seeing an improvement in this branch, but it's still 4x slower:

makara_newrelic_test [master] ~> be rake bm_nr
-- create_table("users", {:force=>true})
   -> 0.0723s
-- initialize_schema_migrations_table()
   -> 0.0040s
Rehearsal ------------------------------------------
insert   1.130000   0.080000   1.210000 (  1.449123)
select   0.490000   0.030000   0.520000 (  0.615719)
--------------------------------- total: 1.730000sec

             user     system      total        real
insert   1.180000   0.050000   1.230000 (  1.425478)
select   0.370000   0.010000   0.380000 (  0.471990)
makara_newrelic_test [master] ~> be rake bm
-- create_table("users", {:force=>true})
   -> 0.0439s
-- initialize_schema_migrations_table()
   -> 0.0017s
Rehearsal ------------------------------------------
insert   0.290000   0.030000   0.320000 (  0.485325)
select   0.080000   0.010000   0.090000 (  0.165335)
--------------------------------- total: 0.410000sec

             user     system      total        real
insert   0.270000   0.020000   0.290000 (  0.419044)
select   0.090000   0.010000   0.100000 (  0.170199)

@jasonrclark

Thanks @mnelson, that's going to be super helpful in tracking this down! I probably won't be able to dig deep on this until next week, but having those dups will be huge in getting to the bottom of this. Thanks again!

@jasonrclark

As expected, that repro case was awesome ✨. Thanks again @mnelson for putting it together!

The slowdown is happening at this point in newrelic_rpm: https://github.com/newrelic/rpm/blob/5560d8ecff0dcece53a80edbc4535cc60fc188d8/lib/new_relic/agent/instrumentation/active_record_subscriber.rb#L87-L93. On Rails 4.x+, New Relic uses ActiveSupport::Notifications, but the payload only gives us the object_id of the connection. Because we need more information off the connection (database adapter type for instance), we search through the connection list and that's where the bottleneck is.
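
For context, here's roughly what the notification hands us (an illustrative sketch, not the agent's actual subscriber code):

ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
  payload = args.last
  payload[:connection_id] # => the object_id of the connection that ran the query
end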

Here's a slightly nicer version of the agent's lookup code that you can put in a task or script to run:

require 'benchmark'

n = 1000
Benchmark.bm do |x|
  x.report do
    n.times do
      # Gather every connection across all of the handler's pools...
      connections = ::ActiveRecord::Base.connection_handler.connection_pool_list.map do |handler|
        handler.connections
      end.flatten

      # ...then find the one whose object_id matches the notification payload
      # (1234 is just a stand-in for payload[:connection_id]).
      connection = connections.detect { |cnxn| cnxn.object_id == 1234 }
    end
  end
end

When I run this in a Rails project with the mysql adapter, I get the following timing:

       user     system      total        real
   0.030000   0.000000   0.030000 (  0.033124)

Running that with makara instead (it doesn't matter whether New Relic's in the picture for this) yields this:

       user     system      total        real
   1.600000   0.030000   1.630000 (  1.625922)

I haven't dug into makara to see why this path is slow. Stackprof gives some slightly confusing results, but points at Delegator and SimpleDelegator a lot.

FWIW, we don't really like this connection lookup, and we're continuing to look for a solution from the Rails end to avoid it. But even if we get an update, it'll only be available in some as-yet-unreleased Rails version, so this code is likely to live on even then as a fallback on Rails 4.x.

Any thoughts where to go from here @mnelson?

@mnelson
Contributor Author

mnelson commented Mar 12, 2015

@jasonrclark Is there an issue with using ObjectSpace._id2ref(object_id) rather than detecting the object by iterating the connections? I agree that there's a problem somewhere but from my quick attempt an id2ref lookup at least avoids it.

(screenshot attached in the original comment)

Oh, just remembered I use id2ref for the makara logging as well: https://github.com/taskrabbit/makara/blob/master/lib/makara/logging/subscriber.rb#L25
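
Roughly the kind of lookup I mean (a sketch; payload[:connection_id] is the object_id the sql.active_record notification hands over, and _id2ref raises RangeError if that object is no longer around):

connection = begin
  ObjectSpace._id2ref(payload[:connection_id])
rescue RangeError
  nil
end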

@mnelson
Contributor Author

mnelson commented Mar 12, 2015

Btw, the repro is updated with some id2ref rake tasks https://github.com/taskrabbit/makara_newrelic_test

@jasonrclark

@mnelson There actually are problems with _id2ref--we had it that way for our original implementation of this code and had to move to the more complex scheme we've got today.

The two issues were that 1) _id2ref performs really badly on Rubinius and 2) it isn't even supported on JRuby without passing a special startup flag (for similar perf reasons).

Given those constraints, we can't go back to using _id2ref. With your knowledge of the project, does it seem possible to optimize the connection code? If so, that seems like the best option to me.

Failing that, I've got an idea or two about how we could let you override the connection retrieval. It's a little awkward because of load ordering (I expect makara will typically be required before newrelic_rpm), but I know how we could do it.

@mnelson
Contributor Author

mnelson commented Mar 12, 2015

With a little more digging I've determined that the bottleneck is the flatten invocation. My guess is that there's something weird going on with object comparison or array conversion because of the delegation, maybe even down at the C level in rb_check_array_type. I'll keep poking around.
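
A rough micro-benchmark of that suspicion (illustrative only; FakeConnection stands in for an AR connection, and SimpleDelegator stands in for makara's wrapping, which isn't literally the same class):

require 'benchmark'
require 'delegate'

class FakeConnection; end

plain     = Array.new(50) { FakeConnection.new }
delegated = Array.new(50) { SimpleDelegator.new(FakeConnection.new) }

n = 10_000
Benchmark.bm(10) do |x|
  # flatten probes each element for to_ary (rb_check_array_type); on a
  # delegator that probe has to go through respond_to_missing?.
  x.report('plain')     { n.times { [plain].flatten } }
  x.report('delegated') { n.times { [delegated].flatten } }
end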

@mnelson
Contributor Author

mnelson commented Mar 12, 2015

FWIW, this is a potential fix for the NR codebase, even though I understand the real problem is in Makara:

connection = ActiveRecord::Base.connection_handler.connection_pool_list.detect do |pool|
  pool.connections.detect do |c|
    c.object_id == 4 # placeholder for the connection_id from the payload
  end
end

@jasonrclark

That's a great idea, @mnelson. I tested it out, and with a little tweaking (you can't use the outer detect since that returns the connection pool, not the inner connection) I was able to use this instead.
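
Something along these lines (a sketch rather than the exact agent code; target_id stands in for the connection object_id from the notification payload):

connection = nil
::ActiveRecord::Base.connection_handler.connection_pool_list.each do |pool|
  connection = pool.connections.detect { |c| c.object_id == target_id }
  break if connection
end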

It also neatly avoids a couple extra object allocations, so in addition to helping makara out, it's a small win for every ActiveRecord 4.x client using newrelic_rpm 🎉

This isn't likely to make it into the next release of the agent, which is already baking internally, but I'll ping here when we have a branch of the agent to test against directly.

jasonrclark added a commit to newrelic/newrelic-ruby-agent that referenced this issue Mar 18, 2015
Based on instacart/makara#59, there were perf
issues in looking up the connection from the pool when on ActiveRecord
4.x using the makara adapter.

While this was a problem with makara itself, @mnelson suggested
alternate lookup code which works with makara and actually allocates
less, so we're updating to use that.
@danicuki

Hi guys. I just ran into the same issue. What's the workaround until it's fixed?

@mnelson
Contributor Author

mnelson commented Mar 23, 2015

With reference to the table above, you can use one of the makara or rpm versions that does not have the performance issue. I have a branch that somewhat improves the situation, but it does not fix it completely; you can absolutely give it a try with that caveat in mind. The underlying perf issue seems to be in Ruby's Delegator class, and I'm still working on a solution that at the very least avoids the problem.

@danicuki

I have tested makara 0.3.x (latest version) with rpm 3.6.6.147 and AR 4.x and it is working fine!

@mnelson
Contributor Author

mnelson commented Mar 23, 2015

Thanks @danicuki, added to the table.

@benweint

Hey y'all - we've just released version 3.11.2 of the newrelic_rpm gem, which should address this issue. Thanks again for your help in isolating and fixing it!

@mnelson mnelson closed this as completed Apr 26, 2015
@nikhilvs

Makara Version ---> 0.3.3
AR Version ---> 3.2.14
RPM Version ---> 3.9.7.266
Is this configuration tested?

@jasonrclark

@nikhilvs Any newrelic_rpm >= 3.7.2 and < 3.11.2 will have the regression. I'd recommend upgrading newrelic_rpm to avoid the problem.
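
For example, in your Gemfile:

# Gemfile
gem 'newrelic_rpm', '>= 3.11.2'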

@mnelson, could we note in the table when the fix came through, to save folks from having to follow the whole comment thread to find out what happened?

@mnelson
Contributor Author

mnelson commented Aug 12, 2015

@jasonrclark done. Thanks.

@instacart instacart locked and limited conversation to collaborators Aug 12, 2015