mkLogEmitter closures are unnecessary for disabled log levels and have perf impact #401

Closed
yunong opened this Issue May 4, 2016 · 21 comments

Projects

None yet

6 participants

@yunong
Contributor
yunong commented May 4, 2016

I was running some performance tests on my Node framework which uses bunyan. The tests configured the logger level to info, while the loggers themselves emitted logs at debug. That is, the test run never emits log records.

My flamegraphs show that I'm spending about 90% of all CPU time on this line: https://github.com/trentm/node-bunyan/blob/cbfaa9a7bd86c658dbb8333c894191d23b65be33/lib/bunyan.js#L954

Even though no log entries are emitted, a closer look at the CPU profiles show that we're creating a new closure everytime we invoke log[level], regardless of the configured level of the logger. This new closure is very expensive on CPU.

We should probably first check to see if we need to emit a record before creating the new closure. i.e. do nothing if the current log level exceeds the invoked log level. This would greatly speedup performance.

@jclulow
jclulow commented May 4, 2016

If we shift things around, please make sure that bunyan still emits DTrace probes for all log levels (even those above the configured level).

@trentm
Owner
trentm commented May 9, 2016

@jclulow Ignorant Q: is there a way for a user of dtrace-provider to know if a dtrace process is attempting to listen for dtrace probes being fired? I.e. is it possible to avoid work for firing a dtrace probe?

@yunong
Contributor
yunong commented May 9, 2016

Here's a flamegraph of a performance test where we're not emitting any log records. You can see that Bunyan is taking up most of the time on CPU. I think the problem here is always creating the closure and log records every time.

@trentm
Owner
trentm commented May 9, 2016

^^^ (from discussion) that flamegraph is an older one about creating child Logger instances.

@jclulow
jclulow commented May 9, 2016

@trentm USDT probes allow for is-enabled checks. It looks like dtrace-provider might use this internally, but I don't think it exposes a probe.isEnabled() to the consumer. Could be a worthwhile RFE, if the call to C++ (for isEnabled()) is not more expensive than creating the closure that never gets called for probe.fire().

@yunong
Contributor
yunong commented May 10, 2016

Here's an example of a flamegraph where most of the time is being spent in mklogrecord, i.e. bunyan:951

@melloc melloc added a commit to melloc/node-bunyan that referenced this issue Oct 7, 2016
@melloc melloc trentm/node-bunyan#401 MkLogEmitter Closure Significantly Hampers Per…
…formance
3d76a94
@melloc
Collaborator
melloc commented Oct 7, 2016

I've put together a change that lifts out the mkRecord closure and the closure for building the USDT probe arguments. It depends on chrisa/node-dtrace-provider#82, so I've uploaded it as a tarball for now, and referenced it from the package.json.

I used the following program to generate flamegraphs showing the effect of the change:

var Logger = require('./lib/bunyan');


var log = new Logger({
    name: 'svc',
    level: 'info'
});

for (var i = 0; true; i++) {
    log.trace({ count: i }, 'hello');
}

And ran it as node foo.js > /dev/null. Here's a flamegraph before the change, and here's one after the change. Additionally, I've added a new benchmark, tools/timenop.js, that @trentm wrote.

~/src/node-bunyan-original : master % node --version
v0.10.46
~/src/node-bunyan-original : master % pfexec nice -20 node tools/timenop.js
Time log.trace() when log level is "info":
 - log.trace:     0.0005972ms per iteration
~/src/node-bunyan : master % pfexec nice -20 node tools/timenop.js
Time log.trace() when log level is "info":
 - log.trace:     0.0003798ms per iteration

The changes were a bit more drastic on my Macbook:

~/src/node-bunyan-original : master % node --version
v0.10.40
~/src/node-bunyan-original : master % sudo nice -20 node tools/timenop.js
Time log.trace() when log level is "info":
 - log.trace:     0.0003408ms per iteration
~/src/node-bunyan : no-closures % sudo nice -20 node tools/timenop.js
Time log.trace() when log level is "info":
 - log.trace:     0.0001625ms per iteration

@trentm ran the benchmark before and after with node v4:

$ node timeclosures.js
Time with/without closures in mkLogEmitter
 - log.trace:     0.0003495ms per iteration
$ node timeclosures.js
Time with/without closures in mkLogEmitter
 - log.trace:     0.0001012ms per iteration

Placing a process.abort() at the end of the benchmark, and taking a core, I also verified that there aren't tons of closures hanging around after the change.

~/src/node-bunyan-original : master (134) % pfexec mdb /var/cores/core.node.69761
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::load /home/cody/src/mdb_v8/build/ia32/mdb_v8.so
mdb_v8 version: 1.1.4 (dev)
V8 version: 3.14.5.9
Autoconfigured V8 support from target
C++ symbol demangling enabled
> ::jsfunctions ! awk '$2 > 10 { print }' | sort -k2 -n
bed16399       11 <anonymous> (as <anon>)                  path.js position 10618
bed15fb5       15 require                                  module.js position 10658
96d37861       22 <anonymous> (as require.resolve)         module.js position 10739
96d3789d       22 <anonymous> (as require.registerEx[...]  module.js position 11219
bed0b9dd       22 <anonymous> (as Object.defineProperty[.  module.js position 10865
bed1f70d      174 <anonymous> (as <anon>)                  native messages.js position 21451
bed0fef5      175 <anonymous> (as DefineOneShotAccessor.h  native messages.js position 15982
bed0ff19      175 <anonymous> (as DefineOneShotAccessor.i  native messages.js position 16054
866080d5    18800 mkRecord                                 /home/cody/src/node-bunyan-original/lib/bunyan.js position 29342
86608115    18800 <anonymous> (as <anon>)                  /home/cody/src/node-bunyan-original/lib/bunyan.js position 32933
~/src/node-bunyan : master (134) % pfexec mdb /var/cores/core.node.68781
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::load /home/cody/src/mdb_v8/build/ia32/mdb_v8.so
mdb_v8 version: 1.1.4 (dev)
V8 version: 3.14.5.9
Autoconfigured V8 support from target
C++ symbol demangling enabled
> ::jsfunctions ! awk '$2 > 10 { print }' | sort -k2 -n
97815e81       11 <anonymous> (as <anon>)                  path.js position 10618
9781658d       15 require                                  module.js position 10658
9780be99       22 <anonymous> (as Object.defineProperty[.  module.js position 10865
97826fb5       22 <anonymous> (as require.resolve)         module.js position 10739
97826fe5       22 <anonymous> (as require.registerEx[...]  module.js position 11219
9781fb69      174 <anonymous> (as <anon>)                  native messages.js position 21451
978103b1      175 <anonymous> (as DefineOneShotAccessor.h  native messages.js position 15982
978103d5      175 <anonymous> (as DefineOneShotAccessor.i  native messages.js position 16054
@yunong
Contributor
yunong commented Oct 10, 2016

@melloc this is great! @trentm any plans to publish this to npm? We would start using this immediately.

@yunong
Contributor
yunong commented Oct 10, 2016
@trentm
Owner
trentm commented Oct 12, 2016

@yunong Nah, I'm going to setup a for-profit bunyan corp for faster bunyan.

Kidding. I'm waiting on a dtrace-provider release on which this deps. Dave is going to review Cody's change there today, hopefully.

@davepacheco
Contributor

It took me a little while to grok what was going on here, so I've summarized it below. Sorry if this is redundant, but let me know if I've misunderstood.

As @jclulow mentioned, DTrace provides a way for probe implementations to check whether the probe is currently enabled so that they can avoid the expense associated with generating the probe's arguments if the probe isn't enabled. It looks like node-dtrace-provider already uses this to ensure that it only invokes the fire() callback when the probe is enabled. It also looks like the existing bunyan code already uses this correctly. We should only be calling mkRecord if the probe is enabled or if the log level says we should emit this message. In Yunong's case, we shouldn't be calling it at all. That's consistent with the flame graph from May 10. mkRecord (mklogrecord? the function defined at bunyan.js line 954) doesn't appear at all there.

@melloc's change focuses on the closure creation and eliminates two closures created per call to the log emitter function. The test program suggests that this saves about 180-250 microseconds per call in that case, which looks to be about half the time of the whole call. (Thanks for including the flame graphs and test program results on several systems!) That's a nice win in the microbenchmark, but I would expect more realistic programs to spend much less time emitting log messages. In Yunong's flame graph, the function defined at bunyan.js line 951 shows up in 715 out of 2,997 samples, or 23.9% of all of them. Cody's improvements could reduce CPU utilization by at most 12%. Is that what we were expecting?

Cody, do you know how much of the improvement is due to removing the mkRecord closure vs. the fire() callback closure? I think eliminating the mkRecord closure is a no-brainer -- it has essentially no downside. I'm less sure about the addition to the dtrace-provider API. It would be interesting to know how much of the win is a result of eliminating which closure, and it would also be interesting to try using bind() instead. That should have the same effect without having to modify dtrace-provider, though bind() isn't necessarily cheap either.

@trentm
Owner
trentm commented Oct 13, 2016

@davepacheco Using 'timeclosures.js'
(https://gist.github.com/trentm/c3dae02ef249a3cb3ec34a845d34244a) to test the
time of log.trace when the log level is INFO:

  1. without cody's patch:

    $ node timeclosures.js
     - log.trace:     0.0003577ms per iteration
    
  2. with the half of cody's patch that removes just the mkRecord closure:

    - log.trace:     0.0002677ms per iteration
    
  3. with all of cody's patch (also removing the dtrace fire() closure):

    - log.trace:     0.0001012ms per iteration
    

So, also removing the dtrace fire() closure seems quite significant.

I haven't tried with bind() yet.

@yunong
Contributor
yunong commented Oct 17, 2016
@trentm
Owner
trentm commented Oct 17, 2016

@davepacheco (/cc @melloc) Whoa bind() is slow. Like 10x slower.

So adding a number 4. to the list in the previous comment, using this patch:

diff --git a/lib/bunyan.js b/lib/bunyan.js
index 15b83b4..b0683f5 100644
--- a/lib/bunyan.js
+++ b/lib/bunyan.js
@@ -1062,11 +1062,16 @@ function mkLogEmitter(minLevel) {
         }

         if (probes) {
-            probes[minLevel].fire(mkProbeArgs, str, log, minLevel, msgArgs);
+            probes[minLevel].fire(fireTheProbe.bind(
+                undefined, str, log, minLevel, msgArgs));
         }
     }
 }

+function fireTheProbe(str, log, minLevel, msgArgs) {
+    return [ str ||
+        log._emit(mkRecord(log, minLevel, msgArgs), true) ];
+}

 /**
  * The functions below log a record at a specific level.

yields this timing:

 - log.trace:     0.002561ms per iteration
@trentm
Owner
trentm commented Oct 17, 2016

So, bind() is out. Are we able to re-consider the dtrace-provider change?

@davepacheco
Contributor

@trentm We can do that if it's really that big a deal. From my current understanding, it's expected to yield about half of that 12% win. Is that what we were expecting here? (There also seemed to be some confusion about whether this was about CPU utilization or memory usage.)

@trentm
Owner
trentm commented Oct 17, 2016

@yunong It would be great if you had a chance to try with the above patches to show in a real app of yours how much of a win the second half (the part that requires the dtrace-provider API change) is for you.

@lucamaraschi

Any way we can add some benchmark tests in the repo?In this way we could have factual data to share about the improvement.

@trentm
Owner
trentm commented Oct 18, 2016

@lucamaraschi I think the timeclosure.js numbers (and I'll commit that script with this change eventually) shows that for simple Bunyan logger usage there is definitely a perf improvement here. A general benchmark could be interesting to track over time... e.g. to catch changes that unintentionally would hurt perf.

I think dave's main question is about the impact on overall performance the change requiring the dtrace-provider API change would have on some real world apps. I.e. if that change makes log.trace 20% faster, then that is good, but if that only typically means a 1% or less improvement in the app's perf (because the app doesn't spend that much of its time logging), then there is less appetite for adding to the dtrace-provider API.

@melloc melloc added a commit to melloc/node-bunyan that referenced this issue Oct 26, 2016
@melloc melloc trentm/node-bunyan#401 mkLogEmitter closures unnecessary for disabled…
… log levels
340201c
@trentm trentm changed the title from MkLogEmitter Closure Significantly Hampers Performance to mkLogEmitter closures are unnecessary for disabled log levels and have perf impact Oct 26, 2016
@melloc melloc added a commit to melloc/node-bunyan that referenced this issue Oct 26, 2016
@melloc melloc trentm/node-bunyan#401 mkLogEmitter closures unnecessary for disabled…
… log levels and have perf impact
d9655f4
@melloc melloc added a commit to melloc/node-bunyan that referenced this issue Nov 14, 2016
@melloc melloc trentm/node-bunyan#401 mkLogEmitter closures unnecessary for disabled…
… log levels and have perf impact
58d3aac
@melloc melloc added a commit to melloc/node-bunyan that referenced this issue Nov 14, 2016
@melloc melloc trentm/node-bunyan#401 mkLogEmitter closures unnecessary for disabled…
… log levels and have perf impact
3cde7bc
@melloc melloc self-assigned this Nov 14, 2016
@melloc melloc closed this Nov 14, 2016
@trentm
Owner
trentm commented Nov 14, 2016

This is published in bunyan@1.8.5 now.

@yunong
Contributor
yunong commented Nov 14, 2016

👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment