Third optimization batch — map fusion #95

Merged
merged 7 commits into typelevel:master on Dec 14, 2017

Conversation

@alexandru
Member

alexandru commented Dec 9, 2017

Third PR addressing performance issues. Previous ones are #90 and #91.

Also addresses issue #92, but without the proposed laws.


In order to avoid stack-overflow errors, repeated .map calls trigger a .flatMap every 128 calls.

The rationale is this: from my research the default stack size ranges from 320 KB on 32-bit operating systems to 1 MB on 64-bit operating systems. From my own tests, a stack frame triggered by a chained Function1 call consumes approximately 32 bytes. So 128 fused map calls will consume 32 bytes × 128 = 4 KB at a minimum (to which you add whatever else the user is doing on that stack).

In a real flatMap chain, which is what happens with IO, performance does end up being dominated by .flatMap calls even with .map calls mixed in (a benchmark proving this will follow); however, the improvements for .map are still worthwhile.

The important thing to watch out for in this optimization is that degrading .flatMap to make .map faster isn't acceptable; but if we can manage a performance boost for fused .map calls without a .flatMap regression, then it's all good.
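The fusion scheme described above can be sketched with a simplified, hypothetical encoding. `Op`, `Pure`, `MaxDepth` and `run` below are illustration-only names, not the actual cats-effect internals, and the interpreter is deliberately naive:

```scala
// Minimal sketch of map fusion, assuming a simplified IO-like type.
object MapFusionSketch {
  // Hypothetical threshold mirroring the 128 discussed in the PR.
  final val MaxDepth = 128

  sealed trait Op[A] {
    def map[B](f: A => B): Op[B] = this match {
      // Fuse consecutive maps by composing the functions, so the
      // interpreter later sees one node instead of many.
      case Map(source, g, index) if index != MaxDepth - 1 =>
        Map(source, g.andThen(f), index + 1)
      // Counter exhausted (or not a Map): start a fresh Map node with
      // the counter reset to 0, bounding stack use per fused chain.
      case _ =>
        Map(this, f, 0)
    }
  }
  final case class Pure[A](a: A) extends Op[A]
  final case class Map[S, A](source: Op[S], f: S => A, index: Int) extends Op[A]

  // Trivial interpreter, just enough to run the sketch; the recursion
  // depth is the number of (fused) Map nodes, not the number of maps.
  def run[A](op: Op[A]): A = op match {
    case Pure(a)           => a
    case Map(source, f, _) => f(run(source))
  }
}
```

With this encoding, 1000 chained `.map` calls collapse into roughly `1000 / 128` nodes, e.g. `MapFusionSketch.run((1 to 1000).foldLeft(MapFusionSketch.Pure(0): MapFusionSketch.Op[Int])((acc, _) => acc.map(_ + 1)))` yields `1000`.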


For benchmarking I've introduced two new benchmarks:

  1. MapCallsBenchmark measures the performance of pure map calls (without being mixed into .flatMap loops)
  2. MapStreamBenchmark measures the performance impact of a real-world use-case, showing what to expect from Monix's Iterant and possibly FS2
| Benchmark | 0.6-a581664 | This PR |
| --- | ---: | ---: |
| MapCallsBenchmark.batch120 | 5552.051 | 14680.810 |
| MapCallsBenchmark.batch30 | 4941.152 | 16196.342 |
| MapCallsBenchmark.one | 6682.469 | 6167.908 |
| MapStreamBenchmark.batch120 | 2259.460 | 3176.587 |
| MapStreamBenchmark.batch30 | 918.054 | 1389.750 |
| MapStreamBenchmark.one | 1418.865 | 1540.653 |

The other benchmarks:

| Benchmark | 0.6-a581664 | This PR |
| --- | ---: | ---: |
| AttemptBenchmark.errorRaised | 1729.587 | 1750.409 |
| AttemptBenchmark.happyPath | 2373.812 | 2316.602 |
| HandleErrorBenchmark.errorRaised | 1974.071 | 2001.132 |
| HandleErrorBenchmark.happyPath | 3005.539 | 2944.564 |
| DeepBindBenchmark.async | 396.694 | 401.805 |
| DeepBindBenchmark.delay | 6887.681 | 6601.429 |
| DeepBindBenchmark.pure | 7849.842 | 7697.621 |
| ShallowBindBenchmark.async | 90.336 | 83.502 |
| ShallowBindBenchmark.delay | 5355.337 | 5324.081 |
| ShallowBindBenchmark.pure | 6059.119 | 6239.963 |

Here I seem to have suffered a slight regression. I'm not sure whether it's a fluke, since the differences are very small; if it's real, I need to investigate whether I can gain some extra throughput somewhere else (win some, lose some).

@codecov-io

codecov-io commented Dec 9, 2017

Codecov Report

Merging #95 into master will increase coverage by 0.6%.
The diff coverage is 92.3%.

```diff
@@            Coverage Diff            @@
##           master      #95     +/-   ##
=========================================
+ Coverage    87.4%   88.01%   +0.6%
=========================================
  Files          20       20
  Lines         413      434     +21
  Branches       35       35
=========================================
+ Hits          361      382     +21
  Misses         52       52
```

@alexandru alexandru changed the title from WIP: Third optimization batch — map fusion to Third optimization batch — map fusion Dec 9, 2017

@alexandru alexandru requested review from djspiewak, mpilquist and rossabaker Dec 9, 2017

@alexandru

Member

alexandru commented Dec 9, 2017

I might need to improve test coverage.

```scala
case Map(source, g, index) =>
  // Allowed to do 32 map operations in sequence before
  // triggering `flatMap` in order to avoid stack overflows
  if (index != 31) Map(source, g.andThen(f), index + 1)
```

@daddykotex

daddykotex Dec 10, 2017

Quick question, out of curiosity: where does the number 32 come from?

@alexandru

alexandru Dec 10, 2017

Member

Right now it's a number pulled out of thin air (well, I've seen it used somewhere else); the default stack size varies depending on the platform, AFAIK ranging from 64 KB (32-bit Windows) to 1 MB (64-bit Linux).

And we need a comfortable value to not blow people's stacks.

But now that you've mentioned it, I decided to do some testing and be thorough about it.

@ritschwumm

ritschwumm Dec 10, 2017

sounds like it deserves a comment in the code telling the reader why this number has been chosen

@daddykotex

daddykotex Dec 11, 2017

Thanks for the answer

@alexandru alexandru changed the title from Third optimization batch — map fusion to WIP: Third optimization batch — map fusion Dec 10, 2017

@alexandru

Member

alexandru commented Dec 10, 2017

I've got some refactoring ideas and some more testing to do, so I've placed this PR back in the WIP state.

@pakoito pakoito referenced this pull request Dec 10, 2017

Merged

Refactor IO following Cats #505

@pakoito

pakoito commented Dec 10, 2017

^^^ No Shame Sundays :D Feel free to review too @alexandru

@alexandru

Member

alexandru commented Dec 11, 2017

I've updated the description of the PR with the methodology for picking the right maximum number of fused .map calls.

@alexandru

Member

alexandru commented Dec 11, 2017

Update: added full benchmarking results and current interpretation.

@rossabaker

I like it. A couple quick questions.

```scala
private[effect] final val fusionMaxStackDepth =
  Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
    .filter(s => s != null && s.nonEmpty)
    .flatMap(s => Try(s.toInt).toOption)
```

@rossabaker

rossabaker Dec 11, 2017

Member

We don't want to depend on a logger, but is it worth it to explain on stderr why we choked?

@alexandru

alexandru Dec 11, 2017

Member

I'd prefer not to do it, since it introduces extra code; but I don't care much, and if it's in popular demand, then OK.

What I'm thinking is that people won't modify this parameter unless they are in big trouble, and we can have two issues:

  1. given my calculations the default value seems fine, but we might underestimate stack growth in common usage
  2. we don't control all possible virtual machines; I have no idea, for example, what the default stack size is on Android or other non-Oracle JVMs

So increasing it won't improve performance except for very narrow use-cases; and if people hit the stack limit because of this default, then we probably need to lower the limit in the library itself, with the override made available only to empower people to fix it without having to wait for another release.
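For what it's worth, the stderr warning floated above could look roughly like the following sketch. `FusionConfigSketch`, `defaultDepth` and the message text are hypothetical names for illustration; the code under review stays silent on invalid values instead:

```scala
import scala.util.Try

// Sketch of the system-property parsing under review, extended with a
// stderr warning when the supplied value is rejected.
object FusionConfigSketch {
  // Hypothetical fallback, mirroring the 128 discussed in the PR.
  private val defaultDepth = 128

  val fusionMaxStackDepth: Int = {
    val raw = System.getProperty("cats.effect.fusionMaxStackDepth", "")
    val parsed = Option(raw)
      .filter(_.nonEmpty)                  // unset or empty: use the default
      .flatMap(s => Try(s.toInt).toOption) // non-numeric: rejected
      .filter(_ > 0)                       // zero or negative: rejected
    // The extra code rossabaker asked about: explain why we choked.
    if (raw.nonEmpty && parsed.isEmpty)
      System.err.println(
        s"WARNING: invalid cats.effect.fusionMaxStackDepth value '$raw', " +
        s"using default $defaultDepth")
    parsed.getOrElse(defaultDepth)
  }
}
```

A value such as `-Dcats.effect.fusionMaxStackDepth=64` would then be honored, while `abc` or `-5` would fall back to the default with a one-line explanation on stderr.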

@rossabaker

rossabaker Dec 12, 2017

Member

Okay, I'll buy that.

```diff
@@ -93,9 +93,9 @@ sealed abstract class IO[+A] {
   this match {
     case ref @ RaiseError(_) => ref
     case Map(source, g, index) =>
-      // Allowed to do 32 map operations in sequence before
+      // Allowed to do 128 map operations in sequence before
```

@rossabaker

rossabaker Dec 11, 2017

Member

Comment is not true in Scala.js or sometimes in the presence of the sysprop.

@alexandru

alexandru Dec 11, 2017

Member

Yes, I need to update the comment, thanks for pointing it out.

@alexandru

alexandru Dec 11, 2017

Member

I changed that comment.

@alexandru alexandru changed the title from WIP: Third optimization batch — map fusion to Third optimization batch — map fusion Dec 11, 2017

```scala
/**
 * Establishes the maximum stack depth for `IO#map` operations
 * for JavaScript.
 */
private[effect] final val fusionMaxStackDepth = 32
```

@pakoito

pakoito Dec 11, 2017

Isn't the default supposed to be 128?

@alexandru

alexandru Dec 12, 2017

Member

That is the JavaScript version. I've updated the comment in IO.scala to not mention 128. Also, this particular default is now 31, since we test the current index with an inequality, as a very small performance optimization.

```scala
Option(System.getProperty("cats.effect.fusionMaxStackDepth", ""))
  .filter(s => s != null && s.nonEmpty)
  .flatMap(s => Try(s.toInt).toOption)
  .filter(_ > 0)
```

@ChristopherDavenport

ChristopherDavenport Dec 11, 2017

Collaborator

Just confirming: when this is set to 1, which is then reduced to 0, every operation gets flatMapped rather than Mapped?

@alexandru

alexandru Dec 12, 2017

Member

Yes, that's the intention. It made me realize that when that counter gets reset, we should use a `Map(this, f, 0)` instead of a `FlatMap(this, f.andThen(pure))`.
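The two reset strategies can be contrasted with a tiny hypothetical encoding (`Node`, `Pure`, `Map`, `FlatMap` and the helper names below are illustration-only, not the cats-effect internals):

```scala
// Sketch contrasting the two ways to restart the fusion counter.
sealed trait Node[A]
final case class Pure[A](a: A) extends Node[A]
final case class Map[S, A](source: Node[S], f: S => A, index: Int) extends Node[A]
final case class FlatMap[S, A](source: Node[S], f: S => Node[A]) extends Node[A]

object ResetSketch {
  // Rejected option: wraps every result in Pure and pays the flatMap
  // interpretation cost on each reset.
  def resetViaFlatMap[A, B](self: Node[A], f: A => B): Node[B] =
    FlatMap(self, (a: A) => Pure(f(a)))

  // Adopted option: a fresh Map node with the counter back at 0, so the
  // next batch of maps can fuse again without a flatMap in between.
  def resetViaMap[A, B](self: Node[A], f: A => B): Node[B] =
    Map(self, f, 0)
}
```

The `Map`-based reset avoids the per-value `Pure` allocation and keeps the chain eligible for further fusion.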

@alexandru

Member

alexandru commented Dec 12, 2017

The PR is ready for merging if you folks agree.

@alexandru

Member

alexandru commented Dec 14, 2017

Any objections on merging this PR?

@mpilquist mpilquist merged commit edaaf79 into typelevel:master Dec 14, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed