
Parallel map2 optimization #3428

Merged
merged 12 commits into typelevel:series/3.4.x on Apr 20, 2023
Conversation

durban
Contributor

@durban durban commented Feb 17, 2023

Context:

The "old" parallel `map2` used `both`; #2159 optimized it to avoid tuples, but that implementation was incorrect; #2239 fixed it by starting 2 extra fibers.

What is this:

  • Wrote some extra tests to detect possible changes in behavior.
  • In 9e9d015 restored the "old" implementation with `both` (as allocating a few tuples might be cheaper than starting 2 extra fibers).
  • In 2c4bc68 created a new implementation, which avoids both tuples and starting extra fibers. It's unclear if this actually helps (benchmark results pending).
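For illustration, the `both`-based strategy can be sketched against the public `IO` API (a minimal sketch of the idea only; the real implementation works on `IO.Par` internals, not on a standalone helper like this):

```scala
import cats.effect.IO

// Sketch of the "old" strategy: run both effects concurrently via IO.both,
// which yields a Tuple2, then map the combining function over that tuple.
// The allocation being discussed is the Tuple2 created per call.
def parMap2ViaBoth[A, B, C](fa: IO[A], fb: IO[B])(f: (A, B) => C): IO[C] =
  IO.both(fa, fb).map { case (a, b) => f(a, b) }
```

The tuple-free alternatives trade this allocation for extra machinery (extra fibers in #2239, or the nesting trick in 2c4bc68).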

@durban durban changed the title Parallel map2 test Parallel map2 optimization Feb 20, 2023
@durban
Contributor Author

durban commented Feb 20, 2023

Benchmark results:

4995c298c (~series/3.4.x):

[info] Benchmark                      (cpuTokens)  (size)   Mode  Cnt    Score   Error  Units
[info] ParallelBenchmark.parTraverse        10000    1000  thrpt   25  377.757 ± 4.005  ops/s


9e9d01551 (old `map2` implementation with `both`):

[info] Benchmark                      (cpuTokens)  (size)   Mode  Cnt    Score   Error  Units
[info] ParallelBenchmark.parTraverse        10000    1000  thrpt   25  422.737 ± 1.574  ops/s


5ffdb35bb (new `map2` implementation without tuples):

[info] Benchmark                      (cpuTokens)  (size)   Mode  Cnt    Score   Error  Units
[info] ParallelBenchmark.parTraverse        10000    1000  thrpt   25  410.672 ± 2.215  ops/s

So yeah, the old (and simplest) implementation (with `both`) is the fastest. My implementation is faster than the current one, but still slower than the old one.

I propose going back to the old implementation.

@djspiewak
Member

First off, this is a clever and subtle bit of fiber ordering. :-) The supervisor fibers weren't meant to remove the tuple, but rather meant to remove the racePair without requiring Deferred. I didn't think of ordering the nesting in this fashion to eliminate the asymmetry in error cancelation. This is much nicer.

Regarding the tuple-vs-not-tuple, we might want to hold off on that until we adjust traverseViaChain (in Cats). I really would be pretty surprised if tupling were faster, and there might be some confounding factors there.

Finally, parTraverse is a better benchmark than what we had at the time iirc. Overall, I think this is a win. Thanks for looking at this!

@durban
Contributor Author

durban commented Feb 20, 2023

Yeah, it was a nice puzzle to figure out :-)

Avoiding racePair: I assumed the reason for avoiding racePair was to avoid the tuple allocation(s)?

Optimizing traverseViaChain in Cats: sure, sounds good. Should this PR be kept open until then, or should it be merged now?

@djspiewak
Member

> Avoiding racePair: I assumed the reason for avoiding racePair was to avoid the tuple allocation(s)?

Also the `Either`. Also the complexity. Etc., etc.
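To make the complexity concrete (a hypothetical sketch, not code from this PR): a `map2` built directly on `racePair` has to pattern-match an `Either` whose branches each carry an `(Outcome, Fiber)` pair, embed the winner's outcome, and join the loser:

```scala
import cats.effect.IO

// Hypothetical sketch of map2 over racePair, showing the Either and the
// nested (Outcome, Fiber) tuples that must be unpacked. Canceling the
// loser when the winner errors is omitted for brevity; handling that
// correctly is part of the subtlety being avoided.
def map2ViaRacePair[A, B, C](fa: IO[A], fb: IO[B])(f: (A, B) => C): IO[C] =
  IO.racePair(fa, fb).flatMap {
    case Left((outA, fiberB)) =>
      outA.embedNever.flatMap(a => fiberB.joinWithNever.map(b => f(a, b)))
    case Right((fiberA, outB)) =>
      outB.embedNever.flatMap(b => fiberA.joinWithNever.map(a => f(a, b)))
  }
```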

> Optimizing traverseViaChain in Cats: sure, sounds good. Should this PR be kept open until then, or should it be merged now?

I'm going to just knock off the traverseViaChain optimization as soon as I can nail the bug I'm working on, so let's sit on this for a few days and see where we net out. This PR may also make it harder for me to isolate the specific bug in question. If I don't get to traverseViaChain by this time next week, let's go ahead with this PR.

@armanbilge
Member

> If I don't get to traverseViaChain by this time next week, let's go ahead with this PR.

Since you are so keen on it, I can look at it too :) Is the idea basically to revive this PR: typelevel/cats#3993?

@djspiewak
Member

djspiewak commented Feb 20, 2023

> Since you are so keen on it, I can look at it too :) is the idea basically to revive this PR

No, simpler than that. Just dynamically typecase on the `Applicative` in the two `traverseViaChain` methods. If it extends `Defer` (or `StackSafeMonad`; both are options here, but maybe the former is more correct), then we should skip the `Eval`s and use `map2` instead of `map2Eval`. We also need to double-check that this runtime type is actually present on the `Parallel` applicative presented by Cats Effect.

Should be accompanied by some benchmarks to demonstrate that it's worth it. We can benchmark on Eval in Cats, though obviously IO is the big winner here.
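The dispatch described above might look roughly like this (a hypothetical sketch; the helper name `combineStep` is illustrative and not part of the actual Cats internals):

```scala
import cats.{Apply, Defer, Eval}

// Hypothetical sketch of the suggested typecase: when the Apply instance
// is known to be stack-safe (detected here via Defer), force the second
// effect eagerly and combine with map2, skipping the Eval wrapping;
// otherwise fall back to the lazy map2Eval path that traverseViaChain
// uses today.
def combineStep[F[_], A, B, C](F: Apply[F])(fa: F[A], fb: Eval[F[B]])(
    f: (A, B) => C): Eval[F[C]] =
  F match {
    case _: Defer[_] =>
      Eval.now(F.map2(fa, fb.value)(f)) // stack-safe: no Eval needed
    case _ =>
      F.map2Eval(fa, fb)(f) // default: preserve laziness via Eval
  }
```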

@djspiewak
Member

I think we shouldn't hold this PR for the traverse changes. :-) I'm pretty convinced they'll be very complementary.

@durban
Contributor Author

durban commented Mar 8, 2023

Okay. Should this be merged as is, or should I add a commit to go back to the old implementation (with both), which currently seems faster (see the benchmark results above)?

@djspiewak
Member

Sorry, forgot to reply. Let's just go back to the old implementation. I suspect we can still do better, but for now, let's do that.

@durban durban marked this pull request as ready for review April 19, 2023 20:08
@djspiewak djspiewak merged commit 300469b into typelevel:series/3.4.x Apr 20, 2023
26 checks passed