A few more chain optimizations #4170

johnynek · 2022-04-09T23:25:56Z

Reorganize some matches into the order I guess they are most likely to match.
Tighten the types on some loops so we don't have to check against Empty.
~~Implement foldLeft the same way as length~~ (this was not a benchmark win).
Override foldMap to use the iterator (to leverage Monoid.combineAll).
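
To illustrate the foldMap point, here is a minimal sketch of why routing through combineAll can help. The Monoid trait below is a hypothetical stand-in defined only for this example (it is not cats.kernel.Monoid, and its combineAll takes an Iterator for simplicity); the idea is that an instance may override combineAll with something cheaper than pairwise combine.

```scala
object FoldMapSketch {
  // Minimal stand-in for a Monoid type class; an assumption for this sketch only.
  trait Monoid[B] {
    def empty: B
    def combine(x: B, y: B): B
    // Default: pairwise combine. Instances may override with a faster version.
    def combineAll(bs: Iterator[B]): B = bs.foldLeft(empty)(combine)
  }

  // A String instance that overrides combineAll to use a single StringBuilder.
  implicit val stringMonoid: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
    override def combineAll(bs: Iterator[String]): String = {
      val sb = new StringBuilder
      bs.foreach(s => sb.append(s))
      sb.toString
    }
  }

  // foldMap over an iterator: map each element, then hand the whole
  // iterator to combineAll so an optimized override can be used.
  def foldMap[A, B](as: Iterator[A])(f: A => B)(implicit B: Monoid[B]): B =
    B.combineAll(as.map(f))
}
```

With this shape, foldMap picks up whatever combineAll the instance provides, instead of always calling combine once per element.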

johnynek · 2022-04-09T23:39:20Z

core/src/main/scala/cats/data/Chain.scala

-    while (iter.hasNext) { result = f(result, iter.next()) }
-    result
+    @annotation.tailrec
+    def loop(h: Chain.NonEmpty[A], tail: List[Chain.NonEmpty[A]], acc: B): B =


By passing the destructured list, we avoid an allocation in the Append case (which is common, since Append is how we combine two chains).
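
A rough sketch of the loop shape being described, using a simplified chain model rather than the real cats definitions (the type names here are assumptions for illustration, and the actual Chain has more cases). Presumably the saving is that the Append case pushes only the right branch onto the pending list instead of consing both branches and matching them apart again.

```scala
object FoldSketch {
  // Simplified stand-in for cats.data.Chain; NOT the real definitions.
  sealed trait Chain[+A]
  case object Empty extends Chain[Nothing]
  sealed trait NonEmpty[+A] extends Chain[A]
  final case class Singleton[+A](a: A) extends NonEmpty[A]
  final case class Append[+A](left: NonEmpty[A], right: NonEmpty[A]) extends NonEmpty[A]

  def foldLeft[A, B](c: Chain[A], init: B)(f: (B, A) => B): B = {
    // The current node and the pending nodes are passed separately, so the
    // Append case allocates one cons cell (for `r`) rather than two.
    @annotation.tailrec
    def loop(h: NonEmpty[A], tail: List[NonEmpty[A]], acc: B): B =
      h match {
        case Append(l, r) => loop(l, r :: tail, acc)
        case Singleton(a) =>
          val next = f(acc, a)
          tail match {
            case h1 :: t1 => loop(h1, t1, next)
            case Nil      => next
          }
      }

    c match {
      case Empty           => init
      case ne: NonEmpty[A] => loop(ne, Nil, init)
    }
  }
}
```

Because loop only accepts NonEmpty values, it never has to check for Empty inside the traversal.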

satorg · 2022-04-10T02:20:49Z

core/src/main/scala/cats/data/Chain.scala

+    // of the same code as foldLeft
+    @annotation.tailrec
+    def loop(head: Chain.NonEmpty[A], tail: List[Chain.NonEmpty[A]], acc: Long): Int =
+      if (acc < 0L) 1 // head is nonempty


If I'm not missing something, if (acc <= 0L) 1 should work as well: since we know the head is not empty, it seems we can stop looking further when acc == 0.

yep. Good call. Updated.
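
The reasoning can be sketched on a simplified chain model (the types and the name `remaining` are assumptions for this example, not the cats source). Once the remaining budget reaches zero and there is still a nonempty node in hand, the chain must be longer than the requested length, so the loop can return 1 without visiting that node.

```scala
object LengthCompareSketch {
  // Simplified stand-in for cats.data.Chain; NOT the real definitions.
  sealed trait Chain[+A]
  case object Empty extends Chain[Nothing]
  sealed trait NonEmpty[+A] extends Chain[A]
  final case class Singleton[+A](a: A) extends NonEmpty[A]
  final case class Append[+A](left: NonEmpty[A], right: NonEmpty[A]) extends NonEmpty[A]

  // Negative, zero, or positive as the chain's length is <, ==, or > len.
  def lengthCompare[A](c: Chain[A], len: Long): Int = {
    // `remaining` counts how many more elements may be seen without exceeding len.
    @annotation.tailrec
    def loop(head: NonEmpty[A], tail: List[NonEmpty[A]], remaining: Long): Int =
      if (remaining <= 0L) 1 // head is nonempty, so at least one more element exists
      else
        head match {
          case Append(l, r) => loop(l, r :: tail, remaining)
          case Singleton(_) =>
            tail match {
              case h :: t => loop(h, t, remaining - 1L)
              case Nil    => if (remaining - 1L == 0L) 0 else -1
            }
        }

    c match {
      case Empty           => if (len > 0L) -1 else if (len == 0L) 0 else 1
      case ne: NonEmpty[A] => loop(ne, Nil, len)
    }
  }
}
```

With the strict `<` check the loop would still be correct, but it would descend into one more node before concluding the chain is longer.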

bplommer · 2022-04-10T10:41:43Z

Would there be any benefit to leaving these abstract in Chain and implementing them separately in Chain.Empty and Chain.NonEmpty?

johnynek · 2022-04-10T23:31:32Z

I ran the benchmarks (and added two). Here are the relevant numbers:

PR:

[info] Benchmark                                  Mode  Cnt          Score           Error  Units
[info] ChainBench.foldLeftLargeChain             thrpt    5         54.823 ±        14.480  ops/s
[info] ChainBench.foldLeftLargeList              thrpt    5        152.206 ±         5.792  ops/s
[info] ChainBench.foldLeftSmallChain             thrpt    5   74133841.930 ±    400335.145  ops/s
[info] ChainBench.foldLeftSmallList              thrpt    5   81573852.962 ±    281124.077  ops/s
[info] ChainBench.lengthLargeChain               thrpt    5     208482.471 ±      1696.443  ops/s
[info] ChainBench.lengthLargeList                thrpt    5        529.225 ±         0.394  ops/s
[info] ChainBench.reverseLargeChain              thrpt    5      12154.755 ±      3341.891  ops/s
[info] ChainBench.reverseLargeList               thrpt    5        131.259 ±       169.766  ops/s

main:

[info] Benchmark                                  Mode  Cnt          Score          Error  Units
[info] ChainBench.foldLeftLargeChain             thrpt    5        132.305 ±        3.237  ops/s
[info] ChainBench.foldLeftLargeList              thrpt    5        152.748 ±        3.895  ops/s
[info] ChainBench.foldLeftSmallChain             thrpt    5   76658960.380 ±   237704.833  ops/s
[info] ChainBench.foldLeftSmallList              thrpt    5   81733659.606 ±    45817.907  ops/s
[info] ChainBench.lengthLargeChain               thrpt    5      12564.835 ±    21329.022  ops/s
[info] ChainBench.lengthLargeList                thrpt    5        536.412 ±       10.714  ops/s
[info] ChainBench.reverseLargeChain              thrpt    5        175.292 ±       49.151  ops/s
[info] ChainBench.reverseLargeList               thrpt    5        174.277 ±       21.204  ops/s

so it looks like the reverse and length changes are big wins, but the foldLeft change is not an improvement. I'll revert that part.
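
For a rough sense of scale, the PR-to-main throughput ratios can be computed from the scores above. This is only a sketch: the object and member names are made up for the example, and the ratios ignore the Error column, which is wide for some rows (for instance, lengthLargeChain on main is 12564.835 ± 21329.022), so treat the numbers as indicative.

```scala
object BenchRatios {
  // (benchmark, PR ops/s, main ops/s), copied from the tables above.
  val scores: List[(String, Double, Double)] = List(
    ("foldLeftLargeChain", 54.823, 132.305),
    ("lengthLargeChain", 208482.471, 12564.835),
    ("reverseLargeChain", 12154.755, 175.292)
  )

  // A ratio above 1 means the PR branch had higher throughput than main.
  def ratio(pr: Double, base: Double): Double = pr / base

  val ratios: Map[String, Double] =
    scores.map { case (name, pr, base) => name -> ratio(pr, base) }.toMap

  def main(args: Array[String]): Unit =
    scores.foreach { case (name, pr, base) =>
      println(f"$name%-20s ${ratio(pr, base)}%.2fx")
    }
}
```

On these scores, length and reverse come out well above 1x while foldLeft on the large chain comes out below 1x, which matches the conclusion to revert the foldLeft change.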

satorg

Looks really cool, thanks!

satorg · 2022-04-11T00:16:26Z

> Would there be any benefit to leaving these abstract in Chain and implementing them separately in Chain.Empty and Chain.NonEmpty?

AFAIU, implementing all the logic in the abstract class rather than spreading it among its descendants may benefit from code locality, i.e. the instruction cache should be less likely to need reloading while a method of the abstract class executes. Perhaps @johnynek can correct me if I'm wrong about that.

johnynek · 2022-04-11T00:18:13Z

I really don't know. Of course, experiments are the best way to find out, but even our current benchmarks might steer us wrong depending on the frequency of Empty, the frequency of wrapping, and the structure of concatenation.
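
For readers following the question, the two styles under discussion look roughly like this. It is a toy sketch with hypothetical types (ChainA, ChainB and their cases are not the cats classes), and it makes no claim about which style is faster; as noted above, that would need measurement.

```scala
object DispatchSketch {
  // Style 1: the logic lives in the base class and pattern-matches on `this`.
  sealed abstract class ChainA[+A] {
    def headOption: Option[A] = this match {
      case EmptyA  => None
      case OneA(a) => Some(a)
    }
  }
  case object EmptyA extends ChainA[Nothing]
  final case class OneA[+A](a: A) extends ChainA[A]

  // Style 2: the method is abstract in the base class and each subclass
  // supplies its own implementation (virtual dispatch, no pattern match).
  sealed abstract class ChainB[+A] {
    def headOption: Option[A]
  }
  case object EmptyB extends ChainB[Nothing] {
    def headOption: Option[Nothing] = None
  }
  final case class OneB[+A](a: A) extends ChainB[A] {
    def headOption: Option[A] = Some(a)
  }
}
```

Both styles give the same results; they differ only in where the code lives and how the case is selected at runtime.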

johnynek added 2 commits April 9, 2022 13:24

A few more chain optimizations

a554c77

remove some allocations

86ca23f

johnynek commented Apr 9, 2022

johnynek added 3 commits April 9, 2022 13:41

remove unused binding

70fdaf9

optimize reverse

28b048b

fix 3.0.2, optimize lengthCompare

cdc0a1c

satorg reviewed Apr 10, 2022

johnynek added 2 commits April 9, 2022 17:42

improve lengthCompare

5df2eb4

also optimize foldRight

db0c1e0

Add a few benchmarks

9fc6179

johnynek added 3 commits April 10, 2022 13:33

revert fold changes based on benchmarks

25b575e

Merge branch 'main' into oscar/20220409_chain_opts

736c949

format

7cb7fea

satorg approved these changes Apr 11, 2022

johnynek merged commit ad318c3 into main Apr 11, 2022

armanbilge deleted the oscar/20220409_chain_opts branch April 11, 2022 01:25
