
Optimizations for FiberRuntime runloop #8800

Merged

4 commits merged into zio:series/2.x from the fiber-runtime-optimizations branch on May 1, 2024

Conversation

@kyri-petrou (Contributor) commented Apr 28, 2024

This PR contains a few optimizations and micro-optimizations for the FiberRuntime runloop that I noticed while previously working on #8745, but wanted to tackle separately. Let's dig into what has been optimized.

1. Checking for messages while running

This PR introduces a wrapper over ConcurrentLinkedQueue which has a weakly consistent unsafeIsEmpty method for checking if the inbox contains messages. Since we check for new messages before evaluating every single effect in the runloop, the overhead of calling poll() just to check whether the queue is empty is non-negligible.

Since adding messages from outside the FiberRuntime is extremely rare (in most cases it's for interruption), I think we can get away with this weakly consistent check in the runLoop itself. Note that we should never rely on the unsafeIsEmpty method at any other point.
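
Roughly, a minimal sketch of such a wrapper (the element type is left generic here; the real implementation wraps FiberMessage, and method names besides poll and unsafeIsEmpty are illustrative):

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Sketch: a queue wrapper whose emptiness flag is deliberately weakly
// consistent, so the hot runloop path can avoid touching the queue itself.
private final class Inbox[A] {
  private[this] val queue    = new ConcurrentLinkedQueue[A]()
  private[this] var _isEmpty = true // plain (non-volatile) field

  def add(message: A): Unit = {
    queue.add(message)
    _isEmpty = false // the runloop thread may observe this late; acceptable here
  }

  def poll(): A = {
    val message = queue.poll()
    _isEmpty = queue.isEmpty
    message
  }

  // Weakly consistent: only valid as a fast-path hint inside the runloop.
  def unsafeIsEmpty: Boolean = _isEmpty
}
```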

2. Stack optimizations

The main optimization here is that we avoid setting entries to null whenever we "pop" an entry from the stack while we are in the "shallow" part of the stack (idx < 128). The main reason for this is that entries in the shallow part of the stack are likely to be overwritten automatically as the stack pointer moves up and down, so we don't need to manually GC them; see the sketch below.
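
A sketch of the pop path, assuming a threshold of 128 and the field and method names from the diff discussed further down:

```scala
// Sketch: skip the null-out write for shallow stack entries; they will be
// overwritten soon anyway as the stack pointer moves up and down.
@inline
private def popStackFrame(nextStackIndex: Int): Unit =
  if (nextStackIndex >= 128) _stack(nextStackIndex) = null // GC only the deep part
```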

The other (micro-)optimization regarding the stack is that we initialize it when we first start evaluating the effect, which avoids repeatedly checking whether it's null during the runloop. Since the _stack would be initialized by any kind of effect other than a Sync or Exit anyway, the only drawback of this is a very small overhead when we fork things like ZIO.unit. However, since realistically all forked effects will contain at least one effect that needs to initialize the stack, I think it's better not to initialize it dynamically.
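
A sketch of the eager initialization (the initial capacity of 128 and the helper name are assumptions for illustration):

```scala
// Sketch: allocate the stack once, before entering the runloop, so the hot
// loop never needs a per-iteration `_stack eq null` check.
private[this] var _stack: Array[AnyRef] = _

private def initStack(): Unit =
  if (_stack eq null) _stack = new Array[AnyRef](128)
```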

3. Updating _lastTrace

Currently, we update _lastTrace whenever the current trace is neither null nor empty. However, since ZIO's methods take an implicit trace which is propagated to all methods they call, it's very common for _lastTrace to be updated with the same value multiple times. Since reading the variable is much cheaper than writing to it, we now first check whether the current trace differs from the old one.
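
A sketch of the guard (updateLastTrace and _lastTrace appear in the diff below; the reference comparisons and the Trace.empty check are assumptions about how traces are compared):

```scala
// Sketch: a read is much cheaper than a write, so only write the field
// when the incoming trace is meaningful and actually different.
private def updateLastTrace(newTrace: Trace): Unit =
  if ((newTrace ne null) && (newTrace ne Trace.empty) && (_lastTrace ne newTrace))
    _lastTrace = newTrace
```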

I've also added some comments in the PR below with questions / remarks.

Benchmarking results

I only ran the NarrowFlatMap and BroadFlatMap benchmarks using 1 thread; let me know if you think we need to run other benchmarks as well.

TLDR:

  • ~25% increase in throughput for "narrow" flatmaps (i.e., the stack doesn't need to resize)
  • ~10% increase in throughput for "broad" flatmaps

series/2.x:

| Benchmark                                | size | Mode  | Cnt |     Score   |   Error  | Units |
|------------------------------------------|------|-------|-----|-------------|----------|-------|
| BroadFlatMapBenchmark.zioBroadFlatMap    | 20   | thrpt | 5   | 2838.735    | 19.254   | ops/s |
| NarrowFlatMapBenchmark.zioNarrowFlatMap  | 1000 | thrpt | 5   | 66646.224   | 829.889  | ops/s |

PR:

| Benchmark                               | size | Mode  | Cnt |     Score   |   Error  | Units |
|-----------------------------------------|------|-------|-----|-------------|----------|-------|
| BroadFlatMapBenchmark.zioBroadFlatMap   | 20   | thrpt | 5   | 3154.488    | 53.074   | ops/s |
| NarrowFlatMapBenchmark.zioNarrowFlatMap | 1000 | thrpt | 5   | 84474.125   | 866.596  | ops/s |


private final class Inbox {
  private[this] val queue    = new java.util.concurrent.ConcurrentLinkedQueue[FiberMessage]()
  private[this] var _isEmpty = true
@kyri-petrou (Contributor, Author) commented:

Should this be made volatile? The only scenario that kind of worries me is the following:

  1. Thread 1 (the runLoop thread) empties the queue (this implies that a message was also added in the previous iteration)
  2. Thread 1: calls queue.isEmpty and sets _isEmpty to true
  3. Thread 2 (external) adds a message to the queue immediately after
  4. Thread 2: sets _isEmpty = false, but the delayed write from step (2) overrides it with _isEmpty = true

The chances of this exact sequence of events happening are astronomically small to begin with, but is it something we need to cater for? 🤔

@jdegoes (Member) commented:

This can happen and therefore will happen (Murphy's Law of Concurrency), and making it volatile won't help; you'd have to use an atomic integer to track emptiness if you really wanted to fix it (which would probably defeat the optimization).

I am torn on whether to deal with this now, or bite the bullet and do another ticket (that I have yet to write) on creating a highly optimized concurrent mailbox just for fiber runloop.
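
For reference, a sketch of what the atomic-integer alternative mentioned above might look like (illustrative only; as noted, the extra contention on the counter would likely defeat the optimization):

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Sketch: an exact emptiness check, at the cost of a CAS per add/poll.
private final class CountingInbox[A] {
  private[this] val queue = new ConcurrentLinkedQueue[A]()
  private[this] val size  = new AtomicInteger(0)

  def add(message: A): Unit = {
    size.incrementAndGet() // increment first so isEmpty never falsely reports true
    queue.add(message)
  }

  def poll(): A = {
    val message = queue.poll()
    if (message ne null) size.decrementAndGet()
    message
  }

  def isEmpty: Boolean = size.get() == 0
}
```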

@kyri-petrou (Contributor, Author) commented:

@jdegoes I see your point! I'll revert this change in the PR since it's now tracked by #8807, and have a go at tackling it separately.

var message = inbox.poll()

// Unfortunately we can't avoid the virtual call to `trace` here
if (message ne null) updateLastTrace(cur.trace)
@kyri-petrou (Contributor, Author) commented:

It seems this was missing (I must have missed it when I worked on #8671) and was causing a test to be flaky. This shouldn't add any performance overhead since we very rarely process messages from the inbox.

@jdegoes (Member) commented:

👍

@kyri-petrou marked this pull request as ready for review on April 28, 2024, 01:18
@kyri-petrou (Contributor, Author) commented:

@jdegoes Would you be able to review this PR? You might be able to spot some flaws in the optimizations that I might have missed

@@ -836,11 +844,6 @@ final class FiberRuntime[E, A](fiberId: FiberId.Runtime, fiberRefs0: FiberRefs,
startStackIndex: Int,
currentDepth: Int
): Exit[Any, Any] = {
assert(running.get)
@kyri-petrou (Contributor, Author) commented:

I removed this since we already check that we're running prior to calling this method; that way we avoid calling it repeatedly.

Or should it be added back as a safeguard, in case we introduce a bug that sets the flag to false while the fiber is still running?

@jdegoes (Member) commented:

I think it only has overhead if assertions are enabled for the JVM (albeit that might be all the time). It's mainly designed for bug detection.

@kyri-petrou (Contributor, Author) commented:

I'll add it back 👍

@kyri-petrou (Contributor, Author) commented:

By the way, I just had a look at assert; it seems that assertion elision is controlled at compile time, not at the JVM level. From the assert scaladoc:

> A set of assert functions are provided for use as a way to document
> and dynamically check invariants in code. Invocations of assert can be elided
> at compile time by providing the command line option -Xdisable-assertions,
> which raises -Xelide-below above elidable.ASSERTION, to the scalac command.

We should probably use the -Xdisable-assertions compiler flag when we generate the published artifacts, but that's probably better done in a separate PR.
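
For instance, a sketch of how that might look in sbt (the release/snapshot split is an illustrative assumption, not ZIO's actual build settings):

```scala
// build.sbt (sketch): elide assert(...) calls from published release
// artifacts while keeping them active for development and tests.
scalacOptions ++= {
  if (isSnapshot.value) Seq.empty[String]
  else Seq("-Xdisable-assertions")
}
```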

@jdegoes (Member) commented:

Great idea!

@@ -771,9 +764,24 @@ final class FiberRuntime[E, A](fiberId: FiberId.Runtime, fiberRefs0: FiberRefs,
}

@inline
private def popStackFrame(nextStackIndex: Int): Unit = {
  _stack(nextStackIndex) = null // GC
@kyri-petrou (Contributor, Author) commented:

The more I think about this, the more I realise that it's a pretty dangerous thing.

What if instead we didn't GC when nextStackIndex is below X (perhaps 300 to coincide with trampolining)?

@kyri-petrou (Contributor, Author) commented:

I decided to go with an auto-GC threshold (of 128) and updated the PR description to match the new approach.

@jdegoes (Member) commented:

Even though I love the performance improvement, I think people will complain if we are holding onto (unnecessary) memory for arbitrarily long periods of time.

One possibility is to clear out the entries when the run loop begins, basically by starting at stack index, and nulling until the first null.

This opens the door for making a null array, e.g. val NullData = Array.fill[AnyRef](...)(null) and then using the faster arraycopy to null out the extra entries.

However, we'd still have the problem of holding onto memory for an indeterminate amount of time.

How much does this single change contribute to the performance improvements?

@kyri-petrou (Contributor, Author) commented:

> One possibility is to clear out the entries when the run loop begins, basically by starting at stack index, and nulling until the first null.

If I'm understanding your recommendation correctly, I believe this is similar to what's currently implemented, except that instead of clearing out the entries when the runloop starts, we do it when it exits (which also means on every yield / async operation or termination).

Effectively we're holding on to objects unnecessarily only while evaluating synchronous effects, which in almost all cases should be a very small period of time, and only for those objects above the _stackIndex until the first null entry. Once the runloop finishes, the amount of memory we're holding on to will be the same as previously.

> This opens the door for making a null array, e.g. val NullData = Array.fill[AnyRef](...)(null) and then using the faster arraycopy to null out the extra entries.

It's currently done iteratively but I like this recommendation better 👍

> How much does this single change contribute to the performance improvements?

Between 5 and 10% increase in throughput, depending on the benchmark.
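
A sketch of the arraycopy-based clearing being discussed (the helper name, the NullData size, and stopping at the first null slot are illustrative assumptions):

```scala
// Sketch: an all-null source array lets us clear stack slots in bulk with
// System.arraycopy instead of one write per slot.
private[this] val NullData: Array[AnyRef] = new Array[AnyRef](128)

private def clearDeadEntries(stack: Array[AnyRef], from: Int): Unit = {
  // Find the end of the dead region: entries from `from` up to the first
  // slot that is already null.
  var end = from
  while (end < stack.length && (stack(end) ne null)) end += 1
  // Null them out in chunks of NullData.length.
  var i = from
  while (i < end) {
    val chunk = math.min(NullData.length, end - i)
    System.arraycopy(NullData, 0, stack, i, chunk)
    i += chunk
  }
}
```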

@jdegoes (Member) commented Apr 28, 2024

@kyri-petrou Will do a detailed review tomorrow!

@kyri-petrou force-pushed the fiber-runtime-optimizations branch from 4b80b81 to 184bc68 on May 1, 2024, 09:24
@jdegoes (Member) left a review comment:

Excellent work!

@jdegoes merged commit 9eb1270 into zio:series/2.x on May 1, 2024
21 checks passed
@kyri-petrou deleted the fiber-runtime-optimizations branch on May 1, 2024, 14:02