IOLocal propagation for unsafe access #3636

Open
wants to merge 36 commits into
base: series/3.x

armanbilge
Member

Still needs quite a bit of work, but wanted to sketch the basic idea.

Goal: to expose a fiber's IOLocals as ThreadLocals within a side-effecting block, so that they can be accessed and modified in unsafe land.

Motivation: basically telemetry Java interop.

Constraints: to do this as safely as possible 😅
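
The basic idea above can be sketched in plain Java. This is only an illustration under stated assumptions: `SketchFiber`, `CURRENT`, and the string-keyed map are hypothetical stand-ins, not the types used in this PR. The fiber installs its locals into a `ThreadLocal` before running a side-effecting block, then reads back any changes and clears the slot afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a fiber; not the real cats.effect.IOFiber.
final class SketchFiber {
    // Thread-local window onto whichever fiber's locals are currently exposed.
    static final ThreadLocal<Map<String, Object>> CURRENT = new ThreadLocal<>();

    // Fiber-local state, which travels with the fiber across threads.
    private Map<String, Object> localState = new HashMap<>();

    // Runs a side-effecting block with this fiber's locals visible via CURRENT.
    <A> A delay(Supplier<A> thunk) {
        Map<String, Object> exposed = new HashMap<>(localState);
        CURRENT.set(exposed);
        try {
            return thunk.get();
        } finally {
            localState = exposed; // keep modifications made in unsafe land
            CURRENT.remove();     // do not leak state to the next task on this thread
        }
    }

    Object get(String key) { return localState.get(key); }

    void set(String key, Object value) { localState.put(key, value); }
}
```

Clearing the slot in `finally` matters because worker threads are shared: the next fiber scheduled on the same thread must not observe stale locals.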

@armanbilge armanbilge marked this pull request as draft May 16, 2023 19:35
@@ -43,4 +43,6 @@ final class IOFiberConstants {
static final byte CedeR = 6;
static final byte AutoCedeR = 7;
static final byte DoneR = 8;

static final boolean dumpLocals = Boolean.getBoolean("cats.effect.tracing.dumpLocals");
Member Author

Bikesheddable configuration for opting-in. So the rest of us don't have to pay the penalty 😇

Member

Is this specifically "tracing", even if that's the most obvious use case?

Member Author

Oh whoops, this was a very lazy copy-pasta. I copied it from the system properties we use to configure fiber tracing. We should rename it anyway; dumpLocals is not quite right, I think 😅

What about cats.effect.localContextPropagation similar to Monix's monix.environment.localContextPropagation?

Member Author

Thanks, I liked that! I went with cats.effect.ioLocalPropagation.
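
A minimal sketch of the opt-in toggle discussed in this thread, assuming a hypothetical holder class `PropagationConfig`. The property name is the one settled on above. `Boolean.getBoolean` returns `false` when the property is unset, so nobody pays for the feature unless they ask for it.

```java
// Hypothetical constants holder; only the property name comes from this thread.
final class PropagationConfig {
    // Read once at class initialization. An unset property yields false,
    // which keeps the feature opt-in.
    static final boolean IO_LOCAL_PROPAGATION =
        Boolean.getBoolean("cats.effect.ioLocalPropagation");

    static String describe() {
        // A static final field gives the JIT a chance to drop the untaken branch.
        if (IO_LOCAL_PROPAGATION) {
            return "propagating";
        } else {
            return "disabled";
        }
    }
}
```

Under this sketch the feature would be switched on by starting the JVM with `-Dcats.effect.ioLocalPropagation=true`.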

Comment on lines 253 to 257
var locals: IOLocals = null
if (dumpLocals) {
  locals = new IOLocals(localState)
  IOLocals.threadLocal.set(locals)
}
Member Author

I did this just for delay, but I guess blocking and interruptible would want it too.

Comment on lines 4 to 5
// TODO handle defaults and lenses. all do-able, just needs refactoring ...
final class IOLocals private[effect] (private[this] var state: IOLocalState) {
Member Author

Ok, so I was extremely lazy with implementing this thing. But the main idea of this wrapper API is that it should only give the user access to IOLocals that they know about, i.e. they should not be able to clear out other locals that happen to be present.
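
One way to picture that constraint is a wrapper whose backing map is private and whose every operation requires a key. The names `LocalKey` and `LocalsView` are hypothetical and this is not the `IOLocals` class from the diff; it is a sketch of the access rule only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type standing in for IOLocal[A]; object identity is the key.
final class LocalKey<A> {
    final A defaultValue;

    LocalKey(A defaultValue) { this.defaultValue = defaultValue; }
}

// Hypothetical wrapper: the map is private, so callers can only touch
// entries whose key they already hold.
final class LocalsView {
    private final Map<LocalKey<?>, Object> state = new HashMap<>();

    @SuppressWarnings("unchecked")
    <A> A get(LocalKey<A> key) {
        Object value = state.get(key);
        return value == null ? key.defaultValue : (A) value;
    }

    <A> void set(LocalKey<A> key, A value) { state.put(key, value); }

    // Resets one local to its default; there is deliberately no clear-all.
    <A> void reset(LocalKey<A> key) { state.remove(key); }
}
```

Because there is no method that enumerates or clears the map, code holding one key cannot disturb locals belonging to other libraries.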

Comment on lines 1011 to 1012
  val fiber = new IOFiber[A](
-   Map.empty,
+   if (IOFiberConstants.dumpLocals) unsafe.IOLocals.getState else Map.empty,
Member Author

We can even go in the opposite direction for IO#unsafeRun* 😁

It's less clear if/how to do this for fibers started in a Dispatcher, since they should be inheriting locals from the fiber backing the Dispatcher.

Member

@rossabaker rossabaker left a comment

Interesting. I will try to give a snapshot of this a shot over on the otel4s side.

rossabaker added a commit to typelevel/otel4s that referenced this pull request May 18, 2023
Comment on lines 19 to 20
  // defined in Java since Scala doesn't let us define static fields
- final class IOFiberConstants {
+ public final class IOFiberConstants {
Member Author

@armanbilge armanbilge May 21, 2023

This is not good. To avoid this we'll either have to replicate it at both the cats.effect and cats.effect.unsafe levels, or move the thread-local IOLocals accessors into cats.effect.

@kevin-lee

I apologise for jumping in with a question, but can this be used as an alternative to Local from Monix?
I need it specifically for logging, much like what's explained in the post, "Better logging with Monix 3, part 1: MDC" but with cats-effect 3. This has been a blocker for my company's transition from cats-effect 2 with Monix to cats-effect 3. The issue has become increasingly critical as more and more libraries cease to support cats-effect 2, choosing to support only cats-effect 3 instead. So it would be great to see if there's any upcoming solution.

@armanbilge
Member Author

@kevin-lee no problem! At a glance, yes, this does look like an alternative/replacement to that Monix feature. Perhaps @alexandru can confirm :)

@armanbilge armanbilge changed the base branch from series/3.5.x to series/3.x June 27, 2023 00:28
@armanbilge armanbilge marked this pull request as ready for review June 27, 2023 00:29
@armanbilge armanbilge changed the title Proof-of-concept thread-local IOLocals thread-local IOLocals Jun 27, 2023
@kevin-lee

> @kevin-lee no problem! At a glance, yes, this does look like an alternative/replacement to that Monix feature. Perhaps @alexandru can confirm :)

@armanbilge Thank you. That's great!
It looks quite similar and looks like it can be used for the same purpose, but I can see the methods in IOLocals taking IOLocal[A] whereas the methods in Local from Monix don't. So I'm wondering if it can be used for the same purpose. I need to use it in a logging library like Logback, which doesn't have IO or any effect.
Yeah, it would be nice if @alexandru can confirm this.

@armanbilge
Member Author

armanbilge commented Jun 28, 2023

> but I can see the methods in IOLocals taking IOLocal[A] whereas the methods in Local from Monix don't. So I'm wondering if it can be used for the same purpose. I need to use it in a logging library like Logback, which doesn't have IO or any effect.

@kevin-lee the IOLocal[A] is just the "key". Once you have that key (and you can build one unsafely outside of the effect) then you don't need any more effects to read and write it.

For an example integration, see this PR which implements a Java SPI using this IOLocals API.
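
For the logging case, the point is that the reading side is ordinary code. A minimal sketch, assuming a hypothetical `ContextLogger`: `REQUEST_ID` stands in for a thread-local view of an IOLocal and is a plain `ThreadLocal` here so that the example runs on its own. It is not the API proposed in this PR.

```java
// Hypothetical logging helper. REQUEST_ID stands in for a thread-local view of
// an IOLocal; it is a plain ThreadLocal so the sketch is self-contained.
final class ContextLogger {
    static final ThreadLocal<String> REQUEST_ID =
        ThreadLocal.withInitial(() -> "unknown");

    // Ordinary side-effect-land code: no IO or other effect type involved.
    static String format(String message) {
        return "[requestId=" + REQUEST_ID.get() + "] " + message;
    }
}
```

A Logback-style integration would read the value at the point where the log event is created, in the same way `format` does here.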

@kevin-lee

> @kevin-lee the IOLocal[A] is just the "key". Once you have that key (and you can build one unsafely outside of the effect) then you don't need any more effects to read and write it.
>
> For an example integration, see this PR which implements a Java SPI using this IOLocals API.

@armanbilge Oh... got it. It looks very promising then. Thank you!

@armanbilge
Member Author

armanbilge commented Sep 30, 2023

I did a quick benchmark and this feature is definitely not free 😕

ioLocalPropagation=false

[info] Benchmark                (size)   Mode  Cnt     Score     Error  Units
[info] DeepBindBenchmark.delay   10000  thrpt   20  4161.322 ± 553.643  ops/s

ioLocalPropagation=true

[info] Benchmark                (size)   Mode  Cnt     Score     Error  Units
[info] DeepBindBenchmark.delay   10000  thrpt   20  3382.086 ± 279.254  ops/s

Edit: sad, even simply adding the toggle seems to have had an impact. Here are control numbers from series/3.x before the changes in this PR.

[info] Benchmark                (size)   Mode  Cnt     Score     Error  Units
[info] DeepBindBenchmark.delay   10000  thrpt   20  4952.134 ± 353.248  ops/s
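
To put the three `DeepBindBenchmark.delay` scores on one scale, the relative change against the control run can be computed as below. `Throughput` is a hypothetical helper, and the scores are the ones quoted in this comment.

```java
// Hypothetical helper for comparing benchmark scores.
final class Throughput {
    // Relative change of `score` against `baseline`, as a percentage.
    static double percentChange(double baseline, double score) {
        return (score - baseline) / baseline * 100.0;
    }
}
```

Against the control score of 4952.134 ops/s, `percentChange` gives roughly -16% for the toggle switched off (4161.322) and roughly -32% for the toggle switched on (3382.086). The reported error bars are wide, so these figures are indicative only.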

@NthPortal

@armanbilge what's the status of this?

@armanbilge
Member Author

I intend to land this in the upcoming v3.6.0 release (no ETA for this yet, but I'll make a milestone). The benchmarks I posted above for the initial implementation were disappointing, but I still need to benchmark the new implementation that I pushed in 3589db4.

@kevin-lee

@armanbilge That's great. Thank you! I'm sure you can optimize it afterwards, and in the meantime, it would be extremely useful for people who need this feature.

@armanbilge
Member Author

> I'm sure you can optimize it afterwards

Unfortunately it's not so simple. The performance impact I reported before was for all users of Cats Effect, including those who are not using this feature. We want to avoid slowing down the entire runtime for everyone, because that would be a serious regression. So before we can release this, we need to minimize its impact on existing users who do not need this feature.

Still I'm optimistic about the performance of the revised implementation. Will report updated benchmarks, soon ...

@henricook

How'd it go, @armanbilge? If you've had time.

Member

@djspiewak djspiewak left a comment

I think I've come round to the opinion that this is a necessary evil. What's left is bikeshedding some of the API and vetting a few concerns about the implementation.

core/shared/src/main/scala/cats/effect/IOLocal.scala (outdated, resolved)
core/shared/src/main/scala/cats/effect/IOFiber.scala (outdated, resolved)
Comment on lines +132 to +134
if (ioLocalPropagation) {
  IOFiber.setCurrentIOFiber(null)
}
Member

We definitely need to do some careful benchmarking here. This particular bit is very sensitive to inliner shenanigans.

Member Author

We meaning you 😛

Member

Yeah yeah yeah… Give me a pair of shas! :)

@armanbilge armanbilge marked this pull request as draft June 4, 2024 20:56
@armanbilge armanbilge marked this pull request as ready for review June 5, 2024 00:34
@@ -46,6 +46,8 @@ private object IOFiberConstants {
final val AutoCedeR = 7
final val DoneR = 8

final val ioLocalPropagation = false
Member Author

Since JS/Native currently lack a mechanism to eliminate branches based on system properties at link time or run time, we just hard-code this to false so that the guarded branches are dead-code-eliminated (DCE) at compile time.
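
The same trick has a Java analogue, sketched here with a hypothetical `JsNativeConstants` class. It is an illustration of compile-time constant folding, not the Scala.js or Scala Native code in this PR. A `static final boolean` with a literal initializer is a compile-time constant, so the compiler can omit the guarded code entirely.

```java
// Hypothetical class; illustrates constant folding, not the PR's Scala code.
final class JsNativeConstants {
    // A literal initializer makes this a compile-time constant, so code guarded
    // by it can be dropped instead of being tested at run time.
    static final boolean IO_LOCAL_PROPAGATION = false;

    static int onResume(int state) {
        if (IO_LOCAL_PROPAGATION) {
            // Never taken with the constant above; shows the guarded shape only.
            state += 1;
        }
        return state;
    }
}
```

Contrast this with a flag read from a system property, which is only known once the class is initialized at run time.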

Comment on lines +23 to +29
/**
* Returns a [[java.lang.ThreadLocal]] view of this [[IOLocal]] that allows to unsafely get,
* set, and remove (aka reset) the value in the currently running fiber. The system property
* `cats.effect.ioLocalPropagation` must be `true`, otherwise throws an
* [[java.lang.UnsupportedOperationException]].
*/
def unsafeThreadLocal(): ThreadLocal[A] = if (ioLocalPropagation)
Member Author

@armanbilge armanbilge Jun 5, 2024

This is the new API based on the ThreadLocal interface. Thoughts?

  1. I made it JVM-only. We could definitely support this on JS/Native but I'm not yet convinced that there is a good reason to do so: no Java libs to interop with and performance implications.

  2. If ioLocalPropagation == false it throws an UnsupportedOperationException. Libraries can check IOLocal.isPropagating before deciding to call this method.

    Alternatively we could return a no-op ThreadLocal that never sets and always returns the default value.

  3. If we decide not to throw, we could make this a lazy val (or similar) instead. Creating the ThreadLocal "view" isn't observably effectual.
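
The two behaviours weighed in point 2 can be sketched side by side. `ThreadLocalViews`, `strict`, and `noOp` are hypothetical names for illustration; neither method is the implementation in this PR.

```java
// Hypothetical helpers contrasting the two options from the list above.
final class ThreadLocalViews {
    // Option A: refuse to hand out a view when propagation is off.
    static <A> ThreadLocal<A> strict(boolean propagating, ThreadLocal<A> view) {
        if (!propagating) {
            throw new UnsupportedOperationException(
                "IOLocal propagation is disabled");
        }
        return view;
    }

    // Option B: a no-op view that never stores and always reports the default.
    static <A> ThreadLocal<A> noOp(A defaultValue) {
        return new ThreadLocal<A>() {
            @Override public A get() { return defaultValue; }
            @Override public void set(A value) { /* ignored */ }
            @Override public void remove() { /* nothing to reset */ }
        };
    }
}
```

Option A fails loudly when a library assumes propagation is on. Option B keeps callers running but silently discards writes, which may hide a misconfiguration.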

Member

I think this is great, all of it.

@djspiewak
Member

Finished benchmarking. @armanbilge let's see your armchair performance reasoning explain this result. I'm at a loss.

Before

[info] Benchmark                                             (cpuTokens)   (size)   Mode  Cnt      Score     Error    Units
[info] DeepBindBenchmark.async                                       N/A    10000  thrpt   10   2786.655 ±   3.536    ops/s
[info] DeepBindBenchmark.delay                                       N/A    10000  thrpt   10   9613.496 ± 234.106    ops/s
[info] DeepBindBenchmark.pure                                        N/A    10000  thrpt   10  11168.222 ± 465.787    ops/s
[info] MapCallsBenchmark.batch120                                    N/A      N/A  thrpt   10    338.430 ±   3.116    ops/s
[info] MapCallsBenchmark.batch30                                     N/A      N/A  thrpt   10     86.709 ±   0.563    ops/s
[info] MapCallsBenchmark.one                                         N/A      N/A  thrpt   10      2.944 ±   0.017    ops/s
[info] MapStreamBenchmark.batch120                                   N/A      N/A  thrpt   10   5561.435 ±  14.217    ops/s
[info] MapStreamBenchmark.batch30                                    N/A      N/A  thrpt   10   2524.134 ±   2.286    ops/s
[info] MapStreamBenchmark.one                                        N/A      N/A  thrpt   10   3233.048 ±   4.027    ops/s
[info] ParallelBenchmark.parTraverse                               10000     1000  thrpt   10    886.610 ±   0.865    ops/s
[info] ParallelBenchmark.traverse                                  10000     1000  thrpt   10     70.460 ±   0.067    ops/s
[info] ShallowBindBenchmark.async                                    N/A    10000  thrpt   10   2031.729 ±   3.593    ops/s
[info] ShallowBindBenchmark.delay                                    N/A    10000  thrpt   10   9778.303 ±  62.426    ops/s
[info] ShallowBindBenchmark.pure                                     N/A    10000  thrpt   10  11973.770 ±  32.027    ops/s
[info] WorkStealingBenchmark.alloc                                   N/A  1000000  thrpt   10     14.152 ±   0.090  ops/min
[info] WorkStealingBenchmark.manyThreadsSchedulingBenchmark          N/A  1000000  thrpt   10     31.868 ±   5.072  ops/min
[info] WorkStealingBenchmark.runnableScheduling                      N/A  1000000  thrpt   10    870.238 ±   3.475  ops/min
[info] WorkStealingBenchmark.runnableSchedulingScalaGlobal           N/A  1000000  thrpt   10   2245.542 ±   7.645  ops/min
[info] WorkStealingBenchmark.scheduling                              N/A  1000000  thrpt   10     29.631 ±   2.092  ops/min

After

[info] Benchmark                                             (cpuTokens)   (size)   Mode  Cnt      Score     Error    Units
[info] DeepBindBenchmark.async                                       N/A    10000  thrpt   10   2814.793 ±   6.982    ops/s
[info] DeepBindBenchmark.delay                                       N/A    10000  thrpt   10   9589.880 ±  27.294    ops/s
[info] DeepBindBenchmark.pure                                        N/A    10000  thrpt   10  11377.434 ±  37.080    ops/s
[info] MapCallsBenchmark.batch120                                    N/A      N/A  thrpt   10    336.483 ±   2.160    ops/s
[info] MapCallsBenchmark.batch30                                     N/A      N/A  thrpt   10     85.833 ±   0.404    ops/s
[info] MapCallsBenchmark.one                                         N/A      N/A  thrpt   10      2.925 ±   0.007    ops/s
[info] MapStreamBenchmark.batch120                                   N/A      N/A  thrpt   10   5684.510 ±   8.954    ops/s
[info] MapStreamBenchmark.batch30                                    N/A      N/A  thrpt   10   2522.568 ±   7.291    ops/s
[info] MapStreamBenchmark.one                                        N/A      N/A  thrpt   10   3284.312 ±   2.895    ops/s
[info] ParallelBenchmark.parTraverse                               10000     1000  thrpt   10    891.486 ±   0.846    ops/s
[info] ParallelBenchmark.traverse                                  10000     1000  thrpt   10     70.303 ±   0.060    ops/s
[info] ShallowBindBenchmark.async                                    N/A    10000  thrpt   10   1985.424 ±   2.771    ops/s
[info] ShallowBindBenchmark.delay                                    N/A    10000  thrpt   10   9655.691 ± 182.608    ops/s
[info] ShallowBindBenchmark.pure                                     N/A    10000  thrpt   10  10075.531 ±  22.884    ops/s
[info] WorkStealingBenchmark.alloc                                   N/A  1000000  thrpt   10     14.044 ±   0.078  ops/min
[info] WorkStealingBenchmark.manyThreadsSchedulingBenchmark          N/A  1000000  thrpt   10     48.611 ±   1.642  ops/min
[info] WorkStealingBenchmark.runnableScheduling                      N/A  1000000  thrpt   10   2987.595 ±   5.299  ops/min
[info] WorkStealingBenchmark.runnableSchedulingScalaGlobal           N/A  1000000  thrpt   10   2249.480 ±  48.568  ops/min
[info] WorkStealingBenchmark.scheduling                              N/A  1000000  thrpt   10     52.240 ±   3.133  ops/min

8 participants