
Replaced global ContextShift/Timer instances on JVM #547

Merged (2 commits, Jun 7, 2019)

Conversation

djspiewak (Member):
This pulls IOApp off the global ExecutionContext and onto a custom one that has better properties. Specifically:

  • Crashes on truly fatal errors (rather than doing something horrible, like swallowing OutOfMemoryError)
  • Lower-bounds the pool to 2 threads (global can go down to 1 if you only have 1 core). A 1-thread pool can cause deadlocks in poorly written code; that code is already in trouble, but I'd rather not give its authors more headaches, and it's enough of an edge case that I'm okay with the extra thread.
  • Avoids issues with global pool pollution by third-party code
  • Avoids ForkJoinPool, which is horrible for this kind of workload
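Sketched concretely (names and details here are illustrative, not this PR's exact code), a pool with the properties above might look like:

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.ExecutionContext
import scala.util.control.NonFatal

// A minimal sketch of a fixed-size compute pool: lower bound of 2 threads,
// daemon threads, and a hard exit on truly fatal errors.
object ComputePool {
  private val ctr = new AtomicInteger(0)

  val ec: ExecutionContext = {
    // never fewer than 2 threads, even on a 1-core machine
    val bound = math.max(2, Runtime.getRuntime.availableProcessors())

    val executor = Executors.newFixedThreadPool(bound, new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setName(s"compute-${ctr.getAndIncrement()}")
        t.setDaemon(true)
        t
      }
    })

    new ExecutionContext {
      def execute(r: Runnable): Unit = executor.execute { () =>
        try r.run()
        catch {
          case NonFatal(t) => reportFailure(t)
          case t: Throwable =>
            // truly fatal: report and bring the process down rather than
            // silently swallowing (e.g.) an OutOfMemoryError
            t.printStackTrace()
            System.exit(1)
        }
      }

      def reportFailure(t: Throwable): Unit = t.printStackTrace()
    }
  }
}
```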

Related to #347. I'd like to hear from @rossabaker (and ideally @alexandru) as to their thoughts on the design tradeoffs here, and whether I should expand the impact at all.

I want to get this in 2.0 because, while it isn't breaking in the traditional sense, it's most definitely a very subtle compatibility shift and I don't want to catch people off guard.

Fixes #509
Fixes #515

@codecov-io:
Codecov Report

Merging #547 into master will decrease coverage by 0.5%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master    #547      +/-   ##
=========================================
- Coverage    89.9%   89.4%   -0.51%     
=========================================
  Files          72      73       +1     
  Lines        2090    2104      +14     
  Branches      127     125       -2     
=========================================
+ Hits         1879    1881       +2     
- Misses        211     223      +12

```scala
def newThread(r: Runnable): Thread = {
  val back = new Thread(r)
  back.setName(s"ioapp-compute-${ctr.getAndIncrement()}")
  back.setDaemon(true)
  back
}
```
Member:
I continue to dislike daemon threads for important work. IOApp.WithContext is the existing compromise, though it's inconvenient enough I'm not so vigilant about using it.

I don't know how to avoid the threads and shut down the context properly without a breaking change. This comment may be more of a 3.0 discussion.

Member Author:
I went back and forth on this. I think it's okay for these to be daemon specifically because we have other control flow mechanisms which guarantee everything gets wound down before exiting under normal circumstances. In the event that something is exiting without tracing through the control flow graph all the way to the end of IOApp, then something horrible has happened (like OOM), and we want the threads to just die.

TLDR, I think that threads blocking shutdown is a feature that isn't relevant or necessary in applications constructed with explicit monadic control flow.

Member:
And I suppose we can say that any service that relies on its thread pools' graceful shutdown needs to be run on a separate, non-daemon pool.

Member Author:
Exactly.
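A hedged sketch of that pattern (pool size and names are illustrative): a service needing graceful shutdown acquires its own non-daemon executor as a Resource, so shutdown happens through the program's normal control flow rather than relying on JVM exit behavior:

```scala
import java.util.concurrent.Executors
import cats.effect.{IO, Resource}
import scala.concurrent.ExecutionContext

// Illustrative: a dedicated non-daemon pool, shut down explicitly when the
// Resource is released (i.e. as the program's control flow winds down)
val serverPool: Resource[IO, ExecutionContext] =
  Resource
    .make(IO(Executors.newFixedThreadPool(4)))(pool => IO(pool.shutdown()))
    .map(pool => ExecutionContext.fromExecutorService(pool))
```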

```scala
case t: Throwable =>
  // under most circumstances, this will work even with fatal errors
  t.printStackTrace()
  System.exit(-1)
```
Member:
Nit: why -1 (which wraps to 255, which some sources call an "exit status out of range")? I think 1 is more reasonable.

Member Author:
Will change.

```scala
try {
  r.run()
} catch {
  case NonFatal(t) =>
```
Member:
One false positive an http4s user ran into recently: rendering a very large String causes the charset encoder to allocate an array greater than Int.MaxValue, which throws an OutOfMemoryError("Requested array size exceeds VM limit") even though everything (but that response) is fine. It's irritating that the JVM reports this non-fatal condition via the most inarguably fatal of error types.

I think this approach is more correct than not, but I'm interested to hear if there are other horror stories of non-fatal errors that aren't caught by this. The System.exit raises the cost of these false positives.
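For reference, a small illustration (hedged: the guard in this PR may differ in detail) of how scala.util.control.NonFatal classifies these cases. It treats every VirtualMachineError as fatal, including the in-practice-recoverable oversized-array OOM described above:

```scala
import scala.util.control.NonFatal

// OutOfMemoryError extends VirtualMachineError, which NonFatal always
// treats as fatal -- even the "Requested array size exceeds VM limit"
// case that is, in practice, often recoverable
val oversizedArrayOom = new OutOfMemoryError("Requested array size exceeds VM limit")
val ordinaryFailure   = new RuntimeException("boom")

assert(!NonFatal(oversizedArrayOom)) // fatal: would fall through to System.exit
assert(NonFatal(ordinaryFailure))    // non-fatal: handled by the NonFatal case
```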

Member Author:
That's a fair point, and a good example of an OOM that is legitimately recoverable. Another false positive, as a note, could be AssertionError. It's another problem that I went back and forth on.

I think the thing is that, the moment some sort of truly fatal error happens, I really do want to know it and explode spectacularly. The worst feeling in the world is knowing that your system probably went OOM (or had a link error, or such), but it was swallowed and now your only way to figure it out is to dump the running system and look for orphans.

```scala
// we can initialize this eagerly because the enclosing object is lazy
val ioAppGlobal: ExecutionContext = {
  // lower-bound of 2 to prevent pathological deadlocks on virtual machines
  val bound = math.max(2, Runtime.getRuntime().availableProcessors())
```
Member:
👍

```scala
// lower-bound of 2 to prevent pathological deadlocks on virtual machines
val bound = math.max(2, Runtime.getRuntime().availableProcessors())

val executor = Executors.newFixedThreadPool(bound, new ThreadFactory {
```
Member:
Is the basic argument against ForkJoin that most people aren't doing divide-and-conquer tasks, and therefore we're all paying for the overhead of the layered queues without reaping their rewards? I intuitively agree, but the Scala standard library chose otherwise, and the ForkJoinPool javadoc claims:

"efficient processing ... when many small tasks are submitted to the pool from external clients"

and

"Especially when setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined."

In theory, I agree with this change. Empirically, I've swapped them out on real world workloads and not seen much difference either way. I'm curious if you have apps where a FJ pool is noticeably detrimental to performance. (Other points, like catching fatals and a minimum of one, are well taken and discussed elsewhere. I'm strictly speaking performance here.)

I imagine most cats-effect users implement blocking with evalOn and a second pool, but those who don't, and whose blocking libraries currently save them by assuming a blocking-aware executor, will be in for an unpleasant surprise.
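For readers unfamiliar with the pattern, a hedged sketch (names are illustrative) of shifting a blocking call onto a second pool with evalOn, so the fixed compute pool is never blocked:

```scala
import java.util.concurrent.Executors
import cats.effect.{ContextShift, IO}
import scala.concurrent.ExecutionContext

// Illustrative: an unbounded cached pool dedicated to blocking calls
val blockingEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

// evalOn runs the IO on blockingEc, then shifts back to the compute pool
def readFile(path: String)(implicit cs: ContextShift[IO]): IO[String] =
  cs.evalOn(blockingEc) {
    IO {
      val src = scala.io.Source.fromFile(path)
      try src.mkString finally src.close()
    }
  }
```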

Member Author:
I'm curious if you have apps where a FJ pool is noticeably detrimental to performance.

Very yes. There isn't a good tldr response to this comment, so I'll write something up in a gist instead.

Member Author:
Oh actually, #509 contains a halfway-decent description. I can flesh it out more if you like. It's a lot less about the layered queues and more about the spawning semantics being optimized for bad thread pool usage. ForkJoin is a fantastic general-purpose pool for Java. It's not a terrible choice for Future, either, since people tend to abuse Future for blocking side effects and such. Users of cats-effect, though, tend to be a lot more scrupulous about their pools, since the API is very carefully (and very effectively) designed to nudge them toward doing so.

When you use ForkJoin exclusively for CPU-bound, throughput-optimized tasks, it behaves horrifically poorly and you end up with a zillion threads.

Member:
Do you tend to see this with longer-running CPU-intensive tasks, rather than with many small ones?

Member Author:
Very much so, yes. Many small ones are generally okay because you're yielding back to the pool frequently. Heavily fairness-optimized workflows also don't see this issue. I'm relatively certain, based on the symptoms, that fork-join's heuristics are simply mistakenly identifying throughput-optimized workflows as being blocking IO, and thus spinning up new threads to avoid starvation.
