
Replaced global ContextShift/Timer instances on JVM #547

Merged (2 commits, Jun 7, 2019)

Conversation

djspiewak (Member):
This pulls IOApp off the global ExecutionContext and onto a custom one that has better properties. Specifically:

  • Crashes on truly fatal errors (rather than doing something horrible, like swallowing OutOfMemoryError)
  • Lower-bounds the pool to 2 threads (global can go down to 1 if you only have 1 core). A 1-thread pool can cause deadlocks in poorly written code; that code is already in trouble, but I'd rather not give its authors more headaches, and it's enough of an edge case that I'm okay with the extra thread.
  • Avoids issues with global pool pollution by third-party code
  • Avoids ForkJoinPool, which is horrible for this kind of workload
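Sketched concretely (names and details here are illustrative, not this PR's exact code), a pool with the properties above might look like:

```scala
import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.ExecutionContext
import scala.util.control.NonFatal

// A minimal sketch of a fixed-size compute pool: lower bound of 2 threads,
// daemon threads, and a hard exit on truly fatal errors.
object ComputePool {
  private val ctr = new AtomicInteger(0)

  val ec: ExecutionContext = {
    // never fewer than 2 threads, even on a 1-core machine
    val bound = math.max(2, Runtime.getRuntime.availableProcessors())

    val executor = Executors.newFixedThreadPool(bound, new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setName(s"compute-${ctr.getAndIncrement()}")
        t.setDaemon(true)
        t
      }
    })

    new ExecutionContext {
      def execute(r: Runnable): Unit = executor.execute { () =>
        try r.run()
        catch {
          case NonFatal(t) => reportFailure(t)
          case t: Throwable =>
            // truly fatal: report and bring the process down rather than
            // silently swallowing (e.g.) an OutOfMemoryError
            t.printStackTrace()
            System.exit(1)
        }
      }

      def reportFailure(t: Throwable): Unit = t.printStackTrace()
    }
  }
}
```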

Related to #347. I'd like to hear from @rossabaker (and ideally @alexandru) as to their thoughts on the design tradeoffs here, and whether I should expand the impact at all.

I want to get this in 2.0 because, while it isn't breaking in the traditional sense, it's most definitely a very subtle compatibility shift and I don't want to catch people off guard.

Fixes #509
Fixes #515

@codecov-io:
Codecov Report

Merging #547 into master will decrease coverage by 0.5%.
The diff coverage is 0%.

@@            Coverage Diff            @@
##           master    #547      +/-   ##
=========================================
- Coverage    89.9%   89.4%   -0.51%     
=========================================
  Files          72      73       +1     
  Lines        2090    2104      +14     
  Branches      127     125       -2     
=========================================
+ Hits         1879    1881       +2     
- Misses        211     223      +12

```scala
def newThread(r: Runnable): Thread = {
  val back = new Thread(r)
  back.setName(s"ioapp-compute-${ctr.getAndIncrement()}")
  back.setDaemon(true)
  back
}
```
Member:
I continue to dislike daemon threads for important work. IOApp.WithContext is the existing compromise, though it's inconvenient enough I'm not so vigilant about using it.

I don't know how to avoid the threads and shut down the context properly without a breaking change. This comment may be more of a 3.0 discussion.

Member Author:
I went back and forth on this. I think it's okay for these to be daemon specifically because we have other control flow mechanisms which guarantee everything gets wound down before exiting under normal circumstances. In the event that something is exiting without tracing through the control flow graph all the way to the end of IOApp, then something horrible has happened (like OOM), and we want the threads to just die.

TLDR, I think that threads blocking shutdown is a feature that isn't relevant or necessary in applications constructed with explicit monadic control flow.

Member:
And I suppose we can say that any service that relies on its thread pools' graceful shutdown needs to be run on a separate, non-daemon pool.

Member Author:
Exactly.
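A hedged sketch of that pattern (pool size and names are illustrative): a service needing graceful shutdown acquires its own non-daemon executor as a Resource, so shutdown happens through the program's normal control flow rather than relying on JVM exit behavior:

```scala
import java.util.concurrent.Executors
import cats.effect.{IO, Resource}
import scala.concurrent.ExecutionContext

// Illustrative: a dedicated non-daemon pool, shut down explicitly when the
// Resource is released (i.e. as the program's control flow winds down)
val serverPool: Resource[IO, ExecutionContext] =
  Resource
    .make(IO(Executors.newFixedThreadPool(4)))(pool => IO(pool.shutdown()))
    .map(pool => ExecutionContext.fromExecutorService(pool))
```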

```scala
case t: Throwable =>
  // under most circumstances, this will work even with fatal errors
  t.printStackTrace()
  System.exit(-1)
```
Member:
Nit: why -1 (which wraps to 255, which some sources call an "exit status out of range")? I think 1 is more reasonable.

Member Author:
Will change.

```scala
try {
  r.run()
} catch {
  case NonFatal(t) =>
```
Member:
One false positive an http4s user ran into recently: rendering a very large String causes the charset encoder to allocate an array greater than Int.MaxValue, which throws an OutOfMemoryError("Requested array size exceeds VM limit") even though everything (but that response) is fine. It's irritating that the JVM reports this non-fatal condition via the most inarguably fatal of error types.

I think this approach is more correct than not, but I'm interested to hear if there are other horror stories of non-fatal errors that aren't caught by this. The System.exit raises the cost of these false positives.
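For reference, a small illustration (hedged: the guard in this PR may differ in detail) of how scala.util.control.NonFatal classifies these cases. It treats every VirtualMachineError as fatal, including the in-practice-recoverable oversized-array OOM described above:

```scala
import scala.util.control.NonFatal

// OutOfMemoryError extends VirtualMachineError, which NonFatal always
// treats as fatal -- even the "Requested array size exceeds VM limit"
// case that is, in practice, often recoverable
val oversizedArrayOom = new OutOfMemoryError("Requested array size exceeds VM limit")
val ordinaryFailure   = new RuntimeException("boom")

assert(!NonFatal(oversizedArrayOom)) // fatal: would fall through to System.exit
assert(NonFatal(ordinaryFailure))    // non-fatal: handled by the NonFatal case
```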

Member Author:
That's a fair point, and a good example of an OOM that is legitimately recoverable. Another false positive, as a note, could be AssertionError. It's another problem that I went back and forth on.

I think the thing is that, the moment some sort of truly fatal error happens, I really do want to know it and explode spectacularly. The worst feeling in the world is knowing that your system probably went OOM (or had a link error, or such), but it was swallowed and now your only way to figure it out is to dump the running system and look for orphans.

```scala
// we can initialize this eagerly because the enclosing object is lazy
val ioAppGlobal: ExecutionContext = {
  // lower-bound of 2 to prevent pathological deadlocks on virtual machines
  val bound = math.max(2, Runtime.getRuntime().availableProcessors())
```
Member:
👍

```scala
// lower-bound of 2 to prevent pathological deadlocks on virtual machines
val bound = math.max(2, Runtime.getRuntime().availableProcessors())

val executor = Executors.newFixedThreadPool(bound, new ThreadFactory {
```
Member:
Is the basic argument against ForkJoin that most people aren't doing divide-and-conquer tasks, and therefore we're all paying for the overhead of the layered queues without reaping their rewards? I intuitively agree, but the Scala standard library chose otherwise, and the ForkJoinPool javadoc claims:

"efficient processing ... when many small tasks are submitted to the pool from external clients"

and

"Especially when setting asyncMode to true in constructors, ForkJoinPools may also be appropriate for use with event-style tasks that are never joined."

In theory, I agree with this change. Empirically, I've swapped them out on real world workloads and not seen much difference either way. I'm curious if you have apps where a FJ pool is noticeably detrimental to performance. (Other points, like catching fatals and a minimum of one, are well taken and discussed elsewhere. I'm strictly speaking performance here.)

I imagine most cats-effect users implement blocking with evalOn and a second pool, but those who don't, and whose blocking libraries currently save them by assuming a blocking-aware executor, will be in for an unpleasant surprise.
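For readers unfamiliar with the pattern, a hedged sketch (names are illustrative) of shifting a blocking call onto a second pool with evalOn, so the fixed compute pool is never blocked:

```scala
import java.util.concurrent.Executors
import cats.effect.{ContextShift, IO}
import scala.concurrent.ExecutionContext

// Illustrative: an unbounded cached pool dedicated to blocking calls
val blockingEc: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newCachedThreadPool())

// evalOn runs the IO on blockingEc, then shifts back to the compute pool
def readFile(path: String)(implicit cs: ContextShift[IO]): IO[String] =
  cs.evalOn(blockingEc) {
    IO {
      val src = scala.io.Source.fromFile(path)
      try src.mkString finally src.close()
    }
  }
```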

Member Author:
I'm curious if you have apps where a FJ pool is noticeably detrimental to performance.

Very yes. There isn't a good tldr response to this comment, so I'll write something up in a gist instead.

Member Author:
Oh actually, #509 contains a halfway-decent description. I can flesh it out more if you like. It's a lot less about the layered queues and more about the spawning semantics being optimized for bad thread pool usage. ForkJoin is a fantastic general-purpose pool for Java. It's not a terrible choice for Future, either, since people tend to abuse Future for blocking side effects and such. Users of cats-effect, though, tend to be a lot more scrupulous about their pools, since the API is very carefully (and very effectively) designed to nudge them toward doing so.

When you use ForkJoin exclusively for CPU-bound, throughput-optimized tasks, it behaves horrifically poorly and you end up with a zillion threads.

Member:
Do you tend to see this with longer-running CPU-intensive tasks, rather than with many small ones?

Member Author:
Very much so, yes. Many small ones are generally okay because you're yielding back to the pool frequently. Heavily fairness-optimized workflows also don't see this issue. I'm relatively certain, based on the symptoms, that fork-join's heuristics are simply mistakenly identifying throughput-optimized workflows as being blocking IO, and thus spinning up new threads to avoid starvation.
