Questionable statement on introduction page about green threading and blocking IO? #243

Closed
swachter opened this issue May 25, 2018 · 8 comments

Comments

@swachter

commented May 25, 2018

(Sorry if this issue is stupid. I am not familiar with Haskell's threading model.)

The cats-effect introduction page says:

"As Haskell’s runtime uses green threading, a synchronous IO (and the requisite thread blocking) makes a lot of sense."

Whereas Wikipedia on Green Threads says:

"When a green thread executes a blocking system call, not only is that thread blocked, but all of the threads within the process are blocked.[5] To avoid that problem, green threads must use asynchronous I/O operations, ..."

These statements seem to contradict each other. I guess that Haskell has some kind of built-in logic for IO (like yielding the control to another green thread). Maybe the situation/reasoning could be explained some more.

@alexandru

Member

commented May 25, 2018

There isn't much of a contradiction here: green threads are managed by the runtime, which means that in some languages "blocking" on a result can be translated by the runtime into async I/O operations.

TBH I would remove that whole paragraph, as it serves as justification for design decisions made when the project was first announced (as a comparison to others) and currently serves no purpose.

@SystemFw

Collaborator

commented May 25, 2018

@swachter First off, I would also remove the paragraph, but let me try to explain what's going on anyway. Apologies if I restate things you already know. I'm also going to be a bit loose, so that I write a comment and not an essay.

  1. The way concurrency works is normally multiplexing: if you have multiple sequences of instructions, they are interleaved according to a certain strategy, so some actions of the first sequence are executed, then some of the third, then some of the second, and so on. This is called scheduling.

  2. There are several layers in which this process happens: OS processes, with independent execution state and independent memory space, OS threads, that share the same memory space, and green threads. JVM threads, at least now, are mapped 1:1 to OS threads. Green threads are mapped n:m to OS threads. Each layer is multiplexed on the layer below, so it's possible for multiple green threads to run on one OS thread, just as it's possible for multiple OS threads to run on one OS process. Haskell has green threads, and one could also say that a cats-effect run-loop is a green thread, with Fiber + start being an abstraction over spawning, interrupting or waiting on it.

  3. Blocking on one level generally means suspending at the layer below. In cats-effect, operations like Deferred.get on an empty Deferred (or in fs2 Queue.dequeue on an empty queue) wait until the result is available. In fs2 we call this "semantic blocking". I think Monix uses "asynchronous blocking". In both cases we mean that the JVM thread is not blocked, only the IO/F is. This is not a contradiction: from the point of view of an IO waiting on Deferred.get, it is blocked until a result arrives, but from the point of view of the "RTS" (we don't have a separate RTS in cats-effect yet, so think of it as the run-loops currently happening), that IO has simply yielded back control so that other stuff can happen. It's the same on the layer below: when a JVM thread blocks from the point of view of the code executing on it, it is simply yielding back control to other threads until it is awakened.

  4. Blocking on a JVM thread is worse than blocking on a green thread. Since a JVM thread is more heavyweight, there can be fewer of them (they are a scarce resource), so you want to avoid blocking in order not to waste a precious thread. However, there can be many green threads, so it's fine to block one because the others can still run on the same OS/JVM thread.
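Points 3 and 4 can be sketched in code. This is only a minimal illustration, assuming cats-effect 3 on the classpath (a later API than the one current in this thread, where Deferred lived in cats.effect.concurrent); the object name SemanticBlocking and the value names are made up for the example.

```scala
import cats.effect.{Deferred, IO, IOApp}
import scala.concurrent.duration._

object SemanticBlocking extends IOApp.Simple {

  val program: IO[Int] =
    for {
      // An empty Deferred: `get` on it waits until someone calls `complete`.
      gate   <- Deferred[IO, Int]
      // `start` spawns a fiber (a green thread). `gate.get` suspends only
      // that fiber; no JVM thread is parked while it waits.
      waiter <- gate.get.map(_ + 1).start
      // Meanwhile the main fiber keeps running on the same thread pool.
      _      <- IO.sleep(100.millis)
      // Completing the Deferred wakes the waiting fiber up.
      _      <- gate.complete(41)
      // Joining is itself semantic blocking: this fiber waits for the other.
      result <- waiter.joinWithNever
    } yield result

  val run: IO[Unit] =
    program.flatMap(n => IO.println(s"waiter produced $n"))
}
```

From the waiter's point of view it is blocked on `gate.get`; from the point of view of the run-loops it has simply yielded, which is the distinction drawn in point 3.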

With all this preamble, I can now explain how I interpret those two statements. I think the reason they confuse you is because they are talking about blocking on different levels.

"As Haskell’s runtime uses green threading, a synchronous IO (and the requisite thread blocking) makes a lot of sense."

"thread blocking" here means blocking a green thread, so this is basically the second part of point 4.

"When a green thread executes a blocking system call, not only is that thread blocked, but all of the threads within the process are blocked.[5] To avoid that problem, green threads must use asynchronous I/O operations, ..."

"blocking system call" here means blocking an OS thread or OS process. This is bad because the whole OS thread yields back control, and since multiple green threads are multiplexed on it, all of those green threads are blocked on the layer above.

The Haskell runtime has several strategies in place to avoid this problem, ranging from an async epoll mechanism for I/O, to cooperative yielding, to a separate pool for things that are genuinely blocking at the OS level, but this is all out of scope. The point is that those two statements refer to blocking in two different layers, and that's why they appear to contradict each other.

This turned super long, sorry 😬

@alexandru

Member

commented May 25, 2018

Nice explanation @SystemFw. At some point we should do such an introduction on the website.

@SystemFw

Collaborator

commented May 25, 2018

Yeah, that could be a FAQ since we've had a similar issue ("what is semantic blocking") in fs2 as well

@swachter

Author

commented May 25, 2018

Many thanks for that explanation @SystemFw !

I think the most important point is that blocking a green thread must not block the underlying OS thread. However, there are introductory examples out there using

IO { Console.readln }

for keyboard input. In my understanding they block the green thread as well as the underlying OS thread. Therefore, is it justified to say that these examples are bad? How should it be done the right way in cats IO?

@SystemFw

Collaborator

commented May 25, 2018

Yeah, things like IO(Thread.sleep(2000)) are bad for exactly the reason you describe, so for example Timer[IO].sleep(2.seconds) is the way to do that. In general, you need to rely on non-blocking abstractions (I don't have a readln version off the top of my head)
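To make the difference concrete, here is a rough sketch assuming cats-effect 3, where `IO.sleep` plays the role of the `Timer[IO].sleep` call mentioned above. The object and method names (SleepComparison, threadSleep, fiberSleep, elapsed) are invented for the illustration, and the timings you observe will depend on your machine.

```scala
import cats.effect.{IO, IOApp}
import cats.syntax.all._
import scala.concurrent.duration._

object SleepComparison extends IOApp.Simple {

  // Parks the underlying JVM thread: nothing else can run on it meanwhile.
  def threadSleep(d: FiniteDuration): IO[Unit] =
    IO(Thread.sleep(d.toMillis))

  // Suspends only the fiber; the JVM thread goes back to the scheduler.
  def fiberSleep(d: FiniteDuration): IO[Unit] =
    IO.sleep(d)

  // Runs `n` copies of `sleep` in parallel and returns the wall-clock time.
  def elapsed(n: Int, sleep: IO[Unit]): IO[FiniteDuration] =
    List.fill(n)(sleep).parSequence.timed.map(_._1)

  val run: IO[Unit] =
    for {
      fibers  <- elapsed(100, fiberSleep(200.millis))
      threads <- elapsed(100, threadSleep(200.millis))
      _       <- IO.println(s"100 x IO.sleep:     ${fibers.toMillis} ms")
      _       <- IO.println(s"100 x Thread.sleep: ${threads.toMillis} ms")
    } yield ()
}
```

On a typical machine with far fewer than 100 cores, the `Thread.sleep` version should take noticeably longer, because each sleeping effect holds on to one of the few compute threads, while the `IO.sleep` version lets all fibers wait at once.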

@oleg-py

Contributor

commented May 25, 2018

@swachter the problem is that a lot of I/O in Java is still OS thread-blocking, and we don't have good alternatives for all of it (JDBC is the thing we all like to blame). While doing Console.readln() is blocking, it's typically not something you do in apps that require high concurrency, and by wrapping it in IO you get referential transparency, which is the most important point here.
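For readers arriving later: one possible way to handle a call that is unavoidably thread-blocking is to mark it as such, so the runtime can run it away from the compute threads. The sketch below assumes cats-effect 3, whose `IO.blocking` postdates this thread; readLineFrom and ReadLineExample are hypothetical names chosen for the example.

```scala
import cats.effect.{IO, IOApp}
import java.io.{BufferedReader, InputStreamReader}

object ReadLineExample extends IOApp.Simple {

  // `in.readLine()` blocks the calling JVM thread until a line arrives.
  // `IO.blocking` keeps the call referentially transparent (nothing runs
  // until the IO is executed) and hints to the runtime that it should not
  // occupy a compute thread while waiting.
  def readLineFrom(in: BufferedReader): IO[String] =
    IO.blocking(in.readLine())

  val run: IO[Unit] =
    for {
      stdin <- IO(new BufferedReader(new InputStreamReader(System.in)))
      _     <- IO.println("What is your name?")
      name  <- readLineFrom(stdin)
      _     <- IO.println(s"Hello, $name")
    } yield ()
}
```

The OS thread is still blocked while waiting for input; the difference is only which pool pays for it, so fibers doing other work are not starved.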

@swachter

Author

commented May 25, 2018

Thanks for all your answers! I'll close this issue.

@swachter swachter closed this May 25, 2018

4 participants