
ByteStack: detect overflow immediately #3949

Draft
wants to merge 1 commit into series/3.5.x

Conversation

@durban (Contributor) commented Jan 13, 2024

This is about #3907. The idea is to detect the overflow when it happens and throw immediately, instead of writing a negative count into the ByteStack and getting an ArrayIndexOutOfBoundsException later.
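For illustration, a minimal sketch of the fail-fast idea on a hypothetical, simplified stand-in (TinyStack and its fields are made up; the real ByteStack has a more compact representation). Math.addExact throws an ArithmeticException the moment the counter would overflow, instead of letting it wrap around to a negative value:

// Hypothetical stand-in, not the actual ByteStack.
final class TinyStack {
  private[this] var buffer: Array[Byte] = new Array[Byte](8)
  private[this] var count: Int = 0

  def push(op: Byte): Unit = {
    // Throws ArithmeticException on Int overflow, so the count can
    // never silently become negative.
    val newCount = Math.addExact(count, 1)
    if (count == buffer.length)
      buffer = java.util.Arrays.copyOf(buffer, buffer.length << 1)
    buffer(count) = op
    count = newCount
  }

  def pop(): Byte = {
    count -= 1
    buffer(count)
  }
}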

@durban linked an issue Jan 13, 2024 that may be closed by this pull request
@armanbilge (Member)

and getting an ArrayIndexOutOfBoundsException later

btw, worth pointing out that on Scala.js there are currently no ArrayIndexOutOfBoundsExceptions at all. I'm not sure what exactly a js.Array does in this circumstance, so this would be an improvement over "undefined behavior".

otoh, I am concerned that checking for what is effectively a fatal error anyway might hurt performance.

Suppose a fiber's stack overflows: is this a recoverable situation? On the JVM, StackOverflowError is considered a fatal error that shuts down the runtime.

@armanbilge (Member)

Suppose a fiber's stack overflows: is this a recoverable situation?

What I'm realizing is that we don't actually care what ByteStack does; it's an internal implementation detail.

What we do care about is whether we can handle this gracefully in IOFiber, i.e. raise a user-friendly exception and then run the appropriate error handlers / finalizers / whatever.
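For illustration (everything here is hypothetical, including the premise): if the overflow could be surfaced as an ordinary, user-friendly exception, "graceful" would mean that the usual combinators keep working:

import cats.effect.IO

// Hypothetical: were the overflow raised as an ordinary error, finalizers
// and error handlers would run as usual.
def graceful(deeplyNested: IO[Unit], release: IO[Unit]): IO[Unit] =
  deeplyNested
    .guarantee(release)                                      // finalizers still run
    .handleErrorWith(e => IO.println(s"recovered from: $e")) // handlers still run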

@armanbilge (Member) commented Jan 15, 2024

I'm not sure we are able to treat this as an ordinary error that can be handled by users, since error handling requires stack, which is exactly what we have run out of ...

However, we could self-cancel the fiber, since this creates an entirely new stack, and log the error via some side-channel. Edit: wait, what if we are in an uncancelable region? Hmm.

private[this] def prepareFiberForCancelation(cb: Either[Throwable, Unit] => Unit): IO[Any] = {
  if (!finalizers.isEmpty()) {
    if (!finalizing) {
      // Do not nuke the fiber execution state repeatedly.
      finalizing = true
      conts = ByteStack.create(8)
      conts = ByteStack.push(conts, CancelationLoopK)
      // ...
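The uncancelable concern is visible with plain IO semantics: self-cancelation inside an uncancelable region is masked until the region ends, so it cannot interrupt work still running there. A small illustration:

import cats.effect.IO

// IO.canceled inside an uncancelable region is masked: the println still
// runs, and the fiber only becomes canceled once the region is exited.
val masked: IO[Unit] =
  IO.uncancelable { _ =>
    IO.canceled *> IO.println("still running inside the region")
  }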

@durban (Contributor, Author) commented Jan 15, 2024

You're right. As you say, we can't error, and we can't cancel. (And obviously we can't go on.) So all this PR does is make us reliably throw the exception when this happens. But then that exception goes unhandled into reportFailure, and the fiber hangs.

I don't really know what to do with this. It could be argued that this is similar to a StackOverflowError. We handle that by shutting down absolutely everything, but that seems... too much to me.

@armanbilge (Member)

As you say, we can't error, and we can't cancel.

Actually, we can cancel. Sometimes. Not always.

So I think that if our ByteStack overflows and we are in a cancelable region, we should self-cancel and side-channel the error.

If we are not in a cancelable region ...

It could be argued that this is similar to a StackOverflowError. We handle that by shutting down absolutely everything, but that seems... too much to me.

Then yes, I think we should raise a "FiberStackOverflowError" via our fatal error mechanism, because otherwise we have a fiber that hung without running its finalizers and releasing its resources, whether those are locks, file descriptors, or whatever. The runtime is in an invalid state and this is irrecoverable. That seems pretty fatal.
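A sketch of what such an error type might look like (the name comes from this discussion; cats-effect defines no such class, and how the run loop would classify it is exactly the design question):

import scala.util.control.NonFatal

// Hypothetical. Extending Error (rather than Exception) mirrors
// java.lang.StackOverflowError, but note that NonFatal only treats a few
// specific types (VirtualMachineError, LinkageError, InterruptedException,
// control throwables) as fatal, so the run loop would still have to
// special-case this type in its fatal-error check.
final class FiberStackOverflowError(msg: String) extends Error(msg)

assert(NonFatal(new FiberStackOverflowError("boom"))) // still "non-fatal" to NonFatal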

@armanbilge (Member)

Btw, this is predicated on the assumption that adding this check is not detectably detrimental to performance. 😅

@armanbilge (Member)

We handle that by shutting down absolutely everything, but that seems... too much to me.

Thinking about this more, I think I understand why it feels like "too much" (at least for me).

Imagine if getting a fatal exception on the JVM shut down your entire machine. That would be too much 😁

In this context, we can imagine the IORuntime as floating above the JVM runtime, analogously to how the JVM runtime floats above your operating system. So shutting down the JVM runtime for a fatal error in the IORuntime feels like reaching one semantic layer above ourselves.

For an IOApp I don't think the distinction between the IORuntime and the JVM runtime is meaningful since their lifecycles are tied so closely together. So a complete shutdown seems appropriate.

However, when unsafe-running an IO outside of an IOApp, an appropriate course of action might be to shut down the current IORuntime. But the exception we raise does not need to be a JVM-fatal exception, and we don't need to shut down all the IORuntimes, like our current JVM-fatal handling mechanism does.

Essentially we need a separate pathway for handling IORuntime-fatal errors that are not JVM-fatal errors.
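A sketch of the shape such a pathway could take (hypothetical; IORuntime.shutdown exists, the handler itself is made up): tear down only the affected runtime and surface the error, without escalating to the process-killing JVM-fatal mechanism:

import cats.effect.unsafe.IORuntime

// Hypothetical handler for an IORuntime-fatal (but not JVM-fatal) error.
def handleRuntimeFatal(runtime: IORuntime, e: Throwable): Unit = {
  e.printStackTrace() // side-channel the error somewhere visible
  runtime.shutdown()  // stop this runtime's pools; no System.exit, no rethrow
}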

@durban (Contributor, Author) commented Jan 16, 2024

Yes, that's exactly it.

But the more I think about this, the more I think that a hanging fiber might be the least bad option. (Hanging fibers are obviously bad, but they're already permitted for resource-safety reasons.) Shutting down just one IORuntime (as you say) might also be acceptable, although it still feels like a bit much.

Regarding the performance concerns: it might be worth running some benchmarks with this branch. Whatever we do, it probably won't be any faster than this, as this uses a HotSpot intrinsic. If it's already too slow, we probably have no chance of doing anything.
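A harness for that could be as small as the following sketch (it assumes the ByteStack API quoted from IOFiber above is visible to the benchmark module; the class and method names are illustrative):

import org.openjdk.jmh.annotations.{Benchmark, Scope, State}

@State(Scope.Thread)
class ByteStackPushBench {
  @Benchmark
  def push1000(): AnyRef = {
    var s = ByteStack.create(8) // API as quoted above
    var i = 0
    while (i < 1000) {
      s = ByteStack.push(s, 1) // the hot path whose new check we are measuring
      i += 1
    }
    s // return the stack so JMH defeats dead-code elimination
  }
}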

@armanbilge (Member)

Whatever we do, it probably won't be any faster than this, as this uses a HotSpot intrinsic. If it's already too slow, we probably have no chance of doing anything.

Exactly. If it's already too slow, then our thoughtful discussion above is moot 😆

I was remembering this comment from ArrayStack, which suggested that we are allergic to bounds checks. I am actually not sure if it's still relevant, because otherwise I don't know what it is referring to 🤔

// TODO remove bounds check
def pop(): A = {
  index -= 1
  val back = buffer(index).asInstanceOf[A]
  buffer(index) = null // avoid memory leaks
  back
}


as this uses a HotSpot intrinsic

Well, not on Scala.js. JS performance seems sensitive to these kinds of checks; so sensitive that even Scala.js doesn't bother throwing IOOB exceptions for bad scala.Array accesses and just declares them "undefined behavior". Anyway, since the JS implementation is separate, we could decide to omit the check there.

@djspiewak (Member)

I think it's very likely that we're sensitive to these types of bounds checks. Regarding that ancient TODO, btw: I did experiment with using Unsafe to remove the bounds checks on the array accesses in pop(), and surprisingly it made things slower. My guess (though I didn't bother to disassemble and verify) is that there's a special fast path for this type of pattern, and I jostled the JIT out of it when I replaced the array access with its Unsafe equivalent. The performance difference was noticeable, btw, even in the higher-level JMH benchmarks.
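For reference, the Unsafe variant of that experiment would look roughly like this (a reconstruction, not the actual patch):

import sun.misc.Unsafe

object UnsafeAccess {
  val U: Unsafe = {
    val f = classOf[Unsafe].getDeclaredField("theUnsafe")
    f.setAccessible(true)
    f.get(null).asInstanceOf[Unsafe]
  }
  val Base: Long = Unsafe.ARRAY_OBJECT_BASE_OFFSET.toLong
  val Scale: Long = Unsafe.ARRAY_OBJECT_INDEX_SCALE.toLong
}

final class UnsafeArrayStack[A <: AnyRef](capacity: Int) {
  import UnsafeAccess._

  private[this] val buffer = new Array[AnyRef](capacity)
  private[this] var index = 0

  def push(a: A): Unit = {
    buffer(index) = a // ordinary, bounds-checked write
    index += 1
  }

  def pop(): A = {
    index -= 1
    val off = Base + Scale * index
    val back = U.getObject(buffer, off).asInstanceOf[A] // no JVM bounds check
    U.putObject(buffer, off, null) // avoid memory leaks, as in the original
    back
  }
}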

I'll put this on my pile of "to benchmark" and see what shakes out. I agree with the way @armanbilge is describing it: we basically need our own version of an SOE to cover this scenario. It's unfathomably weird though, because we're talking about a situation where someone has recursed to a staggering depth in a non-tail-recursive way and hasn't popped back up.

Successfully merging this pull request may close these issues:

ArrayIndexOutOfBoundsException in ByteStack