Allow Atomics.wait() to throw if a deadlock is guaranteed #1018

Closed
addaleax opened this Issue Oct 7, 2017 · 13 comments

addaleax commented Oct 7, 2017

Hey everyone! I’m not sure this is the right place or form to bring this up, please send me somewhere else if there’s something better.

I’m working on a Workers implementation for Node (or rather, a fork of it). SharedArrayBuffer and Atomics are supported right now; one thing I’ve been thinking about is that Atomics.wait() could, at least in many cases (and maybe optionally), identify situations in which a deadlock is guaranteed to happen. In particular:

  • When the current Agent is the only one that has access to the underlying SharedArrayBuffer memory
  • When all other Agents that have access to the underlying SharedArrayBuffer memory are inside an Atomics.wait() call themselves

So, what I would like to propose is to allow an Atomics.wait() call to throw an error if it is guaranteed to never be fulfilled.
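
Here is a minimal sketch of the first case, assuming it runs in an agent that is allowed to block (e.g. a worker thread); the throwing behaviour is what I’m proposing, not what implementations do today:

```js
// Sketch of the proposed behaviour, not current behaviour.
const sab = new SharedArrayBuffer(4);
const ia = new Int32Array(sab);

// No other agent holds a reference to `sab`, so no Atomics.wake() call can
// ever target this location and the wait can never complete.
try {
  Atomics.wait(ia, 0, 0);   // today: blocks this agent forever
} catch (err) {
  // proposed: throw instead, because the deadlock is guaranteed
  console.error('guaranteed deadlock detected', err);
}
```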

A few notes:

  • I would exempt waits with a timeout from this
  • I have a PoC patch for my Workers impl for V8/Node/Ayo: https://gist.github.com/addaleax/b0bd653f43d311ead6b5945222f3570c
  • There is precedent for this in some libraries, e.g. pthread_mutex_lock() is allowed to fail with EDEADLK (although with somewhat different semantics)
  • I don’t think SharedArrayBuffer + Atomics usage is widespread enough in browsers for this to be a breaking change, e.g. it is still behind a flag in implementations, and there are to my knowledge no other implementations of it outside of browsers
  • I find it very hard to imagine code that relies on a deadlock actually blocking the Agent forever

I would expect this feature to be optional to implementations, but I don’t feel like I can make a good call on that.

Now, my questions:

  • I assume this requires the spec to explicitly allow for this – is that correct?
  • Is this something that should/needs to go through all stages as a proposal?
  • Is this a non-starter? Is it worth spending more time on this (which I am totally willing to do)?

/cc @TimothyGu @lars-t-hansen @binji @jasnell

addaleax changed the title from Allow Atomics.wait() to throw if a deadlock is guranteed to Allow Atomics.wait() to throw if a deadlock is guaranteed Oct 7, 2017

jfbastien commented Oct 7, 2017

It would be useful to understand why you think this is preferable to a deadlock.

TimothyGu commented Oct 7, 2017

@jfbastien No (sane, at least) program would want to deadlock, period. I know this sounds a bit curt but I simply cannot identify a use case for deadlocking in such situations.

addaleax commented Oct 7, 2017

@jfbastien This is mostly coming from a huge appreciation of deadlock detection mechanisms in other MT libraries for other languages (edit to clarify: as a consumer of those libraries, I mean).

I realize Atomics.wait() is supposed to be a building block for low-level mechanisms, but I don’t think this particular thing could be implemented in userland as well as in the threading implementation itself – also, people are going to make mistakes.

syg commented Oct 8, 2017

In general, I would not mind some quick set of shallow, eager, and cheap deadlock detection, but we should tread carefully (and I am skeptical), for at least the following reasons.

  1. The deadlock detection should not be implementation-dependent and needs to be fully defined.

  2. JS having multiple, very different embeddings greatly constrains the kind of deadlock detection that is feasible. Your proposal might be desirable for node, but the concepts of "agent" and "agent cluster" in the spec are loose enough that it's not at all clear to me what the minimal desirable set is, or whether there is one.

    It may be the case in node that, if at the time of calling Atomics.wait the calling agent is the only one with access to the underlying shared memory, no new agents that can access that shared memory may be spawned in the agent cluster until that Atomics.wait call returns. Is that true of all embeddings now? And if it is, is that a constraint we wish to impose on all embeddings?

  3. Spec-wise, it is too late to add this behavior. The bar is higher than (1) it would be nice to have and (2) it probably would not break the web in reality. And throwing is not in line with the rest of the Atomics.wait API. If anything, it would return a string literal indicating deadlock.
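
For illustration, that return-value shape might be consumed like this; the "deadlock" string is hypothetical, sitting alongside the existing "ok" / "not-equal" / "timed-out" results:

```js
// Hypothetical sketch: "deadlock" is not a value Atomics.wait returns today.
const ia = new Int32Array(new SharedArrayBuffer(4));

switch (Atomics.wait(ia, 0, 0, 1000)) {
  case 'ok':        // woken by another agent
    break;
  case 'not-equal': // ia[0] was not 0 when the wait was attempted
    break;
  case 'timed-out': // the 1000 ms timeout elapsed
    break;
  case 'deadlock':  // hypothetical: no agent could ever wake this one
    break;
}
```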

(I tried searching the issues in the old repo and didn't find any existing discussion about simple deadlock detection. @lars-t-hansen should chime in, this might very well have been discussed already.)

addaleax commented Oct 8, 2017

The deadlock detection should not be implementation-dependent and needs to be fully defined.

I’m not disagreeing, but it would be helpful for me to have a short summary of why that is a hard and fast rule?

And throwing is not in line with the rest of the Atomics.wait API.

Fwiw, Atomics.wait already throws if it’s not allowed, and unlike the possible return values of Atomics.wait, this condition always points to a programmer error. (Not that I’m set on this specific behavior, just pointing this out.)
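
For reference, that existing throwing case (current behaviour, on an agent whose [[CanBlock]] is false, e.g. a browser main thread) looks like this:

```js
const ia = new Int32Array(new SharedArrayBuffer(4));
try {
  Atomics.wait(ia, 0, 0);
} catch (err) {
  // On an agent that is not allowed to block, the spec mandates a TypeError.
  console.log(err instanceof TypeError); // true
}
```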

Spec-wise, it is too late to add this behavior.

Is that a positive answer to the question “Is this something that should/needs to go through all stages as a proposal”?

syg commented Oct 8, 2017

I’m not disagreeing, but it would be helpful for me to have a short summary of why that is a hard and fast rule?

The point of the standard is to give a hard guarantee of interop among standard-conforming implementations. Having something like Atomics.wait throw on some implementations and not others is a nonstarter. We've made an effort to have SAB in general have no UB.

Fwiw, Atomics.wait already throws if it’s not allowed, and unlike the possible return values of Atomics.wait, this condition always points to a programmer error. (Not that I’m set on this specific behavior, just pointing this out.)

Ah that's right, it does throw on agents whose [[CanBlock]] is false. My bad.

Is that a positive answer to the question “Is this something that should/needs to go through all stages as a proposal”?

It should be a proposal that goes through the stages, yeah. I don't know if the ship has already sailed though. Making error cases non-errors is backwards compatible, the inverse isn't.

addaleax commented Oct 8, 2017

Okay, thank you, these were helpful answers. I’ll think about it a bit.

Making error cases non-errors is backwards compatible, the inverse isn't.

I wholeheartedly agree, but I am not sure actual deadlocks can count as non-error cases. ;)

jasnell commented Oct 8, 2017

The idea of not throwing but returning a value that indicates deadlock is an interesting approach. It would certainly give the caller some flexibility in handling it without having to catch a throw in the process.

As for use cases, I can imagine some rather abstract academic ones where a deadlock is actually helpful, but none that would apply to uses of this API. I would argue that such detection is definitely worthwhile. That said, reading through this, I definitely agree the discussion should be raised with TC39 as a possible new stage-0 proposal.

erights commented Oct 8, 2017

What would a deterministic spec that said exactly which deadlocks are detected look like? (Only MUST and MUST NOT, no SHOULDs.) Which genuine deadlocks should we omit? Which are infeasible to detect on some platform? (and why?)

Note that I am not asking about whether the occurrences of the deadlocks could be made deterministic. They clearly cannot be.

binji commented Oct 8, 2017

It may be the case in node that, if at the time of calling Atomics.wait the calling agent is the only one with access to the underlying shared memory, no new agents that can access that shared memory may be spawned in the agent cluster until that Atomics.wait call returns. Is that true of all embeddings now?

At least in Chrome, it is currently possible to postMessage a SAB and not know immediately whether it will arrive in an agent that can receive it or not. (The issue comes from having transferable message ports and multiple processes; seems like other browsers should have the same issue, not sure if they have some clever way of dealing with this.) So at the time of a call to Atomics.wait, it is not clear whether it is a guaranteed deadlock or not. You could always explicitly check for this case ("is there an in-flight SAB?"), but I'd imagine it might be tricky to specify.

It's also worth mentioning that we are planning to support a wait instruction in WebAssembly as well (see the threads proposal). This uses a WebAssembly.Memory instead, but a SAB can be extracted from this, so we'd want to coordinate there too.

lars-t-hansen commented Oct 9, 2017

Atomics.wait() could, at least in many cases (and maybe optionally), identify situations in which a deadlock is guaranteed to happen. In particular:

  • When the current Agent is the only one that has access to the underlying SharedArrayBuffer memory
  • When all other Agents that have access to the underlying SharedArrayBuffer memory are inside an Atomics.wait() call themselves

Those basically boil down to the same case (all agents sharing a SAB would be waiting), which is fine.

Deadlock detection was discussed a little bit during the design work, mostly in the context of tools though, I don't think we ever really talked about building it in. Partly this is because, on the web, the main thread can't block but will likely be the coordinating agent in many cases and responsible for distributing the SAB. As a consequence, there aren't likely to be many cases where deadlock can actually be detected - any SAB referenced by the main thread is effectively deadlock-free so far as any simple algorithm is concerned.

Suppose A and B share a SAB and for the sake of argument, both are allowed to wait. Then:

  • Suppose A goes away, or more interestingly, A GC's its copy of the SAB. Now B waits; this is a deadlock. Should we detect this? If so, @erights wants to talk to you about observing GC. Also, deadlock detection becomes racy, but this is just @binji's issue inverted, I think.
  • Suppose B waits. Then A GC's its copy of the SAB. Should B's wait now throw a deadlock exception?
  • Suppose A waits and B then waits; there's a deadlock. Do A and B both get an exception or just B? (Or just A?)
  • Suppose there's another shared SAB, and A waits on a location in the first SAB and B waits on a location in the second SAB. Do we detect this? (I think we'd almost have to, and I don't think it would necessarily be complicated, it just needs to be in there somehow.)

I suspect it would be more fruitful to work on a standard/popular locking library that could build in a notion of deadlocks, perhaps by detecting lock cycles. Most programs and programmers should not be using raw wait/wake calls in the first place. (For instance, I have a locking library at https://github.com/lars-t-hansen/js-lock-and-condition, though it does not detect deadlocks. On the other hand it has a notion of main-thread locks, so deadlock detection could become more meaningful.)
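
A rough sketch of that direction (illustrative names only, not the js-lock-and-condition API), using a timed acquisition as a crude stand-in for real lock-cycle detection:

```js
// Illustrative only; the names are made up and a timeout is a heuristic,
// not genuine deadlock detection.
const UNLOCKED = 0, LOCKED = 1;

function lockOrFail(ia, index, timeoutMs) {
  for (;;) {
    if (Atomics.compareExchange(ia, index, UNLOCKED, LOCKED) === UNLOCKED) {
      return; // lock acquired
    }
    // Sleep until the holder releases and wakes us, or until the timeout fires.
    if (Atomics.wait(ia, index, LOCKED, timeoutMs) === 'timed-out') {
      throw new Error(`possible deadlock: lock still held after ${timeoutMs} ms`);
    }
  }
}

function unlock(ia, index) {
  Atomics.store(ia, index, UNLOCKED);
  Atomics.wake(ia, index, 1); // renamed Atomics.notify() in later specs
}
```

Real cycle detection would need the library to track which agent holds which lock, which is exactly the kind of bookkeeping a shared locking library could standardize.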

jfbastien commented Oct 9, 2017

@TimothyGu:

@jfbastien No (sane, at least) program would want to deadlock, period. I know this sounds a bit curt but I simply cannot identify a use case for deadlocking in such situations.

Sure, I agree there's little practical use to a deadlock. I'd still like to know why the proposed approach is the right one. Will a SAB user do something useful when a deadlock is detected? I'd venture that any deadlock handling code won't be tested because... well they'd have avoided the deadlock in the first place if they could.

At least from my POV, whenever I encounter a deadlock I know exactly how to debug it. Are you saying that having an exception would allow devs to collect backtraces on their users' machines and know "hey I have a deadlock somewhere"? That seems worthwhile, but I'd like you to detail these use cases.

@addaleax:

@jfbastien This is mostly coming from a huge appreciation of deadlock detection mechanisms in other MT libraries for other languages (edit to clarify: as a consumer of those libraries, I mean).

What was done in other MT libraries, along with their rationale, would be useful information to add to this proposal.

I realize Atomics.wait() is supposed to be a building block for low-level mechanisms, but I don’t think this particular thing could be implemented in userland as well as in the threading implementations itself

That claim is useful to add to the proposal.

– also, people are going to make mistakes.

Sure, but is your proposal going to help anyone when they do make a mistake? I'm not saying it won't; I even propose something useful above. I'm asking you to explain why anything other than a deadlock is better, because I'd vote strongly against any proposal that makes matters worse, or that doesn't claim to improve anything.

Also remember that the browser is allowed to terminate pages. ECMAScript can say what it wants, but an implementation which seems unresponsive can get terminated. I think you need to make the case that throwing is better than a deadlock or page termination.

Lastly, please also consider the interaction with non-JavaScript agents as mentioned by others above. In a browser we expect SAB and WebAssembly to communicate with each other, and be able to wait on each other. I'd like to make sure that what you propose won't break down in such a setup.

addaleax commented Oct 19, 2017

I’m closing this. It seems determinism is a must-have for this feature, and that restricts it too much for what I had in mind – if anybody else thinks this is a good idea and wants to push it, feel free to do so. :)

addaleax closed this Oct 19, 2017

addaleax referenced this issue May 24, 2018

Closed

worker: initial implementation #20876
