fix: prevent deadlock on concurrent push and disconnect (CP: 25.0)#24216
fix: prevent deadlock on concurrent push and disconnect (CP: 25.0)#24216
Conversation
The previous `AtomicBoolean` guard in `AtmospherePushConnection` closed only the disconnect-vs-disconnect race. A push thread that reads `disconnecting` as false before a concurrent `disconnect()` flips it can still proceed into `synchronized(lock)` behind the disconnect thread, which is itself blocked inside `resource.close()` waiting for the servlet container's HTTP session lock held by the push thread — a two-lock cycle. Move `resource.close()` out of the monitor: inside `synchronized(lock)` capture the resource into a local and call `connectionLost()` to transition the state, then release the monitor before invoking `close()` on the stashed reference. Add a matching re-check of `isConnected()` at the top of the `synchronized` block in `push()` so a push that waited for the monitor observes the late disconnect and defers via `PUSH_PENDING`/`RESPONSE_PENDING` instead of NPEing on the cleared resource. The `disconnecting` flag stays set until `close()` returns so subsequent pushes take the fast path and no new `disconnect()` re-enters while `close()` is still in flight. Related-to #24192
|
|
Hi @mcollovati and @mshabarov, when i performed cherry-pick to this commit to 23.7, i have encountered the following issue. Can you take a look and pick it manually? |
|
Hi @mcollovati and @mshabarov, when i performed cherry-pick to this commit to 23.6, i have encountered the following issue. Can you take a look and pick it manually? |
…24216) (CP: 24.10) (#24225) This PR cherry-picks changes from the original PR #24216 to branch 24.10. --- #### Original PR description > The previous `AtomicBoolean` guard in `AtmospherePushConnection` closed only the disconnect-vs-disconnect race. A push thread that reads `disconnecting` as false before a concurrent `disconnect()` flips it can still proceed into `synchronized(lock)` behind the disconnect thread, which is itself blocked inside `resource.close()` waiting for the servlet container's HTTP session lock held by the push thread — a two-lock cycle. > > Move `resource.close()` out of the monitor: inside `synchronized(lock)` capture the resource into a local and call `connectionLost()` to transition the state, then release the monitor before invoking `close()` on the stashed reference. Add a matching re-check of `isConnected()` at the top of the `synchronized` block in `push()` so a push that waited for the monitor observes the late disconnect and defers via `PUSH_PENDING`/`RESPONSE_PENDING` instead of NPEing on the cleared resource. The `disconnecting` flag stays set until `close()` returns so subsequent pushes take the fast path and no new `disconnect()` re-enters while `close()` is still in flight. > > Related-to #24192 > --------- Co-authored-by: Marco Collovati <marco@vaadin.com>
…24216) (CP: 24.9) (#24226) This PR cherry-picks changes from the original PR #24216 to branch 24.9. --- #### Original PR description > The previous `AtomicBoolean` guard in `AtmospherePushConnection` closed only the disconnect-vs-disconnect race. A push thread that reads `disconnecting` as false before a concurrent `disconnect()` flips it can still proceed into `synchronized(lock)` behind the disconnect thread, which is itself blocked inside `resource.close()` waiting for the servlet container's HTTP session lock held by the push thread — a two-lock cycle. > > Move `resource.close()` out of the monitor: inside `synchronized(lock)` capture the resource into a local and call `connectionLost()` to transition the state, then release the monitor before invoking `close()` on the stashed reference. Add a matching re-check of `isConnected()` at the top of the `synchronized` block in `push()` so a push that waited for the monitor observes the late disconnect and defers via `PUSH_PENDING`/`RESPONSE_PENDING` instead of NPEing on the cleared resource. The `disconnecting` flag stays set until `close()` returns so subsequent pushes take the fast path and no new `disconnect()` re-enters while `close()` is still in flight. > > Related-to #24192 > --------- Co-authored-by: Marco Collovati <marco@vaadin.com>



The previous
AtomicBooleanguard inAtmospherePushConnectionclosed only the disconnect-vs-disconnect race. A push thread that readsdisconnectingas false before a concurrentdisconnect()flips it can still proceed intosynchronized(lock)behind the disconnect thread, which is itself blocked insideresource.close()waiting for the servlet container's HTTP session lock held by the push thread — a two-lock cycle.Move
resource.close()out of the monitor: insidesynchronized(lock)capture the resource into a local and callconnectionLost()to transition the state, then release the monitor before invokingclose()on the stashed reference. Add a matching re-check ofisConnected()at the top of thesynchronizedblock inpush()so a push that waited for the monitor observes the late disconnect and defers viaPUSH_PENDING/RESPONSE_PENDINGinstead of NPEing on the cleared resource. Thedisconnectingflag stays set untilclose()returns so subsequent pushes take the fast path and no newdisconnect()re-enters whileclose()is still in flight.Related-to #24192