fix: prevent deadlock on concurrent push and disconnect (CP: 25.0) #24216

Merged
mshabarov merged 1 commit into 25.0 from fix/pushconnection-deadlock-25.0
Apr 30, 2026
@mcollovati
Collaborator

The previous `AtomicBoolean` guard in `AtmospherePushConnection` closed only the disconnect-vs-disconnect race. A push thread that reads `disconnecting` as false before a concurrent `disconnect()` flips it can still proceed into `synchronized(lock)` and queue behind the disconnect thread, which is itself blocked inside `resource.close()` waiting for the servlet container's HTTP session lock held by the push thread. The result is a two-lock cycle.

Move `resource.close()` out of the monitor: inside `synchronized(lock)`, capture the resource into a local and call `connectionLost()` to transition the state, then release the monitor before invoking `close()` on the stashed reference. Add a matching re-check of `isConnected()` at the top of the `synchronized` block in `push()` so that a push that waited for the monitor observes the late disconnect and defers via `PUSH_PENDING`/`RESPONSE_PENDING` instead of throwing a `NullPointerException` on the cleared resource. The `disconnecting` flag stays set until `close()` returns, so subsequent pushes take the fast path and no new `disconnect()` re-enters while `close()` is still in flight.
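
For readers who want the shape of the change without opening the diff, here is a minimal sketch of the pattern described above. It is not the actual `AtmospherePushConnection` code: the class `PushConnectionSketch`, the `Resource` interface, and the `pushPending` flag (a stand-in for the `PUSH_PENDING`/`RESPONSE_PENDING` states) are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch only; all names here are hypothetical, not Vaadin API. */
class PushConnectionSketch {

    /** Hypothetical stand-in for the underlying push resource. */
    public interface Resource {
        void write(String message);

        void close(); // may block on a container-level lock
    }

    private final Object lock = new Object();
    private final AtomicBoolean disconnecting = new AtomicBoolean(false);
    private Resource resource;   // guarded by lock
    private boolean pushPending; // guarded by lock; stand-in for the pending states

    public void connect(Resource newResource) {
        synchronized (lock) {
            resource = newResource;
            pushPending = false;
        }
    }

    public boolean isConnected() {
        synchronized (lock) {
            return resource != null;
        }
    }

    public boolean hasPendingPush() {
        synchronized (lock) {
            return pushPending;
        }
    }

    /** Returns true if the message was written, false if it was deferred. */
    public boolean push(String message) {
        if (disconnecting.get()) {
            // Fast path: a disconnect is in flight, so defer without writing.
            // Taking the monitor briefly is safe because close() no longer
            // runs while the monitor is held.
            synchronized (lock) {
                pushPending = true;
            }
            return false;
        }
        synchronized (lock) {
            // Re-check after acquiring the monitor: a disconnect may have
            // cleared the resource while this thread was waiting.
            if (!isConnected()) {
                pushPending = true;
                return false;
            }
            resource.write(message);
            return true;
        }
    }

    public void disconnect() {
        // Only one disconnect proceeds at a time.
        if (!disconnecting.compareAndSet(false, true)) {
            return;
        }
        try {
            Resource toClose;
            synchronized (lock) {
                if (resource == null) {
                    return;
                }
                toClose = resource; // stash the reference
                connectionLost();   // transition state under the monitor
            }
            // The monitor is released here, so a blocking close() cannot
            // stall threads that are waiting for it.
            toClose.close();
        } finally {
            // The flag stays set until close() has returned.
            disconnecting.set(false);
        }
    }

    /** Caller must hold the monitor. */
    private void connectionLost() {
        resource = null;
    }
}
```

The point of the sketch is the ordering in `disconnect()`: the state transition happens under the monitor, and the potentially blocking `close()` happens only after the monitor is released, so the monitor is never held across a call that may wait on another lock.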

Related-to #24192

github-actions Bot commented Apr 29, 2026

Test Results

1 318 files ±0  1 318 suites ±0  1h 13m 42s ⏱️ -35s
9 412 tests +1  9 345 ✅ +1  67 💤 ±0  0 ❌ ±0
9 868 runs ±0  9 794 ✅ +1  74 💤 -1  0 ❌ ±0

Results for commit 38a8a0a. ± Comparison against base commit 062eb66.

♻️ This comment has been updated with latest results.

@mshabarov mshabarov merged commit 137826f into 25.0 Apr 30, 2026
48 of 50 checks passed
@mshabarov mshabarov deleted the fix/pushconnection-deadlock-25.0 branch April 30, 2026 11:42
@vaadin-bot
Collaborator

Hi @mcollovati and @mshabarov, when I cherry-picked this commit to 23.7, I encountered the following issue. Can you take a look and pick it manually?
Error Message:
Error: Command failed: git cherry-pick 137826f
error: could not apply 137826f... fix: prevent deadlock on concurrent push and disconnect (CP: 25.0) (#24216)
hint: After resolving the conflicts, mark them with
hint: "git add/rm ", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".

@vaadin-bot
Collaborator

Hi @mcollovati and @mshabarov, when I cherry-picked this commit to 23.6, I encountered the following issue. Can you take a look and pick it manually?
Error Message:
Error: Command failed: git cherry-pick 137826f
error: could not apply 137826f... fix: prevent deadlock on concurrent push and disconnect (CP: 25.0) (#24216)
hint: After resolving the conflicts, mark them with
hint: "git add/rm ", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".

vaadin-bot added a commit that referenced this pull request Apr 30, 2026
…24216) (CP: 24.10) (#24225)

This PR cherry-picks changes from the original PR #24216 to branch
24.10.

Co-authored-by: Marco Collovati <marco@vaadin.com>
vaadin-bot added a commit that referenced this pull request Apr 30, 2026
…24216) (CP: 24.9) (#24226)

This PR cherry-picks changes from the original PR #24216 to branch 24.9.

Co-authored-by: Marco Collovati <marco@vaadin.com>