fix(workloadmanager): use detached context for store delete and add missing log#321

Merged
volcano-sh-bot merged 2 commits into
volcano-sh:mainfrom
Abhinav-kodes:fix-delete-sandbox-context
May 13, 2026

Conversation

Contributor

@Abhinav-kodes Abhinav-kodes commented May 11, 2026

What type of PR is this?

/kind bug

What this PR does / why we need it:
handleDeleteSandbox uses c.Request.Context() for both the K8s deletion and
the subsequent store deletion. If the client disconnects after the K8s
resource is successfully deleted but before DeleteSandboxBySessionID runs,
the request context is already canceled. The store call fails instantly,
leaving a stale entry permanently pointing to a K8s resource that no longer
exists. Future GET or DELETE calls for that sessionID will return stale data
or fail with a misleading error.
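
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `fakeStore` is a hypothetical in-memory stand-in for the real store, and it assumes the store client checks its context before doing any work, as most context-aware clients do.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeStore is a hypothetical stand-in for the real store. It refuses to do
// any work once the context it receives is already done.
type fakeStore struct {
	entries map[string]string
}

func (s *fakeStore) DeleteSandboxBySessionID(ctx context.Context, sessionID string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	delete(s.entries, sessionID)
	return nil
}

// deleteWithCanceledRequest simulates a client disconnect between the K8s
// delete and the store delete, then reports whether the entry is left behind.
func deleteWithCanceledRequest() (stale bool, err error) {
	store := &fakeStore{entries: map[string]string{"session-1": "sandbox-a"}}

	reqCtx, cancel := context.WithCancel(context.Background())
	// ... the K8s resource would be deleted successfully here ...
	cancel() // the client disconnects, so the request context is canceled

	err = store.DeleteSandboxBySessionID(reqCtx, "session-1")
	_, stale = store.entries["session-1"]
	return stale, err
}

func main() {
	stale, err := deleteWithCanceledRequest()
	fmt.Println("stale entry left behind:", stale)
	fmt.Println("error is context.Canceled:", errors.Is(err, context.Canceled))
}
```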

This PR fixes the issue by using a detached context.WithTimeout for the store
delete, matching the pattern already established in rollbackSandboxCreation.
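
As a rough illustration of the detached-context pattern, the sketch below runs a store delete under a fresh context derived from `context.Background()` rather than from the request. The names `deleteFunc`, `deleteFromStoreDetached`, and `storeDeleteTimeout` are hypothetical and exist only for this example; the 30s value is the one given in the reviewer notes.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// storeDeleteTimeout mirrors the 30s value mentioned in the PR notes.
const storeDeleteTimeout = 30 * time.Second

// deleteFunc is a hypothetical signature standing in for the store's delete call.
type deleteFunc func(ctx context.Context, sessionID string) error

// deleteFromStoreDetached runs the delete under a context that is not derived
// from the request, so a client disconnect cannot cancel it. The timeout still
// bounds how long the cleanup may take.
func deleteFromStoreDetached(del deleteFunc, sessionID string) error {
	ctx, cancel := context.WithTimeout(context.Background(), storeDeleteTimeout)
	defer cancel()
	return del(ctx, sessionID)
}

// detachedSurvivesDisconnect reports whether the delete succeeds even though
// the request context has already been canceled.
func detachedSurvivesDisconnect() bool {
	reqCtx, cancelReq := context.WithCancel(context.Background())
	cancelReq() // simulate the client disconnecting

	// This delete fails only if the context it is handed is already done.
	del := func(ctx context.Context, _ string) error { return ctx.Err() }

	return reqCtx.Err() != nil && deleteFromStoreDetached(del, "session-1") == nil
}

func main() {
	fmt.Println("store delete survives disconnect:", detachedSurvivesDisconnect())
}
```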

It also adds a missing klog.Errorf before the respondError on store delete
failure. Every other store error in this handler is logged before responding,
but this path silently swallowed the error, making store failures during
deletion invisible in production diagnostics.
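
The log-then-respond ordering might look like the sketch below. To stay self-contained it uses the standard library's `log` and `net/http` as stand-ins for klog and the handler's own `respondError`; `handleStoreDeleteFailure` and the message text are hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// handleStoreDeleteFailure records the underlying error before writing the
// 500 response, so the failure is visible in logs and not only to the client.
func handleStoreDeleteFailure(w http.ResponseWriter, logger *log.Logger, sessionID string, err error) {
	logger.Printf("failed to delete sandbox from store: sessionID=%s err=%v", sessionID, err)
	http.Error(w, "internal server error", http.StatusInternalServerError)
}

// logThenRespondDemo drives the helper with an in-memory logger and response
// recorder and returns the status code plus whether the error was logged.
func logThenRespondDemo() (int, bool) {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)
	rec := httptest.NewRecorder()

	handleStoreDeleteFailure(rec, logger, "session-1", errors.New("store unavailable"))

	return rec.Code, strings.Contains(buf.String(), "store unavailable")
}

func main() {
	code, logged := logThenRespondDemo()
	fmt.Println("status:", code, "error logged:", logged)
}
```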

Special notes for your reviewer:
The detached context uses a 30s timeout, consistent with rollbackSandboxCreation,
which uses the same value for identical store cleanup operations.
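
For comparison only, and not what this PR does: on Go 1.21 or newer, `context.WithoutCancel` is another way to detach from a parent's cancellation while keeping request-scoped values such as trace IDs. The sketch below assumes such values matter to the store call; `ctxKey` and `detachKeepingValues` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// ctxKey is a hypothetical key type for request-scoped values.
type ctxKey string

// detachKeepingValues returns a context that ignores the parent's cancellation
// but still carries its values, bounded by its own timeout. Requires Go 1.21+.
func detachKeepingValues(parent context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.WithoutCancel(parent), d)
}

// withoutCancelDemo cancels the parent, detaches from it, and reports whether
// the detached context is still live and what value it inherited.
func withoutCancelDemo() (alive bool, traceID string) {
	base := context.WithValue(context.Background(), ctxKey("trace"), "abc123")
	parent, cancel := context.WithCancel(base)
	cancel() // simulate the client disconnecting

	ctx, stop := detachKeepingValues(parent, 30*time.Second)
	defer stop()

	id, _ := ctx.Value(ctxKey("trace")).(string)
	return ctx.Err() == nil, id
}

func main() {
	alive, id := withoutCancelDemo()
	fmt.Println("detached context alive:", alive, "trace:", id)
}
```

The approach in the PR, starting from `context.Background()`, is simpler and matches rollbackSandboxCreation; the variant above is only worth considering if the store call depends on values carried by the request context.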

The K8s deletion calls intentionally retain c.Request.Context(). A client
disconnect does not cancel an already-dispatched K8s API call server-side,
so the request context is appropriate there. Only the store write after
K8s deletion is at risk from a canceled context.

These two fixes are independent but co-located in the same function, so
they are included in a single PR to keep the diff minimal.

Does this PR introduce a user-facing change?:

Fixed a bug where a client disconnect during sandbox deletion could leave
a stale store entry after the Kubernetes resource was already deleted,
causing subsequent delete or lookup calls for that session to fail or
return incorrect state.

Copilot AI review requested due to automatic review settings May 11, 2026 16:04
…issing log

Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
@Abhinav-kodes Abhinav-kodes force-pushed the fix-delete-sandbox-context branch from 9371b20 to 3476879 Compare May 11, 2026 16:06
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the handleDeleteSandbox handler to use a detached context with a 30-second timeout when deleting a sandbox from the store, preventing orphaned entries upon client disconnection. Additionally, error logging was added for these deletion failures. The reviewer suggested improving the observability of the new error log by including the sandbox name and namespace, ensuring consistency with other log messages in the file.

Comment thread pkg/workloadmanager/handlers.go Outdated

codecov-commenter commented May 11, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 48.14%. Comparing base (524e55e) to head (6e7735c).
⚠️ Report is 41 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/workloadmanager/handlers.go | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #321      +/-   ##
==========================================
+ Coverage   47.57%   48.14%   +0.57%     
==========================================
  Files          30       30              
  Lines        2819     2858      +39     
==========================================
+ Hits         1341     1376      +35     
+ Misses       1338     1329       -9     
- Partials      140      153      +13     
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 48.14% <80.00%> (+0.57%) | ⬆️ |

Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread pkg/workloadmanager/handlers.go
Comment thread pkg/workloadmanager/handlers.go Outdated
Comment thread pkg/workloadmanager/handlers.go
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread pkg/workloadmanager/handlers_test.go
Comment thread pkg/workloadmanager/handlers.go
Copilot AI review requested due to automatic review settings May 12, 2026 10:27
…error log

Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@Abhinav-kodes Abhinav-kodes force-pushed the fix-delete-sandbox-context branch from 8023b86 to 6e7735c Compare May 12, 2026 10:30
@Abhinav-kodes
Contributor Author

@hzxuzhonghu, I have applied all the changes suggested by Copilot, and the PR is ready to be merged from my end.

@hzxuzhonghu
Member

/lgtm
/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

@volcano-sh-bot volcano-sh-bot merged commit ced2f28 into volcano-sh:main May 13, 2026
14 checks passed
@Abhinav-kodes Abhinav-kodes changed the title fix(workloadmanager): use detached context for store delete and add m… fix(workloadmanager): use detached context for store delete and add missing log May 13, 2026
Member

@hzxuzhonghu hzxuzhonghu left a comment

The detached context for store delete is the right call — good catch that a client disconnect post-K8s-deletion would otherwise orphan the store entry. The error log before the 500 response is also a welcome addition.


require.True(t, storeDeleteCalled, "DeleteSandboxBySessionID should be called even if the request context is canceled")
require.Equal(t, http.StatusOK, w.Code)
}
\ No newline at end of file
Member

File is missing a trailing newline — most editors and gofmt/goimports will flag this. Easy to miss in a PR diff.
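
The assertions in the test excerpt above check that the store delete still runs when the request context is canceled. A self-contained sketch of that kind of check follows, using `net/http/httptest` and a plain `http.HandlerFunc` in place of the project's actual handler and test setup; `sandboxStore`, `recordingStore`, and `deleteHandler` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// sandboxStore is a hypothetical subset of the store interface.
type sandboxStore interface {
	DeleteSandboxBySessionID(ctx context.Context, sessionID string) error
}

// recordingStore notes whether it was called and fails if its context is done.
type recordingStore struct {
	called bool
}

func (s *recordingStore) DeleteSandboxBySessionID(ctx context.Context, _ string) error {
	s.called = true
	return ctx.Err()
}

// deleteHandler sketches a handler that uses a detached context for the store delete.
func deleteHandler(store sandboxStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// A K8s delete would run here with r.Context().
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		if err := store.DeleteSandboxBySessionID(ctx, r.URL.Query().Get("sessionID")); err != nil {
			http.Error(w, "internal server error", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// runCanceledRequest sends a request whose context is already canceled and
// returns whether the store was called and the resulting status code.
func runCanceledRequest() (called bool, status int) {
	store := &recordingStore{}
	reqCtx, cancel := context.WithCancel(context.Background())
	cancel() // simulate the client disconnecting

	req := httptest.NewRequest(http.MethodDelete, "/sandboxes?sessionID=session-1", nil).WithContext(reqCtx)
	rec := httptest.NewRecorder()
	deleteHandler(store).ServeHTTP(rec, req)

	return store.called, rec.Code
}

func main() {
	called, status := runCanceledRequest()
	fmt.Println("store delete called:", called, "status:", status)
}
```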

@@ -475,3 +475,59 @@ func TestHandleSandboxCreate(t *testing.T) {
})
}
}
Member

Go convention is to use // line comments for inline documentation, even for multi-line blocks. This /* ... */ block comment is a bit unusual in Go test files — would be cleaner as a series of // lines above the function.


5 participants