fix(workloadmanager): use detached context for store delete and add missing log#321
Conversation
…issing log Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com>
9371b20 to
3476879
Compare
There was a problem hiding this comment.
Code Review
This pull request modifies the handleDeleteSandbox handler to use a detached context with a 30-second timeout when deleting a sandbox from the store, preventing orphaned entries upon client disconnection. Additionally, error logging was added for these deletion failures. The reviewer suggested improving the observability of the new error log by including the sandbox name and namespace, ensuring consistency with other log messages in the file.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #321 +/- ##
==========================================
+ Coverage 47.57% 48.14% +0.57%
==========================================
Files 30 30
Lines 2819 2858 +39
==========================================
+ Hits 1341 1376 +35
+ Misses 1338 1329 -9
- Partials 140 153 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f63c059 to
b87f038
Compare
…error log Signed-off-by: Abhinav-kodes <183825080+Abhinav-kodes@users.noreply.github.com> Signed-off-by: Abhinav Singh <abhinavsingh717073@gmail.com>
8023b86 to
6e7735c
Compare
|
@hzxuzhonghu , I have applied all the changes suggested by copilot and the pr is ready to be merged from my end |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
hzxuzhonghu
left a comment
There was a problem hiding this comment.
The detached context for store delete is the right call — good catch that a client disconnect post-K8s-deletion would otherwise orphan the store entry. The error log before the 500 response is also a welcome addition.
|
|
||
| require.True(t, storeDeleteCalled, "DeleteSandboxBySessionID should be called even if the request context is canceled") | ||
| require.Equal(t, http.StatusOK, w.Code) | ||
| } No newline at end of file |
There was a problem hiding this comment.
File is missing a trailing newline — most editors and gofmt/goimports will flag this. Easy to miss in a PR diff.
| @@ -475,3 +475,59 @@ func TestHandleSandboxCreate(t *testing.T) { | |||
| }) | |||
| } | |||
| } | |||
There was a problem hiding this comment.
Go convention is to use // line comments for inline documentation, even for multi-line blocks. This /* ... */ block comment is a bit unusual in Go test files — would be cleaner as a series of // lines above the function.
What type of PR is this?
/kind bug
What this PR does / why we need it:
handleDeleteSandbox uses c.Request.Context() for both the K8s deletion and
the subsequent store deletion. If the client disconnects after the K8s
resource is successfully deleted but before DeleteSandboxBySessionID runs,
the request context is already canceled. The store call fails instantly,
leaving a stale entry permanently pointing to a K8s resource that no longer
exists. Future GET or DELETE calls for that sessionID will return stale data
or fail with a misleading error.
This PR fixes the issue by using a detached context.WithTimeout for the store
delete, matching the pattern already established in rollbackSandboxCreation.
It also adds a missing klog.Errorf before the respondError on store delete
failure (Every other store error in this handler logs before responding, but
this path was silently swallowing the error, making store failures during
deletion invisible in production diagnostics).
Special notes for your reviewer:
The detached context uses a 30s timeout, consistent with rollbackSandboxCreation
which uses the same value for identical store cleanup operations.
The K8s deletion calls intentionally retain c.Request.Context(). A client
disconnect does not cancel an already-dispatched K8s API call server-side,
so the request context is appropriate there. Only the store write after
K8s deletion is at risk from a canceled context.
These two fixes are independent but co-located in the same function, so
they are included in a single PR to keep the diff minimal.
Does this PR introduce a user-facing change?: