-
Notifications
You must be signed in to change notification settings - Fork 40.9k
kubelet: enhance exec probe logging with pod and container context #132744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/cc @bart0sh @SergeyKanzhelev @dims |
/cc @pacoxu |
/cc @tzneal @NoicFank @bart0sh |
@bart0sh Would you have some time to review this PR? Thank you! |
@@ -152,7 +152,7 @@ func (pb *prober) runProbe(ctx context.Context, probeType probeType, p *v1.Probe | |||
case p.Exec != nil: | |||
klog.V(4).InfoS("Exec-Probe runProbe", "pod", klog.KObj(pod), "containerName", container.Name, "execCommand", p.Exec.Command) | |||
command := kubecontainer.ExpandContainerCommandOnlyStatic(p.Exec.Command, container.Env) | |||
return pb.exec.Probe(pb.newExecInContainer(ctx, container, containerID, command, timeout)) | |||
return pb.exec.Probe(pb.newExecInContainer(ctx, pod, container, containerID, command, timeout)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this only implemented for exec probes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other Probes, when they fail, they return an error and the upper-level caller logs it with the podName
and containerName
. However, when an exec
Probe succeeds, it writes the command output to the data stream. If eic.writer.Write(data)
returns an error, the resulting log doesn't include the podName
or containerName
.
kubernetes/pkg/kubelet/prober/prober.go
Lines 267 to 276 in 6ecec84
func (eic *execInContainer) Start() error { | |
data, err := eic.run() | |
if eic.writer != nil { | |
// only record the write error, do not cover the command run error | |
if p, err := eic.writer.Write(data); err != nil { | |
klog.ErrorS(err, "Unable to write all bytes from execInContainer", "expectedBytes", len(data), "actualBytes", p) | |
} | |
} | |
return err | |
} |
Errors from other Probes are consistently logged at this point.
kubernetes/pkg/kubelet/prober/prober.go
Lines 100 to 107 in 6ecec84
result, output, err := pb.runProbeWithRetries(ctx, probeType, probeSpec, pod, status, container, containerID, maxProbeRetries) | |
if err != nil { | |
// Handle probe error | |
klog.V(1).ErrorS(err, "Probe errored", "probeType", probeType, "pod", klog.KObj(pod), "podUID", pod.UID, "containerName", container.Name, "probeResult", result) | |
pb.recordContainerEvent(pod, &container, v1.EventTypeWarning, events.ContainerUnhealthy, "%s probe errored and resulted in %s state: %s", probeType, result, err) | |
return results.Failure, err | |
} |
pkg/kubelet/prober/prober.go
Outdated
@@ -253,7 +259,7 @@ func (eic *execInContainer) Start() error { | |||
if eic.writer != nil { | |||
// only record the write error, do not cover the command run error | |||
if p, err := eic.writer.Write(data); err != nil { | |||
klog.ErrorS(err, "Unable to write all bytes from execInContainer", "expectedBytes", len(data), "actualBytes", p) | |||
klog.ErrorS(err, "Unable to write all bytes from execInContainer", "expectedBytes", len(data), "actualBytes", p, "pod", klog.KObj(eic.pod), "containerName", eic.containerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't have to cascade the pod object here, just get the error in https://github.com/kubernetes/kubernetes/pull/132744/files#r2197504163 and log out there, don't create this dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. 👍🏻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aojea but write error is not returned from the execInContainer::Start()
. The probe can be successful with unsuccessful write.
kubernetes/pkg/kubelet/prober/prober.go
Lines 267 to 276 in 6ecec84
func (eic *execInContainer) Start() error { | |
data, err := eic.run() | |
if eic.writer != nil { | |
// only record the write error, do not cover the command run error | |
if p, err := eic.writer.Write(data); err != nil { | |
klog.ErrorS(err, "Unable to write all bytes from execInContainer", "expectedBytes", len(data), "actualBytes", p) | |
} | |
} | |
return err | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't have to cascade the pod object here, just get the error in https://github.com/kubernetes/kubernetes/pull/132744/files#r2197504163 and log out there, don't create this dependency
When an exec Probe succeeds but eic.writer.Write(data)
fails, the error isn't propagated to the upper-level function runProbeWithRetries
, so no error log is printed there. That's why this log should include the podName
and containerName
.
kubernetes/pkg/kubelet/prober/prober.go
Lines 100 to 107 in 6ecec84
result, output, err := pb.runProbeWithRetries(ctx, probeType, probeSpec, pod, status, container, containerID, maxProbeRetries) | |
if err != nil { | |
// Handle probe error | |
klog.V(1).ErrorS(err, "Probe errored", "probeType", probeType, "pod", klog.KObj(pod), "podUID", pod.UID, "containerName", container.Name, "probeResult", result) | |
pb.recordContainerEvent(pod, &container, v1.EventTypeWarning, events.ContainerUnhealthy, "%s probe errored and resulted in %s state: %s", probeType, result, err) | |
return results.Failure, err | |
} |
you don't have to modify the existing struct and methods, and check the returned value on the probe execution to log |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: xigang The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8d737c1
to
6ecec84
Compare
Just in case my message in threads does not get enough attention:
kubernetes/pkg/kubelet/prober/prober.go Lines 267 to 276 in 6ecec84
Given that Eg:
☝️ this would be a successful probe with failing writer. |
ddc00a7
to
8a46065
Compare
Signed-off-by: xigang <wangxigang2014@gmail.com>
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
This improves debugging capabilities by providing more context in exec probe error messages, making it easier to identify which specific pod and container encountered issues during probe execution.
Which issue(s) this PR is related to:
#132714
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: