Probe checks don't consider timeout values #296
Comments
Hey! This topic is recurring, and I'm a bit torn about it. For me, it's important that kube-score only recommends changes if that change has a high chance of increasing the reliability and availability of the overall system. The general approach that I've taken with regard to livenessProbes recently is that you should probably not have one for the vast majority of workloads. And if you do need one, and understand the risks associated, you can easily add it in a way that is compliant with kube-score.

I think there are a few problems with re-using the probes, even if the time-to-fail is longer:

1. Before startupProbes were available, it was a fairly common pattern to use a failing readinessProbe during app boot to allow the app to warm up before accepting traffic. If this pattern is used in combination with an identical livenessProbe, even with a longer time-to-fail, the system is at risk of failing the livenessProbe if the warmup takes longer than usual.

2. The probes might "accidentally" depend on an external system, such as a database or another service. When this happens in a readinessProbe the downsides are manageable: the Pods/Endpoints are removed from the Service, but the containers are still there and ready to serve traffic immediately once the probe succeeds again. If the livenessProbe has the same "accidental" behavior, the containers will be restarted, which in turn leads to more downtime than expected, as the containers are not ready to serve traffic as soon as the connections to the downstreams are healthy again. Many applications that are not built explicitly for Kubernetes expose only one type of health check. In those cases I believe it's tempting to re-use the health check endpooint for both types of probes without putting much thought into why, or understanding the risks involved.

3. If the application handles liveness and readiness properly, and the operator knows the risks associated, it's easy to add a very simple liveness endpoint to almost any application, as it should do as little as possible.

4. If the kube-score recommendation doesn't fit your workload, you can add a […]

This is of course a highly opinionated topic, and it's hard to make general recommendations for very specific issues. I try to make sure that kube-score makes good recommendations that are easy to understand for everyone, and that I can personally feel comfortable recommending. I don't want a kube-score recommendation to be what causes an incident somewhere.

/ Gustav
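The pattern described in the points above can be sketched as a probe configuration; this is only an illustration of the separation being argued for, and the `/ready` and `/live` endpoint paths, port, and threshold values are hypothetical, not taken from the issue:

```yaml
# Sketch: separate concerns per probe (paths, port, and thresholds are illustrative)
containers:
  - name: app
    startupProbe:            # covers slow warmup, instead of a deliberately failing readinessProbe
      httpGet:
        path: /ready
        port: 8080
      failureThreshold: 30   # allows up to 30 * 10s of warmup before the kubelet gives up
      periodSeconds: 10
    readinessProbe:          # may check downstreams; failing only removes the Pod from the Service
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:           # minimal endpoint: confirms only that the process can still serve HTTP
      httpGet:
        path: /live
        port: 8080
      periodSeconds: 10
```

A failing `readinessProbe` here is cheap to recover from, while a failing `livenessProbe` restarts the container, which is why the liveness endpoint should do as little as possible.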
Thank you for the detailed response. This definitely seems like a topic of debate. You make a lot of great points, and I totally understand your perspective on how the tooling should behave. I'm going to close this ticket, but FYI I did notice one quirk with […]

Thanks again.
Which version of kube-score are you using?
What did you do?
When defining a `livenessProbe` that uses the same command as a `readinessProbe` but with longer timeouts, kube-score generates a CRITICAL error.

For instance:
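The original manifest was not preserved in this thread, but a minimal sketch of the kind of spec being described might look like this (the container name, endpoint path, port, and timeout values are illustrative, not from the original report):

```yaml
# Sketch: identical probe command, with a longer time-to-fail on liveness
containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 3    # fails readiness after ~15s
    livenessProbe:
      httpGet:
        path: /healthz       # same endpoint as the readinessProbe
        port: 8080
      periodSeconds: 30
      failureThreshold: 10   # much longer time-to-fail (~5 minutes) before a restart
```

kube-score flags this because the two probes run the same check, regardless of the differing timeout values.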
What did you expect to see?
Based on the recommendation at https://srcco.de/posts/kubernetes-liveness-probes-are-dangerous.html that it's OK to reuse the same command with longer timeout values, I'd expect this not to be a critical error.
What did you see instead?