-
Notifications
You must be signed in to change notification settings - Fork 1.5k
KEP-5311: Relaxed validation for Services names #5315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-5311: Relaxed validation for Services names #5315
Conversation
Skipping CI for Draft Pull Request. |
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
b4d56a5
to
cd35944
Compare
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
## Summary | ||
|
||
This KEP proposes a relaxation of the Service name in order bring it in line | ||
with the validation requirements of other resources in Kubernetes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the validation requirements of other resources in Kubernetes. | |
with the validation requirements of names of other resources in Kubernetes. |
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
|
||
- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) | ||
|
||
### Graduation Criteria |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should talk more here about the validation you talked about under "Risks and Mitigations". At what point will that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, I don't know. I may need some guidance here.
The risks internal to k/k will be handled with integration and e2e tests (ie: the Ingress resource)
As for the risks external to k/k (consumers of Services and Ingress resources), I'm not really sure. Is there something this KEP should call out?
I remember seeing another KEP mentioning waiting for enough clouds to get onboard with the idea before graduating to GA, so I assume something similar needs to happen here, but I'm not really sure how to reasonability approach this.
- Verifying that DNS providers used in Kubernetes clusters can handle the new service name format. | ||
- Running integration tests to ensure that dependent components such as Ingress controllers and service discovery mechanisms function correctly with the updated validation. | ||
2. The Ingress resource references Service and will also need a relaxed validation on its reference to the Service | ||
3. Downstream applications may perform validation on fields relating to Service names (ie: [ingress-ngnix](https://github.com/kubernetes/ingress-nginx/blob/d3ab5efd54f38f2b7c961024553b0ad060e2e916/internal/ingress/annotations/parser/validators.go#L190-L197) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should this risk be mitigated?
For ingress-nginx I'm happy to make a PR to change its validation, but that assumes it will one day bump to using k8s.io/apimachinery
0.34.0 (or higher).
What about the unknowns? There could be other service meshes, Ingress controllers, etc that are doing their own validation. Is it in scope of this KEP to identify those and get them fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the code, it says that the value of the default-backend
annotation has to be a DNS1035Label
. So that just means that if you create a Service with a DNS1123Label
name, you won't be able to refer to that Service from an ingress-nginx default-backend
annotation (until they change the ingress-nginx validation). So this isn't really a big deal; you can just not use a 1123 name in that case.
doing their own validation. Is it in scope of this KEP to identify those and get them fixed?
If there are problems, they're probably not going to be because of people doing conflicting validation, they're going to be because of people putting Service names into contexts where an initial digit would screw things up.
Like, for instance, if you write out a YAML file by hand, and put the service name into a field, currently it is guaranteed that that field would get parsed as a string. But with this KEP, it would be possible to have a service whose name was entirely numeric, so if you wrote that out, the consumer of the YAML would find an integer value there rather than a string value and probably break.
Yeah, yeah, you could break it before with a service named "true" too. Yay YAML. Anyway, you get the idea though. People might use service names in contexts where the newly-allowed characters could cause problems.
And there's no way you could find all such places, since this could just be in a script that some random k8s admin somewhere wrote.
So, this is definitely something to worry about, but in the end, we have to just be like "eh... probably not going to happen".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm understanding the code, it says that the value of the
default-backend
annotation has to be aDNS1035Label
. So that just means that if you create a Service with aDNS1123Label
name, you won't be able to refer to that Service from an ingress-nginxdefault-backend
annotation (until they change the ingress-nginx validation). So this isn't really a big deal; you can just not use a 1123 name in that case.
🤦 You're right. I found the function and just assumed how it would be used, without actually reading the context. Thanks for making up for my laziness :/
So, this is definitely something to worry about, but in the end, we have to just be like "eh... probably not going to happen".
If I'm hearing you right, while a concern, it's really out of scope of this KEP?
18f8bf6
to
3ba04c6
Compare
dcec20b
to
049bcb9
Compare
I think this is in a good place for another review. |
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name` | ||
- Updates of Services doesn't change, as the `metada.name` field is immutable | ||
- Creation of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name` | ||
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation | |
- Updates of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation |
What this is talking about is:
- User upgrades to 1.34 and enables the feature gate.
- User creates a new Service
5311-kep-example
and an new Ingresskep-example
that points to that Service via.spec.[]rules.http.[]paths.backend.service.name
- User disables the feature gate
- At this point, Services can't have non-1035 names, and Ingresses aren't allowed to refer to non-1035 names, but the user already has a non-1035-named Service and an Ingress that refers to it, and we don't want them to just break. So:
- We let the user make updates to the
5311-kep-example
Service, even though that Service would be considered invalid if they tried to create it now. - We let the user make updates to the
kep-example
Ingress as long as they don't modify the rule referring to5311-kep-example
. - We let the user update the
kep-example
Ingress to fix the5311-kep-example
rule to refer tokep-example
instead (because we would validate the modified rule withNameIsDNS1035Label
, and find it valid). - We don't let the user update the
kep-example
Ingress to add a new rule referring to5311-kep-example
(because we would validate the new rule withNameIsDNS1035Label
and find it invalid).
- We let the user make updates to the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual pattern is to check the gate in pkg/registry/.../strategy.go and pass a flag or "opts" struct into the Validation function. Then validation checks the flag/opts and changes its behavior based on that. Then you can test both paths easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks. I think I had assumed that once a field was using the new validation, it should continue to use that validation, but ye, I see what you're saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify - Service name is immutable so you can just skip validation of it on updates. The field referenced here SHOULD continue to use the original validation. We don't want saved objects to suddenly become invalid.
I said "check the gate in pkg/registry/.../strategy.go" what I really meant was to apply the usual logic of "if the gate is set OR the field is already using the gated functionality". Put another way: if the gate is off, the door is shut, but anyone who is already inside is allowed to remain inside. :)
In declarative validation this is being implemented as "if newVal == oldVal, skip validation entirely". But that requires plumbing oldVal into every validation function. In most handwritten code it becomes somethign like:
opts.ServiceNameRelaxed = <read feature gate>
if isValidDNSLabel(field) && !isValidDNS1035Label) {
opts.ServiceNameRelaxed = true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My response was to Dan, about Ingress.spec.rules[].http.paths[].backend.service.name
.
I'm unsure if we're crossing threads here.
I guess what I'm now not clear about is the Ingress.spec.rules[].http.paths[].backend.service.name
field.
Let's assume that a user turns the feature gate on, sets that field to "1-adrian", then turns the feature gate off. Now they edit it again, and change the field to "2-adrian", is that situation allowed here?
I think what I'm hearing Dan say is "During Ingress updates, only do validation if that field changes, otherwise do no validation", and what I'm hearing Tim say is "During Ingress updates , do validation on that field. If it matches the new validation, then validate using that".
Apologies if I'm not catching what you're saying!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the existing code, the easiest path forward will be "If this Ingress resource currently uses the new validation on any of its service refs, allow that Ingress resource to continue validated using the new validation as a whole".
I think this is what Tim is saying?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume that a user turns the feature gate on, sets that field to "1-adrian", then turns the feature gate off. Now they edit it again, and change the field to "2-adrian", is that situation allowed here?
While we should say "no, only the existing value", it's just not worth the effort to prevent that, so we usually do not bother.
the easiest path forward will be "If this Ingress resource currently uses the new validation on any of its service refs, allow that Ingress resource to continue validated using the new validation...
Right - finer granularity than that is not super important. As declarative validdation comes on-line, we will get a lot of that for free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that makes sense. I've implemented it that was in the code I'm working on.
Should I update the KEP to also say that if an Ingress resource contains a new-validated Service name, then the entire Ingress resource is eligible to use the new-validated Service name?
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
When the feature gate is disabled: | ||
|
||
- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name` | ||
- Updates of Services doesn't change, as the `metada.name` field is immutable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Updates of Services doesn't change, as the `metada.name` field is immutable | |
- Updates of Services will no longer validate `metadata.name`, since the field is immutable, so the existing value can be assumed to be valid. |
I said before that you "didn't need to worry about" this case, but that's not true; the current code always validates the name, and you need to change it so that it only validates the name on Creates, and skips re-validating it on Updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks like ValidateServiceUpdate() calls ValidateService(). You can pass a flag, e.g. isUpdate bool
and change name validation to pass a do-nothing functionm, like:
nameFn := ValidateServiceName,
if isUpdate {
nameFn = func(_ string, _ bool) []string { return nil }
}
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
Describe manual testing that was done and the outcomes. | ||
Longer term, we may want to require automated upgrade/rollback tests, but we | ||
are missing a bunch of machinery and tooling and can't do that now. | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
("No" is a legitimate answer here for simple KEPs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this one unanswered, as it was under the "Rollout, Upgrade and Rollback Planning" heading, which has the comment "This section must be completed when targeting beta to a release."
So I plan on doing the manual testing after writing the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave it no for now, for beta you'll need to put here a description of upgrade->downgrade->upgrade manual testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a few details to Dan's good comments.
When the feature gate is disabled: | ||
|
||
- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name` | ||
- Updates of Services doesn't change, as the `metada.name` field is immutable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks like ValidateServiceUpdate() calls ValidateService(). You can pass a flag, e.g. isUpdate bool
and change name validation to pass a do-nothing functionm, like:
nameFn := ValidateServiceName,
if isUpdate {
nameFn = func(_ string, _ bool) []string { return nil }
}
- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name` | ||
- Updates of Services doesn't change, as the `metada.name` field is immutable | ||
- Creation of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name` | ||
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual pattern is to check the gate in pkg/registry/.../strategy.go and pass a flag or "opts" struct into the Validation function. Then validation checks the flag/opts and changes its behavior based on that. Then you can test both paths easily.
If e2e tests are not necessary or useful, explain why. | ||
--> | ||
|
||
- Create a Service that requires the new validation and test if a DNS lookup works for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need integration AND validation for this? We don't generally exercise name validation in e2e (so expensive).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the sentence as it will be a single e2e test, that will create a Service with the new name format and perform a DNS lookup, so the validation will be tested implicitly not as a different test. We need this test to be promoted to conformance or we get the risk of new Service names will not be processed by DNS implementers.
049bcb9
to
df40367
Compare
df40367
to
56514e8
Compare
### Risks and Mitigations | ||
|
||
1. Services are responsible for creating DNS records (ie: `<service-name>.<namespace>.svc.cluster.local>`). To confirm that downstream systems will support the new validation, we will conduct compatibility testing by: | ||
- Verifying that DNS providers used in Kubernetes clusters can handle the new service name format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be a Conformance test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Listed under e2e tests already: https://github.com/kubernetes/enhancements/pull/5315/files#diff-02440b3714bbc7b0b3116621ca36837b9b4a9289d483bf98de941679121077aaR237
And I've added this note under GA:
https://github.com/kubernetes/enhancements/pull/5315/files#diff-02440b3714bbc7b0b3116621ca36837b9b4a9289d483bf98de941679121077aaR253
/lgtm only comments for clarifications and suggestions on the tests implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm... not sure if Antonio wanted changes now or later?
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
56514e8
to
49ac61e
Compare
49ac61e
to
4e4f675
Compare
/approve for API - Dan and Antonio have the LGTM |
/assign soltysh (For PRR) |
/lgtm |
keps/sig-network/5311-relaxed-validation-for-service-names/kep.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several comments, but this is mostly ok
keps/sig-network/5311-relaxed-validation-for-service-names/kep.yaml
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
Services that depend on the new validation, since any version of Kubernetes with | ||
this feature gate will allow existing Services to have `NameIsDNSLabel` names. If a | ||
user rolls back to a version before the feature gate, and has Services that depend | ||
on the new validation, they will be unable to modify those services, and will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This not entirely true, is it? We only validate service when creating, so theoretically they should function. One won't be able to create the services which rely on relaxed validation? Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, currently ValidateServiceCreate
and ValidateServiceUpdate
use a shared ValidateService
function that mostly doesn't pay attention to the distinction between creates and updates.,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense.
|
||
There are no metrics that would inform anyone that the feature was | ||
failing, but since the feature is opt-in, individual users can simply | ||
stop using the feature if it is not working for them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree there's no explicit metric, but one might suggest looking at increased level of failed Service creations, which may signal a problem with mismatched versions, as described in the rollout/rollback failure above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks, I've updated this
Describe manual testing that was done and the outcomes. | ||
Longer term, we may want to require automated upgrade/rollback tests, but we | ||
are missing a bunch of machinery and tooling and can't do that now. | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to leave it no for now, for beta you'll need to put here a description of upgrade->downgrade->upgrade manual testing.
keps/sig-network/5311-relaxed-validation-for-service-names/README.md
Outdated
Show resolved
Hide resolved
/lgtm |
c3a8c1b
to
f031c01
Compare
/lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
the PRR
|
||
1. With the feature gate enabled, test that Services can be created with both new and previous validation | ||
1. With the feature gate disabled, test that Services can be created with the previous validation, and fail when using the new validation | ||
1. Disable the feature gate and ensure that the Service can be edited without a validation error being returned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's reasonable - thx.
Services that depend on the new validation, since any version of Kubernetes with | ||
this feature gate will allow existing Services to have `NameIsDNSLabel` names. If a | ||
user rolls back to a version before the feature gate, and has Services that depend | ||
on the new validation, they will be unable to modify those services, and will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adrianmoisey, soltysh, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Uh oh!
There was an error while loading. Please reload this page.