Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the apiVersion in Kubernetes CRD definitions #9600

Closed
wants to merge 2 commits into from

Conversation

sdelicata
Copy link
Contributor

@sdelicata sdelicata commented Dec 12, 2022

What does this PR do?

This PR updates the apiVersion in Kubernetes CRD definitions.
traefik.containo.us/v1alpha1 -> traefik.io/v1

Motivation

Fixes #7621
Fixes #6053

More

  • Added/updated tests
  • Added/updated documentation

@tomMoulard tomMoulard added this to the 3.0 milestone Dec 12, 2022
@ldez ldez self-requested a review December 12, 2022 08:58
@sdelicata sdelicata changed the title Update organization name in Kubernetes CRD definitions Update the group name in Kubernetes CRD definitions Dec 12, 2022
@ldez ldez marked this pull request as draft December 12, 2022 09:08
@ldez ldez changed the title Update the group name in Kubernetes CRD definitions Update the apiVersion in Kubernetes CRD definitions Dec 12, 2022
@sdelicata sdelicata marked this pull request as ready for review December 12, 2022 10:45

### Kubernetes CRD

In v3, the old api version `traefik.containo.us/v1alpha1` has been replaced by `traefik.io/v1` in the Kubernetes CRD definitions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In v3, the old api version `traefik.containo.us/v1alpha1` has been replaced by `traefik.io/v1` in the Kubernetes CRD definitions.
In v3, the Kubernetes IngressRoute provider CRDs (custom resource definitions) API version has changed from `traefik.containo.us/v1alpha1` to `traefik.io/v1`.

Copy link
Member

@rtribotte rtribotte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👌

@z0rc
Copy link

z0rc commented Dec 13, 2022

DO NOT DO THIS.

This is not acceptable to drop support of single existing CRD version without a transition period. This will break end user installations.

Check how Kubernetes itself handles API deprecations. The deprecated API usually available for 3 minor releases (or about a year), so users can safely migrate to new version.

@z0rc
Copy link

z0rc commented Dec 13, 2022

See https://kubernetes.io/docs/reference/using-api/deprecation-policy/, rule 4b for deprecating API.

Your case is a bit different, as current v1alpha1 is the only version. It's named as alpha, but treated like de facto v1 for years, there was no beta period for it. How do you expect users with hundreds of resources, that currently based on v1alpha1 to switch to v1 without data loss and outages?

@ldez
Copy link
Member

ldez commented Dec 14, 2022

The v2 will still be maintained after the release of the v3.

We will also create a migration tool to help users to handle the configuration changes.

K8s doesn't create major versions, so the context is not the same.

@z0rc
Copy link

z0rc commented Dec 14, 2022

@ldez question isn't about maintenance of v2.

There must be a version of traefik that supports old and new api versions to actually enable any kind of migration. This is a gradual process for large installations, one time off conversion tool would not suffice.

Check the history of the cert-manager, how much time they supported multiple api versions.

@ddtmachado
Copy link
Contributor

Hey @z0rc, jumping in on this as I think your concerns are indeed legitimate, but as @ldez mentioned this is not a minor release but a major one for Traefik itself, and not only for the CRD resources, which means there will be breaking changes in other places so you would not want to hot swap v2 for v3 even if it would support both old and new CRD's.

When upgrading between major versions I believe the best aproach to be a blue / green deployment, with a gradual migration that allows you to validate behavioral changes, performance impacts and everything else, this approach was covered on this blog post previously.

@z0rc
Copy link

z0rc commented Dec 14, 2022

@ddtmachado this is how you lose users. Migrating to from traefik v2 to v3 would be no different from migrating from traefik v2 to ingress controller which is more friendly to end users.

I haven't checked other changes yet, but CRD should be handled with extra care and planned transition, because it's traefik's state within Kubernetes.

This is your product and you are free to version it as you want, I just don't see a point to keep using it if handles state transition like this.

@z0rc
Copy link

z0rc commented Dec 14, 2022

To expand my concerns about CRDs.

I, as cluster operator, am able to control traefik instances, it isn't a problem for me to review changes between versions and adjust traefik configuration when deploying new version.

CRDs on the other hand is a public API that used by everyone within cluster, I, as cluster operator, make it as part of contract, so regular users can be sure that API won't break or change without notice. Also CRDs are de facto state storage for traefik. If you drop support of existing CRDs, I, as cluster operator, won't be able to maintain contract to users, there is no way to enable smooth transition.

Problem with current PR is that is just changes apiversion identifier without functional changes. This invalidates existing state for sake identifier change only.

@jnoordsij
Copy link
Contributor

CRDs on the other hand is a public API that used by everyone within cluster, I, as cluster operator, make it as part of contract, so regular users can be sure that API won't break or change without notice. Also CRDs are de facto state storage for traefik. If you drop support of existing CRDs, I, as cluster operator, won't be able to maintain contract to users, there is no way to enable smooth transition.

Just to highlight this concern, this proposed CRD change will make a graceful/zero downtime upgrade from Traefik v2 to Traefik v3 within a cluster impossible for anyone currently running v2 for as far as I can understand.

One thing to note about CRDs here, is that they are cluster-wide resources, which is problematic when you have multiple Traefik instances running, as it makes it impossible to do this intended upgrade for 1 instance: you either have to update none or all at the same time.

What's more, is that updating CRDs as planned above in a cluster would result in immediately invalidating all implemented resources, as they no longer serve an existing version, so this would lead to all existing resources to be undetectable, resulting in guaranteed downtime. Updating the resources before updating the CRDs themself is also not possible, as then they would have to refer to a not-yet-existing version of the CRD.

I think worth mentioning here is the part in the Kubernetes docs about Versions in CustomResourceDefinitions, which has a great number of examples to show how to handle multiple versions within a CRD.

I'm not sure if any real-life (test-)setups have already been tested/upgraded including this CRD upgrade, and if so, I would love to see an upgrade example on this, but I fear it's simply not possible with this change. I think this should be thoroughly tested first before updating the CRDs in this particular way.

@jnoordsij
Copy link
Contributor

Additional note: as far as I can tell, changing the group name on the CRD (as proposed) cannot be done without a hard migration and corresponding downtime regardless of versioning use, so it might be better to avoid that as well.

@ddtmachado
Copy link
Contributor

Hey all, just a quick update on this.

First thanks for the feedback and good arguments you all provided. They were of great value especially since they supported a very different approach than what we planned for the migration path.

Talking about that, this PR was supposed to be just the first step in moving to the new scheme for CRDs, and we planned to work on the migration tools and other changes later on to make it happen.

Then we realised we actually didn't share our full plan on the migration around CRD's, so we're taking a step back and hold on merging this PR until after we explain and discuss our options, so we can reach an agreement with the community on the best way to handle it. You will be given the chance to give input every step of the way.

Keep tuned for more news, we'll update the original issue (and link here so you are sure to see it) with the conversation about our plan for updating CRDs and how migration will look for CRDs.

In addition we will also have a new issue that discusses our overall migration plans to make sure you have full transparency.

@jnoordsij
Copy link
Contributor

Thanks for looking into this and taking time to consider the upgrade path! I found out that, contrary to what I thought earlier, changing the group in this case actually does allow for installing the new CRDs side-by-side with the old ones, so that might make things a little more easy in this case.

This would allow for an upgrade path where both Traefik v2 and Traefik v3 would be running simultaneously within a cluster, with 1 version serving the "old" versions and the other serving the "new" versions. This still would be a somewhat complicated migration, where one should be careful which Traefik services are connected to which LoadBalancer/IPs, but at least it should be possible in a reasonable way.

I think a grace period where v3 can actually serve both the old and new API versions would be even better, but I understand the additional technical overhead for that might be quite a hassle to deal with as well.

@ddtmachado
Copy link
Contributor

@jnoordsij that's correct, I actually made a quick gist with a local setup to highlight that, it requires building Traefik with this PR and actually modifying the CRD's definitions as well since they are not available yet on the static endpoint, but its useful to validate this migration strategy.

We're still writing down and discussing the alternatives and will soon share that in the issue as mentioned.

@z0rc
Copy link

z0rc commented Dec 21, 2022

I would strongly advice to look into supporting v1alpha1 and v1 simultaneously as main goal. Workaround of fall-though using IngressRoute probably won't work for users who use Kubenetes Ingress.

In my case I have big number of Ingress+Middleware resources and switching to IngressRoute is a no go for me.

@longit644
Copy link
Contributor

Hi all, any update on this?

@tfny
Copy link
Contributor

tfny commented Mar 9, 2023

Hey all, I am sorry, I thought I sent this response a few weeks ago, I am not sure what happened.

After listening to the community, we have decided to allow more time before releasing Traefik v3.0 to allow for a gradual migration to the new Kubernetes CRD definitions. This is in line with feedback we have received, best practices, and Kubernetes own migration practices.

To do this, we will release Traefik v2.10 with both CRD API Groups containo.us/ (the current group) and traefik.io/v1alpha1 (the new group). Traefik v2.10 will bring no additional changes to the CRD scheme other than updating the API group. It will also include a warning log about deprecated containo.us/v1alpha1 resources which will be removed in v3.0.

Traefik v3.0 will launch with traefik.io/v1 as the only CRD API Group available. This is because breaking changes to other parts of Traefik will make continued backward compatibility impossible. To manage this, we are creating an automatic Kubernetes webhook to migrate your CRDs from traefik.io/v1alpha1 to traefik.io/v1.

It is possible that using the webhook to migrate your CRDs might not prevent all breaking changes. This will be because Traefik v3.0 brings breaking changes to other areas that may impact your configuration. We will create additional resources to handle this (e.g. passHostHeader will move to the ServersTransport resource) and, possibly, an additional CLI tool.

Traefik v2 will be maintained for a minimum of one year after v3.0 is released.

Please share your thoughts with us.

@tfny tfny mentioned this pull request Mar 9, 2023
2 tasks
@z0rc
Copy link

z0rc commented Mar 9, 2023

@tfny thanks for listening. This is really appreciated.

Plan looks okay, but also raises a question. Right for me it looks like there will be 2 migrations needed. First with v2.10 users have to update apiVersion traefik.containo.us/v1alpha1 to traefik.io/v1alpha1. Second after release of v3 users have to update from traefik.io/v1alpha1 to traefik.io/v1 eventually, as I assume webhook won't be available indefinitely. In my case we deploy Middleware resources using own Helm charts, so have to eventually switch to traefik.io/v1 by default.

My question is, what stops skipping traefik.io/v1alpha1 and allow to have only 1 migration? From here I can see 2 ways to approach it:

  • v2.10 that supports traefik.containo.us/v1alpha1 and traefik.io/v1
  • v3 webhook, that converts traefik.containo.us/v1alpha1 to traefik.io/v1

@ddtmachado
Copy link
Contributor

Hi @z0rc, thats a good question, let me address one point at a time:

First the problem we have in supporting v1 right now on 2.10 is that we didn't finished working on the CRD provider so we don't have the final objects ready, the changes we're going to introduce are still an ongoing discussion at this point.

Then providing a webhook to migrate between two different api groups is not possible following the Kubernetes design, after all for Kubernetes they are two completely different APIs and when you register a webhook you do it for an api group so the output object should be on the same api group just in an updated version.

That is why the approach is taking two steps, one to make sure you have time to migrate APIs to the new group while not being concerned with any breaking change. Then moving to a new version in the same group, as it should be normally, but this time moving to a new major version which will include breaking changes, but supported by an automatic webhook translation and maybe even other tooling where necessary to help ease the process.

@ddtmachado
Copy link
Contributor

ddtmachado commented Mar 9, 2023

BTW this PR will probably be closed after we merge 9765 and then we'll open a new PR to introduce v1 once we have everything ready

@sjmiller609
Copy link

In my opinion, the API versions should indeed be updated in the traefik v3 branch and z0rc's point of supporting an upgrade path is a separate issue because the code in traefik v3 is already a different API, and this PR is simply indicating that's the case with an API version. Whether or not the Traefik team decides to support something like z0rc suggests, in either case this PR should go in.

For example, Traefik 3, under traefik.io/v1alpha1, we have:

apiVersion: traefik.io/v1alpha1
kind: MiddlewareTCP
metadata:
  name: ipallowlist
  namespace: default
spec:
  ipAllowList:
    sourceRange:
      - 127.0.0.1/32

And in Traefik 2, under traefik.io/v1alpha1, we have:

apiVersion: traefik.io/v1alpha1 # <------- API version is the same
kind: MiddlewareTCP
metadata:
  name: ipwhitelist
  namespace: default
spec:
  ipWhiteList: # <---------- but this is different!
    sourceRange:
      - 127.0.0.1/32

I expect the other breaking changes related to Traefik v3 are also already present in Traefik version 3, even though they still live under the api version traefik.io/v1alpha1. This includes new features, renames, and deprecated code dropping.

Traefik version 2 and Traefik version 3 are already actually different APIs, and so the API version should be different, but currently it's just named the same even though it's actually different. So in my opinion, renaming the APIs like shown in this PR is correct and the concept to support a graceful switching over is a separate issue.

Supporting something like z0rc is suggesting involves one of three options: support new stuff in v2, support old stuff in v3, or run-both at the same time on different API versions. This PR simply labeling the new API as a new version is required no matter which option is selected.

@sjmiller609
Copy link

Also, in my opinion the run-both option will probably work best for what z0rc is asking for, i.e. running traefik version 2 to back traefik.io/v1alpha1 and running traefik version 3 to back traefik.io/v1 simultaneously.

In such a setup, it's the same concept to running 2 ingress controllers at the same time. For example, it would be a similar process for migration from Traefik to Nginx ingress controller, just install both at the same time and cut workloads over.

@nmengin nmengin mentioned this pull request Dec 22, 2023
2 tasks
@nmengin
Copy link
Contributor

nmengin commented Jan 16, 2024

Hey,

Just a heads up on CRD support in Traefik v3.

We discussed the matter with other maintainers, and we've decided to keep the v1alpha1 version in Traefik v3.0. The good news is that there will be no migration required on this front.

To ease the migration in Traefik v3, we've opted for not introduce any breaking changes in the dynamic configuration. This means that we will retain the deprecated middleware options. However, we do not plan to maintain these options in the CRD v1.

For those reasons, we will open a new PR that implements the plan and we close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet