
Restore API Groups based on Target Cluster Preferred/Supported Version #2551

Closed
brito-rafa opened this issue May 20, 2020 · 17 comments · Fixed by #3133

@brito-rafa
Contributor

Describe the problem/challenge you have
This is a follow-up of PR #2373 (back up all API Group Versions). We now need to introduce the ability to restore using the Target Cluster APIGroup Preferred Version (TCPV). Today's default restores using the source cluster APIGroup Preferred Version (SCPV).
This is ultimately to address #2251.

Describe the solution you'd like
I added a diagram here:
https://docs.google.com/presentation/d/1hpq3kwc4uYR96z1Jzi6G1TZMqq9YcBoIw-JcHqkSnnE/edit#slide=id.p1

Lingo:
Target Cluster APIGroup Preferred Version (TCPV) - from discovery client.
Source Cluster APIGroup Preferred Version (SCPV) - looking up the "preferredversion" directory inside the backup tarball.
Target Cluster APIGroup Supported Versions (TCSV) - array from discovery client.
Source Cluster APIGroup Supported Versions (SCSV) - array built looking up the versions inside the backup tarball.

Basically, wrap this new logic under the same feature flag "EnableAPIGroupVersions". Without this flag, restore logic stays the same.
If the feature flag is passed, the target cluster discovery client will contain a list of target cluster preferred versions (TCPV) and an array of all supported versions (TCSV).
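For illustration, a minimal sketch, assuming a standard client-go setup (this is not Velero's actual code), of how the per-group preferred and supported versions could be read from the discovery client:

```go
// Illustrative only: collect each API group's preferred and supported
// versions (TCPV/TCSV in the lingo above) from the discovery client.
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	for _, g := range groups.Groups {
		var supported []string
		for _, v := range g.Versions {
			supported = append(supported, v.Version)
		}
		fmt.Printf("%s: preferred=%s supported=%v\n",
			g.Name, g.PreferredVersion.Version, supported)
	}
}
```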

If TCPV is found in the backup then
    restore the item using the TCPV path inside the backup tarball
else if SCPV is in the TCSV array then
    restore the item using the SCPV path inside the backup tarball
else if the SCSV and TCSV arrays have any element in common then
    restore the item using the common version's path inside the backup tarball
else
    log an error or a warning: "no supported API Group version in common between the clusters"
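
A rough Go sketch of this fallback chain, with all names hypothetical (chooseVersion is not a Velero API):

```go
// Hypothetical helper mirroring the pseudocode above. tcpv/scpv are the
// preferred versions, tcsv/scsv the supported-version lists for one group.
package main

import "fmt"

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

func chooseVersion(tcpv, scpv string, tcsv, scsv []string) (string, error) {
	switch {
	case contains(scsv, tcpv):
		// The target's preferred version exists in the backup tarball.
		return tcpv, nil
	case contains(tcsv, scpv):
		// The source's preferred version is supported by the target.
		return scpv, nil
	default:
		// Fall back to any version the two clusters have in common.
		for _, v := range scsv {
			if contains(tcsv, v) {
				return v, nil
			}
		}
	}
	return "", fmt.Errorf("no supported API Group version in common between the clusters")
}

func main() {
	v, err := chooseVersion("v1", "v1beta1", []string{"v1beta1", "v1"}, []string{"v1beta1"})
	fmt.Println(v, err) // v1beta1 <nil> — falls through to the SCPV branch
}
```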


Environment:
Velero 1.4.0+ (future release)

@nrb added the "versioning" label (issues related to resource versioning) May 20, 2020
@skriss added this to the v1.5 milestone May 20, 2020
@carlisia
Contributor

This is the recording for the WG-Data Protection meeting of 04/08/2020 where the motivation and more context for this work was presented:

https://youtu.be/S2o7znJS_9w?t=206

@nrb modified the milestones: v1.5, v1.6 Aug 11, 2020
@jenting
Contributor

jenting commented Aug 28, 2020

I'd like to handle this issue.

@ashish-amarnath
Member

@brito-rafa If you are not working on this anymore, is it ok for @jenting to pick this up?

@brito-rafa
Contributor Author

Hey there!
I was planning to tackle this one around mid-September. One open question from preliminary conversations about this change is how the new Restore logic would behave with a RestoreItem plugin.
So I started to write my own plugin before changing the Velero core.
How about this: @jenting starts to code this change, and I will partner with him to test it?
Thanks,
Rafa

@jenting
Contributor

jenting commented Aug 31, 2020

Thanks, @brito-rafa
I'll take a look at the RestoreItem plugin and the Restore logic before starting on the code change, to get more familiar with the restore code first.
Let's keep in touch.

@codegold79
Contributor

This is a popular issue. I too would like to help work on it. @jenting, I'll contact you via Kubernetes Slack. I'm curious how far you've gotten and where I can help.

@jenting
Contributor

jenting commented Oct 13, 2020

> This is a popular issue. I too would like to help work on it. @jenting, I'll contact you via Kubernetes Slack. I'm curious how far you've gotten and where I can help.

Background:

Let's focus on the Kubernetes built-in groups (scheduling.k8s.io, autoscaling, ...) because we can't know the API conversion logic of arbitrary CRDs. Three cases we need to solve:

  1. Source cluster preferred API version == target cluster preferred API version
  2. Source cluster preferred API version != target cluster preferred API version, but the source cluster's supported API versions intersect the target cluster's supported API versions
  3. No API version intersection between the source cluster's and the target cluster's supported API versions (NOTE: I can't find an API group with no version intersection anywhere from Kubernetes 1.13 to 1.19)

I've tested the use case where the target Kubernetes version > the source Kubernetes version, and cases 1, 2, and 3 all work well on the Kubernetes built-in group resources. Here is my testing environment.

  • kind
  • velero version 1.4.2
  • without enabling the feature flag EnableAPIGroupVersions in the velero server

I've not tested the use case where the target Kubernetes version < the source Kubernetes version yet. But I think it would not work correctly, since the target Kubernetes cluster does not know how to convert an unknown API version.

@codegold79
Contributor

codegold79 commented Oct 13, 2020

This is fantastic work, @jenting, thank you for doing this. And thank you for the update :)

I'm still new to Kubernetes and this project so I have some questions. Please correct me if I'm wrong.

You mentioned getting API Versions using

kubectl get --raw /apis | jq -r

Is that command using the discovery client that @brito-rafa mentioned at the top of the description of this issue?

> Let's focus on the Kubernetes built-in groups (scheduling.k8s.io, autoscaling, ...) because we can't know the API conversion logic of arbitrary CRDs.

I agree. I think the scope of this issue covers only the resources that come with Kubernetes and not the CRDs other organizations create for their clusters.

> Three cases we need to solve: (1) source cluster preferred API version == target cluster preferred API version; (2) source cluster preferred API version != target cluster preferred API version, but the source cluster's supported API versions intersect the target cluster's supported API versions; (3) no API version intersection between the source cluster's and the target cluster's supported API versions (NOTE: I can't find an API group with no version intersection from Kubernetes 1.13 to 1.19).

I have a different interpretation of the three cases. I view it more like this:

  • Priority 1 (Highest priority). Target preferred version can be used. This means one of two things:
    • (A) target preferred version == source preferred version OR
    • (B) target preferred version == source supported version
  • Priority 2. Source preferred version can be used. This means
    • source preferred version == target supported version
  • Priority 3. A common supported version can be used. This means
    • target supported version == source supported version
    • if multiple supported versions intersect, choose the latest version (v2 > v1, beta > alpha, etc.; see the sketch below).
  • Priority 4. Look for plugins that can convert any of the source versions to a target preferred/supported version. If multiple plugins match, the target preferred version is the highest priority.
  • Error out per normal if none of the above can be done.

If none of the above four priority routes can be taken, then the restore will lead to an error. The error messages in the logs will look something like,

error: unable to restore resource XYZ due to resource API versions not supported in the target cluster.
error: unable to find a user-defined plugin to convert the source resource API version to a target supported version.
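
As a side note on the Priority 3 tie-break ("choose the latest version"): apimachinery ships a kube-aware version comparator that could be used for it. A minimal sketch, where the common slice is a hypothetical intersection of source and target supported versions:

```go
// Sketch of the Priority 3 tie-break using apimachinery's kube-aware
// version comparison. `common` is a hypothetical intersection of the
// source and target supported versions.
package main

import (
	"fmt"
	"sort"

	"k8s.io/apimachinery/pkg/version"
)

func main() {
	common := []string{"v1beta1", "v2alpha1", "v1"}
	sort.Slice(common, func(i, j int) bool {
		// Higher kube-aware priority first: GA > beta > alpha.
		return version.CompareKubeAwareVersionStrings(common[i], common[j]) > 0
	})
	fmt.Println(common) // [v1 v1beta1 v2alpha1]
}
```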

And finally, my last question is regarding your test environment:

> without enabling the feature flag EnableAPIGroupVersions in the velero server

Without enabling that feature, does that mean you cannot see the source preferred and supported versions? @brito-rafa suggested in the description to look up the preferred and supported source version in a directory in the backup tarball.

The file structure inside the backup tarball file is shown here: https://github.com/vmware-tanzu/velero/blob/7a103b9eda878769018386ecae78da4e4f8dde83/site/docs/master/output-file-format.md#file-format-version-11-current.

I still haven't gotten my test environment set up so I cannot generate any backup tarball files. I'll be working on that today. Hopefully then I can see the backup tarball directory structure myself.

Thanks again!

@jenting
Contributor

jenting commented Oct 14, 2020

> You mentioned getting API Versions using
>
> kubectl get --raw /apis | jq -r
>
> Is that command using the discovery client that @brito-rafa mentioned at the top of the description of this issue?

Sorry, I did not dig deeper into the Kubernetes code, but I believe the discovery client provides the same output as kubectl get --raw /apis | jq -r.
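
For what it's worth, a jq variant along these lines (illustrative, not from the thread) prints each group's preferred and supported versions; note the core group is served at /api rather than /apis:

```sh
kubectl get --raw /apis | jq -r '.groups[] | "\(.name): preferred=\(.preferredVersion.version) supported=\([.versions[].version] | join(","))"'
```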

> And finally, my last question is regarding your test environment:
>
> > without enabling the feature flag EnableAPIGroupVersions in the velero server
>
> Without enabling that feature, does that mean you cannot see the source preferred and supported versions? @brito-rafa suggested in the description to look up the preferred and supported source version in a directory in the backup tarball.

Yes. With the feature flag EnableAPIGroupVersions enabled, the backup files include the preferred version and all the supported versions.

Since I'm testing backup on Kubernetes version 1.x and restore on Kubernetes version 1.(x+n), n > 0, the source preferred version is included in the target supported versions. This means that even if the versions are not exactly the same, webhook conversion performs the conversion logic for us.

@codegold79
Contributor

Thanks for the clarifications.

Yes, I think you're right about kubectl get --raw /apis.

I think I understand what's happening. You are using a Kubernetes feature (the webhook conversion) to convert a preferred source version to a target supported version. Does that mean that you do not prioritize converting a source version to a target preferred version? Maybe your tests are only doing a source preferred -> target supported, even though that is second priority?

@jenting
Contributor

jenting commented Oct 15, 2020

> Thanks for the clarifications.
>
> Yes, I think you're right about kubectl get --raw /apis.
>
> I think I understand what's happening. You are using a Kubernetes feature (the webhook conversion) to convert a preferred source version to a target supported version. Does that mean that you do not prioritize converting a source version to a target preferred version? Maybe your tests are only doing a source preferred -> target supported, even though that is second priority?

I think my tests cover priority 1 & 2.

For example:

    • Priority 1 case:
      • source cluster: preferred version v1beta1; supported versions (v1beta1)
      • target cluster: preferred version v1beta1; supported versions (v1beta1)
    • Priority 2 case:
      • source cluster: preferred version v1beta1; supported versions (v1beta1)
      • target cluster: preferred version v1; supported versions (v1beta1, v1)

@codegold79
Contributor

@jenting That's really great you are able to get Priorities 1 & 2. Without looking too carefully, it does seem like all current cases can fall into those two priority groups. But maybe it will be a problem in the future. It seems to be a problem in issue #2251 that needs to be addressed. It will probably be good to be able to migrate from a source supported version to a target supported version, even if they aren't preferred in either cluster.

Were you planning to work on the case where only the supported versions intersect (no preferred versions match)? I can help with (or take over) work on this priority 3 case if you'd like.

There's also priority 4 where no version is in common and a plugin will need to be sought. This looks difficult, but I can try to help with that also.

@jenting
Contributor

jenting commented Oct 15, 2020

> @jenting That's really great you are able to get Priorities 1 & 2. Without looking too carefully, it does seem like all current cases can fall into those two priority groups. But maybe it will be a problem in the future. It seems to be a problem in issue #2251 that needs to be addressed. It will probably be good to be able to migrate from a source supported version to a target supported version, even if they aren't preferred in either cluster.
>
> Were you planning to work on the case where only the supported versions intersect (no preferred versions match)? I can help with (or take over) work on this priority 3 case if you'd like.
>
> There's also priority 4 where no version is in common and a plugin will need to be sought. This looks difficult, but I can try to help with that also.

You could work on priority 3 first, but we need to find an API group plus a source and target Kubernetes version pair that falls into the priority 3 case.

@codegold79
Contributor

> You could work on priority 3 first, but we need to find an API group plus a source and target Kubernetes version pair that falls into the priority 3 case.

Sounds good. I'll look for a priority 3 case. If I can't find it, I think I might have to fake it. I'll let you know how it goes.

@codegold79
Contributor

I spoke with @brito-rafa and he says it would be all right not to do priority 4 (looking for plugins when there is no common supported version). The priorities are listed out in my comment here.

To make it easier to understand the priority list, here is a PowerPoint that can help:
Priority-Supported-Preferred-Versions-V2.1.pptx

@brito-rafa
Contributor Author

I started to code an example CRD for testing. At this point, the example CRD has only a GVK, but here is the repo: https://github.com/brito-rafa/k8s-webhooks

@brito-rafa
Contributor Author

In the past couple of days, I made a discovery.

Kubernetes has a version priority list:
https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/#version-priority

When creating CRDs, Kubernetes picks the preferred version according to this priority order, even if you specify otherwise.

In other words, if you support both v1 and v2, you can't set v1 as the preferred version; v2 will be the preferred version.

If you support v1 and v2beta2, you can't set v2beta2 as the preferred version; v1 will be the preferred version, and so on.
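
For reference, this is the documented priority ordering from the page linked above (highest priority first):

```
v10 > v2 > v1 > v11beta2 > v10beta3 > v3beta1 > v12alpha1 > v11alpha2 > foo1 > foo10
```

Versions that don't match the kube-like pattern (like foo1) sort lowest, alphabetically.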
