Known Issue: CRDs cannot be restored due to fields being identical values #2383
Looks like this is very similar to what was happening in #2370, but different fields.
@TomaszKlosinski Could you provide a link to the ElasticSearch and Kibana YAML you were using to install, so I can reproduce this with the exact same files?
Hello, I've used the ECK operator to install it: And then I've used this YAML to create Elasticsearch:
PS. I use
Great, thanks!
Hey @nrb! Thanks for looking into this :) It is due to the validation here: https://github.com/kubernetes/apiextensions-apiserver/blob/089b5f38f499f0d173cb1d7f424ade7214d4423e/pkg/apis/apiextensions/validation/validation.go#L238. I am guessing Velero may be adding subresources to both the top-level and per-version stanzas. I have a really simple example here that I noticed it on, which might be helpful. I am also willing to work on a fix here if that would be helpful to the team 👍 Error:
CRD:
Versions:
@hasheddan Thanks for digging into this! Velero shouldn't be adding anything extra to the schemas, but I could be mistaken. The CRD manipulations we do are in these plugins, if you'd like to double-check my work, which would be appreciated :) These were introduced with v1.3.0. Backup: Restore: Thanks for that small CRD, too... I was wondering if it was something about the definitions themselves, but given your data, clearly something else is going on.
I did some tests with the GCP Crossplane Stack CRD that @hasheddan posted and confirmed the error he saw with Velero master. I did a Velero backup, then pulled the JSON files out of it. First, I applied the original JSON (unchanged apart from formatting):

```
% k apply -f gcpsamples.gcp.stacks.crossplane.io.json
```
```
The CustomResourceDefinition "gcpsamples.gcp.stacks.crossplane.io" is invalid: spec.versions: Invalid value: []apiextensions.CustomResourceDefinitionVersion{apiextensions.CustomResourceDefinitionVersion{Name:"v1alpha1", Served:true, Storage:true, Schema:(*apiextensions.CustomResourceValidation)(nil), Subresources:(*apiextensions.CustomResourceSubresources)(0xc009318e30), AdditionalPrinterColumns:[]apiextensions.CustomResourceColumnDefinition(nil)}}: per-version subresources may not all be set to identical values (top-level subresources should be used instead)
```

Next, I copied the contents to `test.json` and removed the per-version `subresources` block:

```
% k apply -f test.json
```
```
customresourcedefinition.apiextensions.k8s.io/gcpsamples.gcp.stacks.crossplane.io created
```

The diff:

```
% diff gcpsamples.gcp.stacks.crossplane.io.json test.json
32,35c32
< "storage": true,
< "subresources": {
< "status": {}
< }
---
> "storage": true
```
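The stanza removed in that diff is exactly what the validator flags: with only one entry in `spec.versions`, a per-version `subresources` block is trivially "identical across all versions", so the API server insists it be hoisted to the top level. As an illustration only, here's roughly that rejected shape built with the upstream v1beta1 Go types (the `widgets.example.com` group/kind is hypothetical, not from this issue):

```go
package main

import (
	"fmt"

	apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A v1beta1 CRD in the shape the validator rejects: every entry in
	// spec.versions (here, just one) carries the same subresources block,
	// which validation says belongs in the top-level spec.subresources.
	crd := &apiextv1beta1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: "widgets.example.com"}, // hypothetical
		Spec: apiextv1beta1.CustomResourceDefinitionSpec{
			Group: "example.com",
			Scope: apiextv1beta1.NamespaceScoped,
			Names: apiextv1beta1.CustomResourceDefinitionNames{
				Plural: "widgets",
				Kind:   "Widget",
			},
			Versions: []apiextv1beta1.CustomResourceDefinitionVersion{
				{
					Name:    "v1alpha1",
					Served:  true,
					Storage: true,
					// Identical across all versions -> "per-version
					// subresources may not all be set to identical values".
					Subresources: &apiextv1beta1.CustomResourceSubresources{
						Status: &apiextv1beta1.CustomResourceSubresourceStatus{},
					},
				},
			},
		},
	}
	fmt.Printf("%s has per-version subresources: %v\n",
		crd.Name, crd.Spec.Versions[0].Subresources != nil)
}
```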
For comparison, I applied https://raw.githubusercontent.com/elastic/cloud-on-k8s/master/config/crds/all-crds.yaml to a 1.17.2 KinD cluster, then backed it up and received the same errors as @TomaszKlosinski was originally reporting. I notice that in the backup JSON, the same field shows up 3 times. I don't think Velero's doing this, as when I retrieve the CRD via kubectl, I see 3 lines for that field, too:
I need to do some more research, because there's some API server behavior here that I'm not understanding. |
Ok, I think I see what's happening now. I got a hint from this commit, which said that v1 CRDs don't support the top-level field anymore, and I also remembered that our plugin just sets the version string while keeping the object that was fetched from the preferred-version endpoint (which on Kubernetes v1.16+ would have been a v1 CRD). Instead, I think the correct path will be to get a proper v1beta1 client and retrieve the CRD from the correct endpoint on backup.
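A sketch of what that fetch could look like, assuming the pre-1.18 apiextensions clientset (whose `Get` takes no context argument); the kubeconfig path is illustrative, and the CRD name is the one from the experiment above:

```go
package main

import (
	"fmt"

	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config from a kubeconfig; the path is illustrative.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	cs, err := apiextclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Ask the v1beta1 endpoint explicitly, rather than taking whatever
	// shape the server's preferred-version endpoint happens to return.
	crd, err := cs.ApiextensionsV1beta1().CustomResourceDefinitions().Get(
		"gcpsamples.gcp.stacks.crossplane.io", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(crd.Name, "top-level subresources set:", crd.Spec.Subresources != nil)
}
```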
@nrb apologies for only just following up here; your digging appears to be in accordance with my understanding though :) An alternative could be to convert both `v1` and `v1beta1` to the internal representation.
@hasheddan No worries! I was capturing notes as I found things. I'll take a look at that approach. We also have #2373, which will get both versions, too.
@hasheddan Storing the internal representation does seem like a good idea, especially since https://github.com/kubernetes/apiextensions-apiserver/blob/c47e10e6d5a3d95d427c5bee109d934602f0861e/pkg/apis/apiextensions/v1/zz_generated.conversion.go#L306 exists. I think I'll have to revisit the CRD backup/restore logic entirely. Thank you for your pointers! Do you mind if I tag you on reviews for these changes?
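For illustration, a sketch of that hub-and-spoke conversion (v1 -> internal -> v1beta1), assuming the `AddToScheme` helpers in those packages register the generated conversions linked above; the empty input CRD is just a placeholder:

```go
package main

import (
	"fmt"

	"k8s.io/apiextensions-apiserver/pkg/apis/apiextensions"
	apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

func main() {
	// One scheme holding the internal, v1, and v1beta1 CRD types; adding
	// the versioned packages also registers their generated conversions.
	scheme := runtime.NewScheme()
	utilruntime.Must(apiextensions.AddToScheme(scheme))
	utilruntime.Must(apiextv1.AddToScheme(scheme))
	utilruntime.Must(apiextv1beta1.AddToScheme(scheme))

	// A CRD as fetched from the v1 endpoint (empty placeholder for brevity).
	in := &apiextv1.CustomResourceDefinition{}

	// Hub-and-spoke: v1 -> internal -> v1beta1, letting the generated
	// conversions decide where schemas and subresources belong.
	internal := &apiextensions.CustomResourceDefinition{}
	if err := scheme.Convert(in, internal, nil); err != nil {
		panic(err)
	}
	out := &apiextv1beta1.CustomResourceDefinition{}
	if err := scheme.Convert(internal, out, nil); err != nil {
		panic(err)
	}
	fmt.Printf("%T -> %T -> %T\n", in, internal, out)
}
```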
@nrb sure thing! Thanks for the great work you do on this project!
A summary so far: Right now, Velero only gets the server's preferred version of a resource. In Kubernetes v1.16+, that's v1 of CustomResourceDefinitions. With v1.3.0, Velero applies a heuristic to see if it had a CRD that was v1 but was really a v1beta1 CRD retrieved through the v1 endpoint. If so, then the backup plugin switched the version string from `v1` to `v1beta1`. As documented in this issue, that's the wrong approach, because the overall structure returned from the v1 endpoint is invalid for v1beta1 CRDs. Our sample size of CRDs was too small to catch this, which will be remedied in #2447. I believe the heuristics on backup are working - they're catching the v1beta1 CRDs and applying the string swap, which is what then causes this problem. We have 3 options for fixing the current behavior, captured from the discussion at the community meeting on April 21, 2020:
Of these options, I think 1 & 3 are viable for the v1.4 timeframe. 3 builds on existing work; 1 duplicates some of 3's logic. Long term, I think 2 may be the most correct thing to do, but I haven't done the proper research to know what it fully entails. All of these options special-case the CustomResourceDefinition API group, which I think is unavoidable. The upstream CRD code states that v1beta1 is planned for removal in v1.19. However, Velero will need to be able to take v1beta1 backups and restore them going forward, longer than this window. @vmware-tanzu/velero-owners Please let me know what your thoughts are! I'd like to dig into this in earnest this week. @hasheddan Do you have anything to add? I see you've worked on the upstream CRD code - have I gotten anything wrong in this summary?
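For concreteness, a rough sketch of the kind of shape-sniffing the v1.3.0 heuristic does. The real checks live in the backup plugin linked earlier; `spec.validation`, which exists only on v1beta1 CRDs, is one illustrative signal:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// looksLikeV1beta1 reports whether a CRD fetched from the preferred-version
// (v1) endpoint actually has a v1beta1 shape. Checking spec.validation is
// illustrative: that field does not exist on v1 CRDs.
func looksLikeV1beta1(u *unstructured.Unstructured) bool {
	_, found, err := unstructured.NestedMap(u.Object, "spec", "validation")
	return err == nil && found
}

func main() {
	u := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "apiextensions.k8s.io/v1",
		"kind":       "CustomResourceDefinition",
		"spec": map[string]interface{}{
			"validation": map[string]interface{}{},
		},
	}}
	if looksLikeV1beta1(u) {
		// The v1.3.0 plugin only rewrote the version string; the body of
		// the object, including per-version subresources, was left exactly
		// as the v1 endpoint returned it, which is what breaks restore.
		u.SetAPIVersion("apiextensions.k8s.io/v1beta1")
	}
	fmt.Println(u.GetAPIVersion())
}
```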
Thanks for the writeup @nrb. I agree that options 1 and 3 are most feasible now. Option 2 would be a pretty big change, and I'm not even sure at this point whether it's desirable or not. Of 1 and 3, I think 3 is the more failsafe option, since it doesn't rely on us correctly identifying whether a CRD was originally created as `v1beta1`.
This isn't something Velero has ever supported (restoring into a cluster that doesn't have the API version that the backup uses) -- if we want to support this, we'll have to embed our own conversion logic to upconvert `v1beta1` CRDs to `v1`.
I've got a branch working with the elasticsearch CRDs now, but I need to update the unit tests to account for the changes. PR should be up tomorrow. |
@nrb thanks for the awesome description. And sorry for not taking a look at this earlier.
Edit from @nrb -> Please find a summary of the causes HERE. THIS IS A KNOWN ISSUE WITH A FIX TARGETED FOR VERSION 1.4
What steps did you take and what happened:
I've made a backup of my namespaces with elastic-operator, elasticsearch, kibana and filebeat.
Then I've deleted them with the following script:
Then I've restored it. The restored Elasticsearch+Filebeat+Kibana stack seems to work, but Velero showed errors.
What did you expect to happen:
Restore without errors.
The output of the following commands will help us better understand what's going on:
It shows errors when restoring CRDs, but afterwards they're restored:
- `kubectl logs deployment/velero -n velero`: https://termbin.com/uvbx
- `velero restore describe elk-backup-20200401140210`: https://termbin.com/k87h
- `velero restore logs elk-backup-20200401140210`: https://termbin.com/u23p
Environment:

- Velero version (use `velero version`):
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`):
- Kubernetes installer & version: Manually installed with kubeadm. Version 1.16.
- Cloud provider or hardware configuration: Physical hardware + KVM VMs.
- OS (e.g. from `/etc/os-release`):