Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not attempt restore resource with no available GVK in cluster #7322

Conversation

kaovilai
Copy link
Contributor

@kaovilai kaovilai commented Jan 16, 2024

It is possible to end up in a situation where CRD was not restored due to an already existing CRD with a newer GVK without prior GVK. When restoring CR with older GVK, this PR causes item restore to error early and partially fail.

In the future, it might be desirable to merge CRD.spec.versions from backup with in-cluster if it can be done without conflicts but is out of scope of this PR.

Signed-off-by: Tiger Kaovilai tkaovila@redhat.com

Thank you for contributing to Velero!

Please add a summary of your change

Does your change fix a particular issue?

Fixes #7319

Please indicate you've done the following:

  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@kaovilai kaovilai marked this pull request as draft January 16, 2024 20:55
@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch 7 times, most recently from 784599c to a7de73e Compare January 16, 2024 22:33
if !clusterHasKind {
ctx.log.Errorf("Cannot restore %s because GVK doesn't exist in the cluster", groupResource)
// Adding to errs should cause restore to partially fail
errs.Add(namespace, fmt.Errorf("cannot restore %s because gvk doesn't exist in the cluster", groupResource))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should cause restore to partially fail, but the error message maybe different than before. Alternatively, another commits attempts to preserve prior error message for when informer cache is disabled. Let me know which is preferred.

@kaovilai kaovilai changed the title do not attempt restore resource with no available GVK in cluster Do not attempt restore resource with no available GVK in cluster Jan 16, 2024
@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch from a7de73e to 642efc8 Compare January 16, 2024 22:41
// Check if cluster has GVK required to restore resource, otherwise return err.
// Placed here because the object apiVersion might get modified by a RestorePlugin
// So we check for the GVK after the RestorePlugin has run.
_, _, err = ctx.discoveryHelper.KindFor(schema.GroupVersionKind{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this change isn't the correct fix because the informer syncing will still stuck if the CRDs don't exist or aren't restored: https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L602

The second commit is the good one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. So maybe hybrid of both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the change in the second commit is enough. Seems the change here doesn't help resovling the issue, correct me if I'm wrong.

@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch from 740bbd7 to 13c777d Compare January 17, 2024 01:56
@kaovilai
Copy link
Contributor Author

I think pkg/discovery and Fakes may not have been implemented fully.. will continue tomorrow

@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch 2 times, most recently from 94f9e89 to a94ccb4 Compare January 18, 2024 20:19
@@ -193,7 +196,8 @@ func (h *helper) Refresh() error {

for _, resource := range resourceGroup.APIResources {
gvr := gv.WithResource(resource.Name)
gvk := gv.WithKind(resource.Kind)
// gvk kind is lowercased for consistency of hits in KindFor()
gvk := gv.WithKind(strings.ToLower(resource.Kind))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and related changes fixed the unit tests. Will have to try a run live in a real environment to confirm everything is working.

Copy link

codecov bot commented Jan 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a81e049) 61.75% compared to head (1ece005) 61.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7322      +/-   ##
==========================================
+ Coverage   61.75%   61.76%   +0.01%     
==========================================
  Files         262      262              
  Lines       28419    28433      +14     
==========================================
+ Hits        17549    17563      +14     
  Misses       9640     9640              
  Partials     1230     1230              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kaovilai kaovilai marked this pull request as ready for review January 18, 2024 20:35
@@ -117,6 +119,7 @@ func (h *helper) ResourceFor(input schema.GroupVersionResource) (schema.GroupVer
return gvr, apiResource, nil
}

// KindFor expects input where kind is lowercased to hit the map
Copy link
Contributor

@ywk253100 ywk253100 Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that Kind doesn't match is the test code converts the Resource to Kind incorrectly:

Kind: cases.Title(language.Und).String(strings.TrimSuffix(resource.Name, "s")),

This shouldn't happen in the real world.

So let's specify the Kind explicitly in the test code, I create a PR for this, free feel to copy the changes into your PR if it isn't merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will check

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

do not attempt restore resource with no available GVK in cluster

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
…esource if nil

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch 2 times, most recently from ffb8357 to fab9fa0 Compare January 19, 2024 16:15
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
@kaovilai kaovilai force-pushed the do-not-create-informer-for-non-existent-GVK branch from fab9fa0 to 1ece005 Compare January 19, 2024 16:35
@ywk253100 ywk253100 merged commit 270b1de into vmware-tanzu:main Jan 22, 2024
24 checks passed
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jan 22, 2024
…are-tanzu#7322)

Check for GVK before attempting restore.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jan 22, 2024
…are-tanzu#7322)

Check for GVK before attempting restore.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jan 22, 2024
…are-tanzu#7322)

Check for GVK before attempting restore.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
ywk253100 pushed a commit to ywk253100/velero that referenced this pull request Jan 22, 2024
…are-tanzu#7322)

Check for GVK before attempting restore.

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
ywk253100 added a commit that referenced this pull request Jan 22, 2024
… cluster (#7336)

* Specify the Kind explicitly in the API resource

Specify the Kind explicitly in the API resource to avoid wrong Kind conversion


* Do not attempt restore resource with no available GVK in cluster (#7322)

Check for GVK before attempting restore.


---------

Signed-off-by: Wenkai Yin(尹文开) <yinw@vmware.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Co-authored-by: Tiger Kaovilai <tkaovila@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EnableAPIGroupVesions test failed due to restore hung at in-progress state with informer cache enabled
3 participants