Allow customizing restore order for Kubernetes controllers and their managed resources #4045

Open
DanielXiao opened this issue Aug 18, 2021 · 3 comments
Labels
backlog Enhancement/User End-User Enhancement to Velero kind/requirement Kubernetes Resources Pertains to backup/restoration of Kubernetes resources Restore

DanielXiao commented Aug 18, 2021

Describe the problem/challenge you have
When the restore targets include Kubernetes controllers, the following issues can occur:

  1. Velero is not aware of dependencies among Custom Resources and restores them in alphabetical order. This can surface as errors such as "invalid memory address or nil pointer dereference".
  2. A race condition can occur between Velero and a controller when both operate on the same resource. See the following log from an Antrea restore:

time="2021-08-10T16:41:04Z" level=info msg="Attempting to restore Tier: securityops" logSource="pkg/restore/restore.go:1070" restore=velero/restore-48c089d0-03ed-4f30-8532-a2e9837bea94
time="2021-08-10T16:41:04Z" level=info msg="error restoring securityops: admission webhook "tiervalidator.antrea.tanzu.vmware.com" denied the request: tier securityops priority 50 overlaps with existing Tier" logSource="pkg/restore/restore.go:1133" restore=velero/restore-48c089d0-03ed-4f30-8532-a2e9837bea9

error restoring application: admission webhook "tiervalidator.antrea.tanzu.vmware.com" denied the request: tier application priority 250 overlaps with existing Tier"

Describe the solution you'd like
In the default restore order, controller Pods are restored before the Custom Resources they manage, so we could solve this problem by:

  1. Allowing users to define the restore order of Custom Resources per restore.
  2. Marking controller Pods/Deployments with labels, removing them from the ordered list, and appending them to the end (before any managed resources).

For an Antrea cluster, the order should be: default restore order -> Tier CR -> other Antrea CRs -> Antrea controller Pod -> Antrea controller ReplicaSet and Deployment -> Antrea MutatingWebhookConfiguration and ValidatingWebhookConfiguration.
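
The ordering proposed above could be sketched roughly as follows. This is only an illustration in Python, not Velero code: the `Item` type, the `order_items` function, the `example.io/restore-last` label, the short `DEFAULT_KINDS` list, and the choice of `ClusterNetworkPolicy` to stand in for "other Antrea CRs" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical label used to mark controller workloads that must be restored late.
CONTROLLER_LABEL = "example.io/restore-last"

# Hypothetical, shortened stand-in for a default restore order (not Velero's real list).
DEFAULT_KINDS = ["Namespace", "Pod", "ReplicaSet", "Deployment"]


@dataclass
class Item:
    kind: str
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[Item], bool]


def order_items(items: List[Item], default_kinds: List[str], stages: List[Stage]) -> List[Item]:
    """Restore regular items first, then items matched by each custom stage, in stage order."""
    deferred: List[List[Item]] = [[] for _ in stages]
    regular: List[Item] = []
    for item in items:
        for i, matches in enumerate(stages):
            if matches(item):
                deferred[i].append(item)  # pulled out of the default ordering
                break
        else:
            regular.append(item)

    def default_rank(item: Item):
        # Prioritized kinds first (in list order), then all other kinds alphabetically.
        if item.kind in default_kinds:
            return (0, default_kinds.index(item.kind), item.kind)
        return (1, 0, item.kind)

    regular.sort(key=default_rank)
    return regular + [item for stage in deferred for item in stage]


def is_controller(item: Item) -> bool:
    return item.labels.get(CONTROLLER_LABEL) == "true"


# Stages mirroring the Antrea order described above (kinds chosen for illustration).
ANTREA_STAGES: List[Stage] = [
    lambda it: it.kind == "Tier",
    lambda it: it.kind == "ClusterNetworkPolicy",
    lambda it: it.kind == "Pod" and is_controller(it),
    lambda it: it.kind in {"ReplicaSet", "Deployment"} and is_controller(it),
    lambda it: it.kind in {"MutatingWebhookConfiguration", "ValidatingWebhookConfiguration"},
]
```

With this shape, unlabeled Pods and Deployments still follow the default order, while labeled controller workloads and the listed Custom Resources are deferred to the custom stages.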

Anything else you would like to add:
Many workloads today consist of controllers and operators, so both disaster recovery and migration may hit this issue.

Environment:

  • Velero version (use velero version): all
  • Kubernetes version (use kubectl version): all
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: all
  • OS (e.g. from /etc/os-release): all

@zubron zubron added Enhancement/User End-User Enhancement to Velero Kubernetes Resources Pertains to backup/restoration of Kubernetes resources Restore labels Aug 19, 2021
@eleanor-millman eleanor-millman added this to the v1.8.0 milestone Sep 14, 2021
@reasonerjt reasonerjt added kind/requirement Kubernetes Resources Pertains to backup/restoration of Kubernetes resources Restore Enhancement/User End-User Enhancement to Velero and removed Enhancement/User End-User Enhancement to Velero Restore Kubernetes Resources Pertains to backup/restoration of Kubernetes resources labels May 20, 2022
@eleanor-millman eleanor-millman added the 1.10-candidate The label used for 1.10 planning discussion. label May 25, 2022
@reasonerjt reasonerjt added backlog and removed 1.10-candidate The label used for 1.10 planning discussion. labels Jun 28, 2022
@vineetsingh5

We are also facing the same issue with a Rancher cluster. We have triggered restores multiple times from the same backup, but the restore sometimes fails with a similar error (nil pointer dereference).

@bluzarraga

Hello, we are also interested in some kind of ordering mechanism through the Velero Restore object. Given the age of this issue, is there any plan to implement this feature or something like it, @reasonerjt?

@kaovilai
Contributor

I'm interested in helping with this. One approach would be to add a restore priority field to the Restore CR and, if it is empty, fall back to the server's restore priorities.
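
A minimal sketch of that fallback, written in Python for illustration only. The field name `restoreResourcePriorities`, the function `effective_priorities`, and the sample server list are hypothetical and are not taken from Velero's API.

```python
from typing import Any, Dict, List

# Hypothetical server-wide priority list, for illustration only.
SERVER_PRIORITIES: List[str] = ["customresourcedefinitions", "namespaces", "pods"]


def effective_priorities(restore_spec: Dict[str, Any], server_priorities: List[str]) -> List[str]:
    """Use the per-restore priority list if set, otherwise the server-wide list."""
    # "restoreResourcePriorities" is an assumed field name on the Restore spec.
    per_restore = restore_spec.get("restoreResourcePriorities") or []
    return list(per_restore) if per_restore else list(server_priorities)
```

Treating a missing field and an empty list the same way keeps existing Restore objects on the current server-wide behavior.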


7 participants