PartiallyFailed hit when restoring 2 namespaces with restic plugin in PKS env #2691
Comments
Hi @luwang-vmware, I'm curious whether you have tried backing up each namespace individually, and if so, how that went. In the meantime, I'm marking this as something to investigate. It seems we might have to look through the code and piece together what the source of this problem could be.
Thanks for following up on this, @carlisia. I have tried backing up each namespace individually and it worked as expected. The reported issue is easy to reproduce. Below is my YAML file for reference. Please let me know if there is anything else I need to provide.
@luwang-vmware In Velero v1.4.2, we increased the restic timeout as well as the amount of CPU/memory the restic daemonset gets by default. This should help with larger data sets on restic. We arrived at those numbers by testing with 100GB datasets. We've also recently added documentation for modifying the Velero & restic CPU/memory requests and limits, so restic operations can be given more resources and complete in a more timely manner given the size of the volumes. You can see more information about the new defaults and adjusting the values in our docs. Can you please retry with v1.4.2 and let us know if this resolves your issue?
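For reference, a minimal sketch of the kind of adjustment those docs describe, assuming the default object names from a standard install (a restic DaemonSet with a container named restic in the velero namespace). The values shown are only the v1.4.2 defaults mentioned above, not recommendations from this issue:

```bash
# Raise the restic DaemonSet's resource requests/limits via a strategic-merge patch.
# Names ("restic" DaemonSet/container, "velero" namespace) assume a default install.
kubectl -n velero patch daemonset restic --patch '
spec:
  template:
    spec:
      containers:
      - name: restic
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "1"
            memory: 1Gi
'
```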
Thanks @nrb. I will give Velero 1.4.2 a try. One question: what rule or principle should I follow when I need to use velero+restic to back up a large PV, or several PVs in the same namespace, in a single velero backup command? More specifically, your team tested a 100GB PV with the restic memory request set to 512M (limited to 1G); if I want to use restic to back up a 1T or 2T PV, or 10 x 100GB PVs, how should I estimate the memory resources restic will need?
Hi @luwang-vmware. We don't have guidelines for calculating the amount of memory required for performing restic backups. Were you able to retry using Velero 1.4.2 or the latest release (v1.5.2) with adjusted timeouts or memory limits?
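As for the timeout side mentioned above, a hedged sketch (not from this thread): the restic operation timeout is a flag on the velero server command, so it is adjusted by editing the server Deployment's container args, for example:

```yaml
# Excerpt of the velero Deployment (kubectl -n velero edit deployment/velero).
# The container name and the 8h value are illustrative assumptions, not from the issue.
spec:
  template:
    spec:
      containers:
      - name: velero
        args:
        - server
        - --restic-timeout=8h
```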
Hi @luwang-vmware! If you still have access to the environment where this issue happened, I'd like to check that the snapshot being restored actually exists in your storage location. That will help us determine why this error is being triggered. Can you verify that the snapshot ID recorded at backup time matches the one the restore is referencing, and, if it matches, check the corresponding restic snapshot path in your storage location? That is the snapshot restic is attempting to restore but can't find. Thanks!
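A sketch of one way to do that cross-check with standard kubectl queries; the label selectors and field paths below are assumptions based on how Velero labels its pod volume objects and may differ slightly between versions:

```bash
# Snapshot IDs recorded at backup time (PodVolumeBackup status):
kubectl -n velero get podvolumebackups -l velero.io/backup-name=1g-20g-backup \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.snapshotID}{"\n"}{end}'

# Snapshot IDs the restore is trying to use (PodVolumeRestore spec):
kubectl -n velero get podvolumerestores -l velero.io/restore-name=1g-20g-restore \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.snapshotID}{"\n"}{end}'

# If they match, look in the object store backing the BackupStorageLocation for a
# file under <prefix>/restic/<namespace>/snapshots/ whose name starts with that ID.
```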
Hello!
What steps did you take and what happened:
This issue is reproducible.
velero --kubecontext=pks-st-2 backup create 1g-20g-backup --include-namespaces ns-1g-1,ns-20g-1
velero --kubecontext=pks-st-2 restore create 1g-20g-restore --from-backup 1g-20g-backup
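(Not part of the original report: one way to see which pod volumes caused the PartiallyFailed status, using standard Velero commands against the restore created above.)

```bash
# Inspect the restore's per-volume results and the restic warnings/errors.
velero --kubecontext=pks-st-2 restore describe 1g-20g-restore --details
velero --kubecontext=pks-st-2 restore logs 1g-20g-restore
```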
What did you expect to happen:
The restore should succeed.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
Logs are uploaded here: https://gist.github.com/luwang-vmware/9bef01478e11ade628fa53f6b98f8261
kubectl logs deployment/velero -n velero
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
Velero version (use velero version): 1.4.0
Velero features (use velero client config get features):
Kubernetes version (use kubectl version): 1.17.5
Cloud provider or hardware configuration: VCP
OS (e.g. from /etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.