Default-volumes-to-restic Stuck #2967
What does the output of kubectl get podvolumebackups -n velero show? Also, can you supply the output of the Velero logs?
Note: since last time I have deleted the old backup and created a new one; the name of this one is full-volume200. Every time the backup gets stuck at the 35th of the total items. I am going to share the logs around this 35th item. Velero logs:
When I use --include-namespaces and specify the namespace where the PVCs are located, everything works fine.
velero create backup full-volume-test --default-volumes-to-restic --include-namespaces=test
velero create backup full
velero create backup full-volume-full --default-volumes-to-restic
Note: only the test namespace has two PVCs; no other namespace contains a PVC. Also, only two PVs exist, and they are bound to those PVCs.
Edit: I have checked the namespaces; the problem is related to the kubernetes-dashboard namespace. When I use --default-volumes-to-restic with kubernetes-dashboard, it gets stuck in progress with 0 items backed up out of 40.
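For completeness, a quick way to confirm which namespaces contain PVCs and how the PVs are bound (plain kubectl, nothing Velero-specific):

```bash
# List every PVC across namespaces and the cluster's PVs with their claims.
kubectl get pvc --all-namespaces
kubectl get pv
```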
For the backup that was stuck, can you run kubectl get podvolumebackups -n velero? This should report the pod volumes that Velero tried to back up with restic, and their current status. Thanks for the summary of the different commands you tried; that's helpful in trying to narrow down this issue.
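A sketch of one way to pull that information, assuming Velero labels each PodVolumeBackup with the owning backup's name (adjust the backup name to match yours):

```bash
# Show the PodVolumeBackups Velero created for the stuck backup,
# including their phase and progress fields.
kubectl -n velero get podvolumebackups \
  -l velero.io/backup-name=full-volume200 -o yaml
```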
One more thought I had - are the pod volumes you're trying to back up still annotated?
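One way to check for the opt-in annotation, as a sketch (namespace and pod name are placeholders):

```bash
# Print the pod's annotations and look for the opt-in restic annotation.
kubectl -n test get pod <pod-name> -o jsonpath='{.metadata.annotations}' \
  | grep backup.velero.io/backup-volumes
```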
No, they do not have annotations. Output of "velero describe backup full-volume200 --details":
Phase: InProgress
Errors: 0
Namespaces:
Resources:
Label selector:
Storage Location: default
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks:
Backup Format Version: 1.1.0
Started: 2020-09-24 23:49:45 +0400 +04
Expiration: 2020-10-24 23:49:45 +0400 +04
Estimated total items to be backed up: 40
Resource List:
Velero-Native Snapshots:
Restic Backups:
Trimmed output of "kubectl get deploy dashboard-metrics-scraper -o yaml -n kubernetes-dashboard":
volumeMounts:
So it is clear something is related to that 'tmp-volume', yes?
Note: there is no /bin/sh or /bin/bash available in the "dashboard-metrics-scraper" pods; does restic need either of them?
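For context, the upstream kubernetes-dashboard manifests typically give the metrics scraper an emptyDir named tmp-volume; a representative snippet (assumed from those manifests, not the reporter's exact trimmed output) looks roughly like this:

```yaml
# Representative sketch of the dashboard-metrics-scraper volume layout.
containers:
- name: dashboard-metrics-scraper
  volumeMounts:
  - mountPath: /tmp
    name: tmp-volume
volumes:
- name: tmp-volume
  emptyDir: {}
```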
No, restic shouldn't need a shell inside the pod to work. It gets at the volume mounts via the node's hostPath. You said that the backup works without error when you run without the default-volumes-to-restic option; by "work" here, do you mean that the restic data is present? If so, that would imply to me that the pod volumes are annotated. Also, another thing to check here: the change to the Velero server deployment to add the --default-volumes-to-restic flag. I double-checked our documentation, and it appears that this is something we forgot to add. It's mentioned for new installs, but we should also mention it for upgrades.
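For context on "via the node's hostPath", this is roughly how the restic daemonset reaches pod volumes in a default velero install (paths assumed from that default; verify against the actual daemonset in your cluster):

```yaml
# Sketch: the restic daemonset pod mounts the node's kubelet pods directory,
# which is how it reads volume data without needing a shell in the app pod.
containers:
- name: restic
  volumeMounts:
  - name: host-pods
    mountPath: /host_pods
    mountPropagation: HostToContainer
volumes:
- name: host-pods
  hostPath:
    path: /var/lib/kubelet/pods
```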
If I do kubectl edit deploy/velero -n velero and add "--default-volumes-to-restic", will this flag apply to all backups by default? I want to use --default-volumes-to-restic only for some backups, not all.
It will apply to all backups, yes. It's required to make the client-side flag work right now, as the design goal was to make sure users who were using 100% restic had an easier path. There may be changes we have to make to accommodate mixed use cases with this flag, but for the moment the solution there is to use the opt-in approach rather than the opt-out one.
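For reference, adding the flag via kubectl edit means appending it to the server container's args; a minimal sketch of the relevant part of the Deployment (container name and arg layout assume a default velero install):

```yaml
# Sketch only: the velero server container args after adding the flag.
spec:
  template:
    spec:
      containers:
      - name: velero
        args:
        - server
        - --default-volumes-to-restic
```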
@whitepiratebaku You can also use the ...
That is not true. See velero/pkg/controller/backup_controller.go, lines 350 to 352 at e69fac1.
Can you also please clarify what changes you were thinking of wrt ...?
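For context, the referenced controller lines are the per-backup defaulting logic; the sketch below paraphrases it from memory (field and variable names are approximations, not a verbatim quote of commit e69fac1), and it is why a per-backup --default-volumes-to-restic flag can work without the server-side flag:

```go
// Paraphrased sketch of the defaulting logic in backup_controller.go:
// the server-side default is applied only when the backup spec leaves
// the field unset.
if request.Spec.DefaultVolumesToRestic == nil {
    request.Spec.DefaultVolumesToRestic = &c.defaultVolumesToRestic
}
```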
@whitepiratebaku Would you be open to trying out a development build of Velero, with added instrumentation and debug messages, to troubleshoot this further?
You can also run this command to identify the pod volume that is probably not making progress:
$ kubectl -n velero get podvolumebackups -ojson | jq '.items[] |"\(.spec.pod.namespace)/\(.spec.pod.name) \(.status.phase) \(.status.progress)"'
Please run this command in a loop for about 30m to watch progress. Here is a handy script for you to use:
$ for i in {1..1800}; do
    kubectl -n velero get podvolumebackups -ojson | jq '.items[] |"\(.spec.pod.namespace)/\(.spec.pod.name) \(.status.phase) \(.status.progress)"' > pvb-progress-$i.txt;
    sleep 1;
  done
Further, you can also use the restic metrics to identify whether there is a particular node in your cluster where restic backups may be slowing the backup to a halt. If you don't have Prometheus and Grafana installed in your cluster, you can dump all the restic metrics from the restic daemonset into a file and share that with us for troubleshooting. To do this ...
NOTE: If ...
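A rough sketch of one way to dump those metrics, assuming the restic pods carry the default name=restic label and expose Prometheus metrics on port 8085 (both are assumptions from a default install; check the restic container's --metrics-address flag for the actual port):

```bash
# Dump /metrics from each restic daemonset pod into a per-pod file.
for pod in $(kubectl -n velero get pods -l name=restic -o name); do
  kubectl -n velero port-forward "$pod" 8085:8085 &
  pf_pid=$!
  sleep 2
  curl -s localhost:8085/metrics > "restic-metrics-${pod##*/}.txt"
  kill "$pf_pid"
done
```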
This issue has been inactive for a while, so I'm closing it.
Hey, I'm facing the same behavior on Velero 1.5.3. Did anyone find a working solution?
We have also run into this in 1.5.3. It seems to be related to the projected volume type: projected volumes seem to fail, and some hang. Perhaps this volume type should be ignored like the other unsupported volume types.
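As a possible workaround while such volumes are not skipped automatically, Velero's opt-out annotation can exclude individual volumes from restic when --default-volumes-to-restic is set; the namespace, pod, and volume names below are placeholders:

```bash
# Exclude a specific volume (e.g. a problematic projected volume) from
# restic backup for one pod.
kubectl -n my-namespace annotate pod/my-pod \
  backup.velero.io/backup-volumes-excludes=my-projected-volume
```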
I have the same issue where --default-volumes-to-restic gets stuck when I use it with velero create backup bkp-name --include-namespaces namespace-name. It gets stuck forever and never completes. I am using Velero 1.6. I even tried passing the same parameter --default-volumes-to-restic to the velero install command along with the --use-restic flag, but it failed with an "unknown flag" error. Has anyone witnessed the same behaviour? I am trying locally with a single-node k8s cluster. Without the --default-volumes-to-restic flag, the backup succeeds, but it doesn't back up the contents of the pod volumes.
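For reference, and assuming the "unknown flag" error came from an older client binary: Velero CLIs from 1.5 onward document an install-time flag of the same name, so one thing to try is upgrading the CLI and re-running install; this is a sketch, not a confirmed fix for the hang itself:

```bash
# Assumes a Velero CLI >= 1.5; older binaries reject the flag as unknown.
velero install --use-restic --default-volumes-to-restic \
  --provider <provider> --bucket <bucket> ...
```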
I have the same issue since I updated to 1.6.
I'm having the same issue; it seems to get stuck on ... Current setup:
Here's the full description of the backup:
I have the same issue with 1.6.3.
@ibot3 if you can use CSI, it would be a great relief.
The problem is that I can't use CSI, because my provider does not support snapshots.
Same issue here; it only occurs when ...
This occurs if daemonset pods (e.g. the restic daemonset) are missing on nodes that run the pods to be backed up; adding tolerations fixed it for me.
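A minimal sketch of that fix, assuming the default restic DaemonSet name (edit it with kubectl -n velero edit daemonset restic) and a placeholder taint key; adjust both to match the cluster:

```yaml
# Toleration added under the restic DaemonSet's pod template
# (spec.template.spec) so its pods can schedule onto tainted nodes.
tolerations:
- key: example.com/dedicated   # assumed taint key on the affected nodes
  operator: Exists
  effect: NoSchedule
```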
What steps did you take and what happened:
[A clear and concise description of what the bug is, and what commands you ran.]
I installed Velero with --use-restic; the version was 1.4.2. I was able to create volume backups with the opt-in approach by annotating pods. I was looking forward to the 1.5.2 release so I could use the opt-out method. I have upgraded Velero and it works normally when I create normal backups, but when I try the --default-volumes-to-restic option during backup creation it gets stuck "InProgress"; describe shows 765 total items but the backed-up items are stuck at 35. The size of the data in the volumes is just megabytes.
What did you expect to happen:
I expected it to back up the volumes.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
#2966 describe shows this:
Name: full-volume21
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion: v1.16.3
velero.io/source-cluster-k8s-major-version: 1
velero.io/source-cluster-k8s-minor-version: 16
API Version: velero.io/v1
Kind: Backup
Metadata:
Creation Timestamp: 2020-09-23T11:55:00Z
Generation: 10
Resource Version: 1311467
Self Link: /apis/velero.io/v1/namespaces/velero/backups/full-volume21
UID: 5a609312-3fff-4d68-91d4-0b82c302ccb2
Spec:
Default Volumes To Restic: true
Hooks:
Included Namespaces:
*
Storage Location: default
Ttl: 720h0m0s
Status:
Expiration: 2020-10-23T11:55:00Z
Format Version: 1.1.0
Phase: InProgress
Progress:
Items Backed Up: 35
Total Items: 765
Start Timestamp: 2020-09-23T11:55:00Z
Version: 1
Events:
kubectl logs deployment/velero -n velero
velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:
Velero version (use velero version):
Velero features (use velero client config get features):
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.