DataDownload fails during restore when SnapshotMoveData: true for empty PVC workload #7388

Closed
kaovilai opened this issue Feb 5, 2024 · 8 comments · Fixed by #7396

Comments

@kaovilai
Contributor

kaovilai commented Feb 5, 2024

What steps did you take and what happened:

1. Create a workload that mounts an empty CSI volume
2. Create a Backup with SnapshotMoveData: true
3. Create a Restore

What did you expect to happen:
Application to become available

What happened:
kubectl get restore test-restore479 -o jsonpath='{.status.phase}'
PartiallyFailed

# oc get datadownload test-restore479-9vwfp -n openshift-adp -oyaml
apiVersion: velero.io/v2alpha1
kind: DataDownload
metadata:
  name: test-restore479-9vwfp
status:
  completionTimestamp: "2023-11-21T06:31:45Z"
  message: 'data path restore failed: Failed to run kopia restore: Unable to load
    snapshot : snapshot not found'
  node: worker-2
  phase: Failed
  progress: {}
  startTimestamp: "2023-11-21T06:30:35Z" 

The following information will help us better understand what's going on:
Velero 1.12

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

This is a cross-post of https://issues.redhat.com/browse/OADP-3106

Environment:

  • Velero version (use velero version): 1.12
  • Velero features (use velero client config get features):
  • Kubernetes version (use kubectl version):
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@kaovilai changed the title from "DataDownload fails for empty PVC when SnapshotMoveData: true" to "DataDownload fails during restore when SnapshotMoveData: true for empty PVC workload" on Feb 5, 2024
@Lyndon-Li
Contributor

Good catch, let me try to refactor the code.
Actually, the current code is just a compromise for compatibility with the Restic path, and it has some other problems. For example, detecting an empty dir is not necessary for the Kopia path, and it also takes time when the number of sub-items in the root dir is huge.

@danfengliu
Contributor

With PRs #7442 and #7396 merged, this use case is covered by the nightly internal test pipeline that runs on a kubeadm vanilla cluster with zfs-localpv.

@kaovilai
Contributor Author

kaovilai commented Mar 8, 2024

@danfengliu is there a test case for an empty volume?

@danfengliu
Contributor

danfengliu commented Mar 8, 2024

Yes. In PR #7396 there is a CSI datamover backup volume info test case for the datamover test, and in PR #7442 I modified this test case to prepare more than 1 PVC and leave some of the PVs empty. So if this case runs in the kubeadm vanilla cluster with zfs-localpv pipeline (in this pipeline, a new PV is an actual empty PV with no files created automatically by the system), your issue will be covered.

As you can see, this pipeline has only 1 failed test case, which is [BackupVolumeInfo][CSIDataMover]:

image

This is the count of tests executed by this pipeline:
image

@kaovilai
Contributor Author

kaovilai commented Mar 8, 2024

I see. So tests added, just not yet resolved. Thanks!

@hugotms

hugotms commented Apr 24, 2024

Hi,

When will this fix be available in a release, please? I see that it is merged on main but not in the 1.13 release ^^'

@blackpiglet
Contributor

The upcoming v1.14.0 ships with this fix.
The ETA should be around the end of this May.

@hugotms

hugotms commented Apr 24, 2024

Thanks a lot for the quick response!
