DataMover - Failed to complete restore, few datadownloads are stuck in 'InProgress' status #6733
Comments
The error is as below: "The restore PV is bound by another restore PVC instead of the target PVC."
Thanks @Lyndon-Li. The error above is taken from DEBUG/datadownloads_issue.log (Thu Aug 31 07:03:06 UTC 2023).
@duduvaa
Bug verified OK. Thank you @Lyndon-Li for the quick fix.
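The error above points at a PV/PVC binding mismatch: the restored PV's claimRef ends up naming a different restore PVC than the intended target. Below is a minimal sketch of how such a mismatch could be spotted from a saved PV dump; the PVC names and the sample YAML are hypothetical stand-ins for real `kubectl get pv <name> -o yaml` output:

```shell
# Sample PV spec standing in for 'kubectl get pv <name> -o yaml' output
# (names are hypothetical, for illustration only).
cat > /tmp/pv.yaml <<'EOF'
spec:
  claimRef:
    namespace: velero
    name: dm-restore-intermediate-pvc
EOF

target_pvc="data-busy-42"   # hypothetical PVC the restore expected to bind
# Pull the first 'name:' under claimRef from the saved dump.
bound_pvc=$(awk '/name:/ {print $2; exit}' /tmp/pv.yaml)

if [ "$bound_pvc" != "$target_pvc" ]; then
  echo "PV bound by $bound_pvc, not target $target_pvc"
fi
```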
What steps did you take and what happened:
Running a datamover restore of 100 PVs. A few 'datadownloads' resources are stuck in 'InProgress' status even though all of their bytes have already been transferred. The restore CR stays in 'WaitingForPluginOperations' status until the 4-hour timeout is reached.
100 PVCs were created - 98 are 'Bound' and 2 are 'Pending'.
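One quick way to see which PVCs never reached 'Bound' is to filter a `kubectl get pvc`-style listing; the sample listing below is a hypothetical stand-in for real cluster output:

```shell
# Sample listing standing in for 'kubectl get pvc -n <ns>' output
# (names are hypothetical).
cat > /tmp/pvc-list.txt <<'EOF'
NAME     STATUS    VOLUME   CAPACITY
data-01  Bound     pv-01    2Gi
data-02  Pending
EOF

# Print the names of PVCs whose STATUS column is not 'Bound'.
awk 'NR>1 && $2!="Bound" {print $1}' /tmp/pvc-list.txt
```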
Steps & Commands:
datadownloadsissue.tar.gz
./velero backup create dm-backup-100pods-2gb --include-namespaces perf-busy-data-cephrbd-100pods-2gb --data-mover "velero" --snapshot-move-data=true -nupstream-velero
Backup completed in 12:05 minutes
./velero restore create dm-restore-100pods-2gb --from-backup dm-backup-100pods-2gb -nupstream-velero
98 restores completed
2 restores failed (2 datadownloads resources are stuck in 'InProgress' status).
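As a sketch of how the stuck resources can be identified, one can filter a `kubectl get datadownloads`-style listing for the 'InProgress' ones; the resource names and the listing below are hypothetical stand-ins for real output:

```shell
# Sample listing standing in for 'kubectl get datadownloads -n upstream-velero'
# output (names are hypothetical).
cat > /tmp/datadownloads.txt <<'EOF'
NAME                        STATUS       STARTED
dm-restore-100pods-2gb-abc  Completed    2023-08-31T07:00:00Z
dm-restore-100pods-2gb-def  InProgress   2023-08-31T07:00:05Z
dm-restore-100pods-2gb-ghi  InProgress   2023-08-31T07:00:07Z
EOF

# Print the names of datadownloads whose STATUS column is 'InProgress'.
awk 'NR>1 && $2=="InProgress" {print $1}' /tmp/datadownloads.txt
```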
What did you expect to happen:
Restore all pods & PVs successfully
The following information will help us better understand what's going on:
./velero debug --restore dm-restore-100pods-2gb -nupstream-velero
2023/08/31 11:05:54 Collecting velero resources in namespace: upstream-velero
2023/08/31 11:05:55 Collecting velero deployment logs in namespace: upstream-velero
2023/08/31 11:05:58 Collecting log and information for restore: dm-restore-100pods-2gb
2023/08/31 11:05:58 Generated debug information bundle: bundle-2023-08-31-11-05-54.tar.gz
DEBUG folder (attached) includes:
DM-Data folder, which includes all datadownloads yaml files (from the oc get datadownloads {NAME} -nupstream-velero -oyaml command)
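To summarize the dumped DataDownload YAMLs, one could scan each file in the DM-Data folder for its status phase; the sample file below is a hypothetical stand-in for one `oc get datadownloads <name> -oyaml` dump:

```shell
# Sample dump standing in for one 'oc get datadownloads <name> -oyaml'
# file in the DM-Data folder (contents are hypothetical).
mkdir -p /tmp/DM-Data
cat > /tmp/DM-Data/dd-sample.yaml <<'EOF'
status:
  phase: InProgress
  progress:
    bytesDone: 2147483648
    totalBytes: 2147483648
EOF

# Print '<file>: <phase>' for each dumped DataDownload.
for f in /tmp/DM-Data/*.yaml; do
  phase=$(awk '/^  phase:/ {print $2; exit}' "$f")
  echo "${f##*/}: $phase"
done
```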
Anything else you would like to add:
I ran the same test with 50, 80, 90, 100 and 120 PVs per namespace.
50 & 80 PVs - Completed successfully
90, 100 & 120 PVs - Restore failed.
Environment:
Velero version: main (Velero-1.12), last commit:
commit 30e54b0 (HEAD -> main, origin/main, origin/HEAD)
Author: Daniel Jiang jiangd@vmware.com
Date: Wed Aug 16 15:45:00 2023 +0800
Velero features (use velero client config get features):
./velero client config get features
features:
Kubernetes version (use kubectl version):
oc version
Client Version: 4.12.9
Kustomize Version: v4.5.7
Server Version: 4.12.9
Kubernetes Version: v1.25.7+eab9cc9
OCP running on bare-metal (BM) servers
3 master & 6 worker nodes
oc get nodes
NAME STATUS ROLES AGE VERSION
master-0 Ready control-plane,master 148d v1.25.7+eab9cc9
master-1 Ready control-plane,master 148d v1.25.7+eab9cc9
master-2 Ready control-plane,master 148d v1.25.7+eab9cc9
worker000-r640 Ready worker 148d v1.25.7+eab9cc9
worker001-r640 Ready worker 148d v1.25.7+eab9cc9
worker002-r640 Ready worker 148d v1.25.7+eab9cc9
worker003-r640 Ready worker 148d v1.25.7+eab9cc9
worker004-r640 Ready worker 148d v1.25.7+eab9cc9
worker005-r640 Ready worker 148d v1.25.7+eab9cc9
OS (e.g. from /etc/os-release): Red Hat Enterprise Linux CoreOS 412.86.202303211731-0
Part of OpenShift 4.12, RHCOS is a Kubernetes native operating system
cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="412.86.202303211731-0"
VERSION_ID="4.12"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 412.86.202303211731-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.12/"
BUG_REPORT_URL="https://access.redhat.com/labs/rhir/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.12"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.12"
OPENSHIFT_VERSION="4.12"
RHEL_VERSION="8.6"
OSTREE_VERSION="412.86.202303211731-0"
Vote on this issue!
This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.