Restore progress reporting #3125
Conversation
-
Could we add restore progress reporting to velero restore describe, the same as the backup progress report in velero backup describe? Something like:

Total items to be restored from:
Items restored from:
-
I created a backup resource; the backup report says the total items are 106. However, when I restore from that backup resource, the total items are 22. Why? 🤔
@jenting Thanks for pointing that out. I will take a look; my first guess is that those resources were skipped during restore. I will confirm, though.
@jenting In my environment, I didn't see a huge difference in the number of items backed up vs restored:

apiVersion: velero.io/v1
kind: Backup
metadata:
[...]
labels:
velero.io/storage-location: default
name: bkp-1
namespace: velero
[...]
spec:
[...]
includedNamespaces:
- test-bkp
- test-bkp-2
status:
[...]
phase: Completed
progress:
itemsBackedUp: 389
totalItems: 389
startTimestamp: "2021-01-22T19:13:06Z"

apiVersion: velero.io/v1
kind: Restore
metadata:
name: bkp-1-20210122141607
namespace: velero
[...]
spec:
backupName: bkp-1
excludedResources:
- nodes
- events
- events.events.k8s.io
- backups.velero.io
- restores.velero.io
- resticrepositories.velero.io
hooks: {}
includedNamespaces:
- '*'
status:
completionTimestamp: "2021-01-22T19:16:29Z"
phase: Completed
progress:
itemsRestored: 387
totalItems: 387
startTimestamp: "2021-01-22T19:16:08Z"

The difference of 2 items between the Backup and the Restore comes from the fact that this PR doesn't include namespaces in the count of restored items. There is a way to update this function to return the number of namespaces it creates so that namespaces can be included in the count; I will look into that. I'd be interested to know what kind of resources you tried to restore when you saw this mismatch in numbers.
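The counting idea behind this can be sketched with a set keyed by resource identity, loosely modeled on the restoredItems map and velero.ResourceIdentifier seen later in this PR (the names here are illustrative, not Velero's actual types):

```go
package main

import "fmt"

// itemKey loosely mirrors velero.ResourceIdentifier; illustrative only.
type itemKey struct {
	groupResource string
	namespace     string
	name          string
}

// countRestored deduplicates item keys in a set, the same idea as the
// ctx.restoredItems map: ensuring a namespace that was already recorded
// does not inflate the count.
func countRestored(keys []itemKey) int {
	seen := map[itemKey]struct{}{}
	for _, k := range keys {
		seen[k] = struct{}{}
	}
	return len(seen)
}

func main() {
	keys := []itemKey{
		{"v1/namespaces", "", "test-bkp"}, // created by the restore
		{"v1/namespaces", "", "test-bkp"}, // ensured again later: no double count
		{"apps/v1/deployments", "test-bkp", "web"},
	}
	fmt.Println(countRestored(keys)) // 2
}
```

Once namespaces are added to this set alongside regular items, the restore total should line up with the backup total.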
I forgot the original test steps, but I can provide a new one. Steps:
Total items to be backed up: 31
Items backed up: 31
Resource List:
apps/v1/Deployment:
- nginx/nginx-deployment
apps/v1/ReplicaSet:
- nginx/nginx-deployment-66b6c48dd5
v1/Event:
- nginx/nginx-deployment-66b6c48dd5-9l4n9.165dccc49aae8161
- nginx/nginx-deployment-66b6c48dd5-9l4n9.165dccc4b5fe1f95
- nginx/nginx-deployment-66b6c48dd5-9l4n9.165dccc4bac83caf
- nginx/nginx-deployment-66b6c48dd5-9l4n9.165dccc4c2609ca9
- nginx/nginx-deployment-66b6c48dd5-9l4n9.165dcccc5f50a571
- nginx/nginx-deployment-66b6c48dd5-czxj5.165dccd10a7cede5
- nginx/nginx-deployment-66b6c48dd5-czxj5.165dccd126e35f36
- nginx/nginx-deployment-66b6c48dd5-czxj5.165dccd13023360e
- nginx/nginx-deployment-66b6c48dd5-czxj5.165dccd13700b6cf
- nginx/nginx-deployment-66b6c48dd5-kwwmn.165dccd1083e8e3f
- nginx/nginx-deployment-66b6c48dd5-kwwmn.165dccd12510c30b
- nginx/nginx-deployment-66b6c48dd5-kwwmn.165dccd130b5c888
- nginx/nginx-deployment-66b6c48dd5-kwwmn.165dccd136ed2aa2
- nginx/nginx-deployment-66b6c48dd5-ph7tc.165dccc49ab2bd34
- nginx/nginx-deployment-66b6c48dd5-ph7tc.165dccc4b9fc9cf2
- nginx/nginx-deployment-66b6c48dd5-ph7tc.165dccc4c1536718
- nginx/nginx-deployment-66b6c48dd5-ph7tc.165dccc4c8c0fc9a
- nginx/nginx-deployment-66b6c48dd5-ph7tc.165dcccc5ec2d1b3
- nginx/nginx-deployment-66b6c48dd5.165dccc49802b570
- nginx/nginx-deployment-66b6c48dd5.165dccc4992f8858
- nginx/nginx-deployment-66b6c48dd5.165dccd107609102
- nginx/nginx-deployment-66b6c48dd5.165dccd10865dc6d
- nginx/nginx-deployment.165dccc4964f2922
- nginx/nginx-deployment.165dccd10652f80c
v1/Namespace:
- nginx
v1/Pod:
- nginx/nginx-deployment-66b6c48dd5-czxj5
- nginx/nginx-deployment-66b6c48dd5-kwwmn
v1/Secret:
- nginx/default-token-qwsgw
v1/ServiceAccount:
- nginx/default
spec:
backupName: backup-nginx
excludedResources:
- nodes
- events
- events.events.k8s.io
- backups.velero.io
- restores.velero.io
- resticrepositories.velero.io
hooks: {}
includedNamespaces:
- nginx
status:
completionTimestamp: "2021-01-26T14:06:10Z"
phase: Completed
progress:
itemsRestored: 6
totalItems: 6
startTimestamp: "2021-01-26T14:06:09Z"
@jenting Since events are excluded from the restore, the 24 instances of the v1/Event resource present in the backup were not restored, hence the difference in the numbers. Please also note that I am working on a way to include namespaces in the count.
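The arithmetic behind this explanation can be made explicit. A short sketch (the function name restoredCount is hypothetical; the numbers come from the describe output above):

```go
package main

import "fmt"

// restoredCount reproduces the arithmetic behind the example above:
// items in the backup, minus resources excluded on restore, minus
// namespaces (not yet counted by this PR at the time of the comment).
func restoredCount(backedUp, excluded, uncountedNamespaces int) int {
	return backedUp - excluded - uncountedNamespaces
}

func main() {
	// 31 items backed up, 24 v1/Event instances excluded on restore,
	// 1 namespace created but not counted => 6 items reported restored.
	fmt.Println(restoredCount(31, 24, 1)) // 6
}
```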
Cool, thank you for the explanation.
I think I understand how this is all working, but I'd like some more comments, as the restore code can be a little tricky :)
Overall I think it makes sense, though, and I appreciate your efforts here!
And finally, it needs a rebase before we'll be able to merge.
pkg/restore/restore.go
Outdated
	return
}

func (ctx *restoreContext) getSelectedRestoreableItems(resource, targetNamespace, originalNamespace string, items []string) (res restoreableResource, warnings Result, errs Result) {
If I'm reading this code correctly, getOrderedResourceCollection is very similar to the previous restore loop in that it's applying the selection criteria like namespace and includes/excludes. However, the change here is that this function, getSelectedRestoreableItems, then groups those resources by namespace and applies the Kubernetes label selection. It replaces some of the functionality of the restoreResource function, which should be removed since it's no longer relevant with this PR, from what I can tell.

Do I have that right? If so, please add a comment to clarify that's what this function is doing, so we can keep track of it.

Also, based on this code it looks like we're unmarshaling twice.
@nrb that is correct. We are unmarshaling twice. Unmarshaling is needed first to apply the selection criteria, and then again to actually restore the object. Another alternative was to keep the entire object in memory until it's restored, which isn't very attractive.
And you're correct about the duplicate code from the restoreResource function. Thanks for pointing that out; I will clean it up.
Another alternative was to keep the entire object in memory until it's restored, which isn't very attractive.
Right, that is the trade-off we'd have to make. I honestly think we will end up doing this at some point, though, as we'd like to form a graph of objects. For now I think this will work, and we can revisit the unmarshaling mechanisms in the future.
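The trade-off discussed here (unmarshal once for selection, again at restore time, instead of holding every decoded object in memory) can be sketched like this. The names selectItems, selectedItem, and the file layout are hypothetical stand-ins, not Velero's actual archive handling:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// First pass: unmarshal only far enough to apply selection criteria
// (here, the namespace), then keep just the file path in memory.
// A second unmarshal would happen later, at restore time. This trades
// CPU for memory versus holding every decoded object until restored.

type selectedItem struct {
	path            string
	targetNamespace string
}

func selectItems(files map[string][]byte, wantNamespace string) []selectedItem {
	var out []selectedItem
	for path, raw := range files {
		var meta struct {
			Metadata struct {
				Namespace string `json:"namespace"`
			} `json:"metadata"`
		}
		if err := json.Unmarshal(raw, &meta); err != nil {
			continue // skip unreadable entries in this sketch
		}
		if meta.Metadata.Namespace == wantNamespace {
			out = append(out, selectedItem{path: path, targetNamespace: wantNamespace})
		}
	}
	return out
}

func main() {
	files := map[string][]byte{
		"resources/pods/ns1/p1.json": []byte(`{"metadata":{"namespace":"ns1"}}`),
		"resources/pods/ns2/p2.json": []byte(`{"metadata":{"namespace":"ns2"}}`),
	}
	fmt.Println(len(selectItems(files, "ns1"))) // 1
}
```

Keeping only paths between the two passes is what keeps the memory footprint flat; forming a full object graph, as discussed above, would require keeping the decoded objects instead.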
@nrb when you have a moment, please take a look at the new code I added to include namespaces in the total count. That change updates the EnsureNamespaceExistsAndIsReady method.
@pranavgaikwad Looks like there was a conflict with the colorized output PR that's affecting this.
Sure, will rebase again.
if namespace != "" && !existingNamespaces.Has(selectedItem.targetNamespace) {
    logger := ctx.log.WithField("namespace", namespace)
    ns := getNamespace(logger, archive.GetItemFilePath(ctx.restoreDir, "namespaces", "", namespace), selectedItem.targetNamespace)
    if _, nsCreated, err := kube.EnsureNamespaceExistsAndIsReady(ns, ctx.namespaceClient, ctx.resourceTerminatingTimeout); err != nil {
        errs.AddVeleroError(err)
        continue
    } else {
        // add the newly created namespace to the list of restored items
        if nsCreated {
            itemKey := velero.ResourceIdentifier{
                GroupResource: kuberesource.Namespaces,
                Namespace:     ns.Namespace,
                Name:          ns.Name,
            }
            ctx.restoredItems[itemKey] = struct{}{}
        }
    }
}
This block of code looks similar to
Lines 794 to 811 in b527a8a
// if the namespace scoped resource should be restored, ensure that the namespace into
// which the resource is being restored into exists.
// This is the *remapped* namespace that we are ensuring exists.
nsToEnsure := getNamespace(ctx.log, archive.GetItemFilePath(ctx.restoreDir, "namespaces", "", obj.GetNamespace()), namespace)
if _, nsCreated, err := kube.EnsureNamespaceExistsAndIsReady(nsToEnsure, ctx.namespaceClient, ctx.resourceTerminatingTimeout); err != nil {
    errs.AddVeleroError(err)
    return warnings, errs
} else {
    // add the newly created namespace to the list of restored items
    if nsCreated {
        itemKey := velero.ResourceIdentifier{
            GroupResource: kuberesource.Namespaces,
            Namespace:     nsToEnsure.Namespace,
            Name:          nsToEnsure.Name,
        }
        ctx.restoredItems[itemKey] = struct{}{}
    }
}
And this logic would run twice. I'm not sure what the difference between these two blocks is, or whether we could merge them into one.
@jenting This duplication is required because restoreItem() can be called for additional items returned by plugins. The recursive call in that case triggers only the restoreItem() function, not the rest of the preceding logic.
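The point about the recursive call can be sketched with a toy version. ctxSketch, ensureNamespace, and restoreItem here are illustrative names, not Velero's actual code; the key idea is that both the outer loop and the recursive path record namespaces in the same shared set, so nothing is double-counted:

```go
package main

import "fmt"

type key struct{ resource, namespace, name string }

type ctxSketch struct {
	restoredItems map[key]struct{}
}

// ensureNamespace records a namespace only once, even when both the
// outer restore loop and a recursive restoreItem call reach it.
func (c *ctxSketch) ensureNamespace(name string) {
	k := key{"namespaces", "", name}
	if _, ok := c.restoredItems[k]; ok {
		return // already created and counted
	}
	c.restoredItems[k] = struct{}{}
}

// restoreItem restores one item, then recurses for additional items,
// mirroring how plugins can return extra items to restore.
func (c *ctxSketch) restoreItem(k key, additional []key) {
	c.ensureNamespace(k.namespace)
	c.restoredItems[k] = struct{}{}
	for _, a := range additional {
		c.restoreItem(a, nil) // recursive call for plugin-returned items
	}
}

func main() {
	c := &ctxSketch{restoredItems: map[key]struct{}{}}
	c.ensureNamespace("ns1") // outer loop ensures the namespace first
	c.restoreItem(key{"pods", "ns1", "p1"}, []key{{"configmaps", "ns1", "cm1"}})
	fmt.Println(len(c.restoredItems)) // 3: namespace counted exactly once
}
```

Running the namespace check in both places is safe because the shared set makes the second check a no-op.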
LGTM
@pranavgaikwad I just realized there is a conflicting file. Please resolve it when you have time. 🙇
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
I'll be reviewing this tomorrow.
LGTM, thank you @pranavgaikwad
🎉
* restore progress reporting
* add restore statistics to describe restore
* address feedback, include namespaces in the count

Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
@pranavgaikwad, @jenting, @nrb, @carlisia I have seen this issue again in my cluster. I have restored in the same cluster.
This PR introduces progress reporting for Velero Restore.
Closes #21
The changes in this PR follow the design doc: https://github.com/vmware-tanzu/velero/blob/main/design/restore-progress.md
Summary of changes:
- Updated the velero describe command to show restore progress information
- Updated the EnsureNamespaceExistsAndIsReady method to return an additional boolean indicating whether a new Namespace was created. If a new namespace is created, it is added to the list of restored items.
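The shape of the updated EnsureNamespaceExistsAndIsReady return value can be sketched with a plain map standing in for the cluster. The real method talks to the Kubernetes API through the namespace client; everything here is an illustrative stand-in:

```go
package main

import "fmt"

// ensureNamespaceExists sketches the updated signature: the created
// return value reports whether the namespace was newly created, so
// callers can add it to the restored-items count. A map stands in for
// the cluster; the real method uses the Kubernetes namespace client.
func ensureNamespaceExists(cluster map[string]bool, name string) (ready bool, created bool, err error) {
	if cluster[name] {
		return true, false, nil // already present: do not count it again
	}
	cluster[name] = true
	return true, true, nil
}

func main() {
	cluster := map[string]bool{"default": true}

	_, created, _ := ensureNamespaceExists(cluster, "nginx")
	fmt.Println(created) // true: count this namespace as a restored item

	_, created, _ = ensureNamespaceExists(cluster, "nginx")
	fmt.Println(created) // false: it already exists, nothing to count
}
```

Returning the created flag lets the restore loop add only namespaces it actually created to ctx.restoredItems, which is what keeps the backup and restore totals consistent.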