New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design doc for Velero Restore progress reporting #3016
Conversation
|
||
## High-Level Design | ||
|
||
We propose to divide the restore process in two steps. The first step will collect all the items to be restored from the backup tarball. It will apply the label selector and include/exclude rules on the resources / items and store them (preserving the priority order) in an in-memory data structure. The second step will read the collected items and restore them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, glad we think we can achieve the 2-step approach
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
} | ||
``` | ||
|
||
We propose to remove the call to `restoreItem()` in the inner most loop and instead store the item in a data structure. Once all the items are collected, we loop through the array of collected items and make a call to `restoreItem()`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I was concerned this would vastly slow down the restore progress, but thinking through it we aren't making API calls to get these resources this first loop process should be fairly fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's really a tradeoff between computation and memory here:
- First step reads from tarball, stores the resources in-memory, second step uses the resources directly from memory
vs - First step reads from tarball, only stores the path to the resource in-memory, second step reads from the tarball again
Both approaches are possible. Depends on what we want weigh more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm included to lean towards option 2 for now - loading all the JSON into memory right now isn't really feasible given our default memory constraints, especially on very large clusters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[...] | ||
|
||
for namespace, items := range resourceList.ItemsByNamespace { | ||
[...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During this section of the loop in the current implementation, namespaces are handled differently than other resources. We don't explicitly restore them unless a resource is going to be restored into that namespace. They are created if needed, then the resources are restored into that namespace. Are you intending to capture namespaces in the count of items to restore? Will they be restored during the count of items, or will they be left out of the count and restored as they currently are during the main restore loop?
func (ctx *context) getOrderedResourceCollection(...) { | ||
collectedResources := []restoreResource | ||
for _, resource := range getOrderedResources(...) { | ||
[...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
De-duplication of resources happens in this section of the loop. This will also need to be included in this function.
We introduce two new structs to hold the collected items: | ||
|
||
```go | ||
type restoreResource struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing, but both restoreResource
and restoreItem
are used as function names in restore.go
so they will need to be updated or different names should be used here.
|
||
As an alternative, we have considered an approach which doesn't divide the restore process in two steps. | ||
|
||
With that approach, the total number of items will be read from the Backup CR. We will keep three counters, `totalItems`, `skippedItems` and `restoredItems`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this approach would work as the restore skips over resources/namespaces without iterating through to find out how many items are being skipped. We could know the number of items available in the backup to be restored, and how many have been restored so far, but without iterating, I don't know if there's a way to determine how many are being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty reasonable to me for a first approach! I think longer term, we'll want to flesh out a more complete directed graph structured, but I think for progress reporting, this is a perfectly workable solution at the moment, and I'd be willing to proceed with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @zubron made a couple observations worth noting.
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
This is an initial design doc explaining how progress reporting can be implemented for Velero Restores.