
Design doc for Velero Restore progress reporting #3016

Merged: 1 commit into vmware-tanzu:main on Nov 19, 2020

Conversation

pranavgaikwad (Contributor):

This is an initial design doc explaining how progress reporting can be implemented for Velero Restores.

@pranavgaikwad changed the title from "Propose progress reporting for Velero Restore" to "Design doc for Velero Restore progress reporting" on Oct 19, 2020

## High-Level Design

We propose to divide the restore process into two steps. The first step collects all the items to be restored from the backup tarball: it applies the label selector and include/exclude rules to the resources/items and stores them, preserving the priority order, in an in-memory data structure. The second step reads the collected items and restores them.
Contributor:

Nice, glad we think we can achieve the 2-step approach

[Truncated code excerpt: the design doc here quotes the current nested restore loop in restore.go.]

We propose to remove the call to `restoreItem()` in the innermost loop and instead store the item in a data structure. Once all the items are collected, we loop through the array of collected items and make a call to `restoreItem()` for each.
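A minimal, self-contained sketch of that two-pass shape (everything except the `restoreItem` name is illustrative, not Velero's actual code):

```go
package main

import "fmt"

// item is a hypothetical stand-in for an entry parsed from the backup
// tarball after label selectors and include/exclude rules are applied.
type item struct {
	groupResource string
	namespace     string
	name          string
}

// restoreItem stands in for Velero's restoreItem(); here it only logs
// the item it would restore.
func restoreItem(it item) error {
	fmt.Printf("restoring %s %s/%s\n", it.groupResource, it.namespace, it.name)
	return nil
}

func main() {
	// Step 1: collect items instead of restoring them in the innermost
	// loop; the real code preserves the resource priority order here.
	collected := []item{
		{"persistentvolumes", "", "pv-1"},
		{"pods", "ns-1", "pod-1"},
		{"pods", "ns-2", "pod-2"},
	}

	// Step 2: the total is known up front, so every restore can be
	// accompanied by an accurate progress update.
	for i, it := range collected {
		if err := restoreItem(it); err != nil {
			fmt.Printf("error restoring %s: %v\n", it.name, err)
			continue
		}
		fmt.Printf("progress: %d of %d items restored\n", i+1, len(collected))
	}
}
```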
Contributor:
At first I was concerned this would vastly slow down the restore process, but thinking it through, since we aren't making API calls to get these resources, this first loop should be fairly fast.

Contributor (author):

There's really a tradeoff between computation and memory here:

  1. The first step reads from the tarball and stores the resources in memory; the second step uses the resources directly from memory.
  2. The first step reads from the tarball but stores only the path to each resource in memory; the second step reads from the tarball again.

Both approaches are possible. It depends on which we want to weigh more.

Contributor:
I'm inclined to lean towards option 2 for now - loading all the JSON into memory right now isn't really feasible given our default memory constraints, especially on very large clusters.
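To illustrate what option 2 implies, here is a hedged sketch (all names hypothetical): the collection phase records only each item's file path inside the extracted tarball, and the restore phase decodes the JSON lazily, so memory scales with the number of items rather than their size.

```go
package restore

import (
	"encoding/json"
	"os"
)

// restoreItemRef is a hypothetical collected item under option 2: only
// the location within the extracted tarball is kept in memory.
type restoreItemRef struct {
	path            string // file path inside the extracted backup
	targetNamespace string
	name            string
}

// loadItem reads and decodes an item's JSON only when it is about to be
// restored, keeping the collection phase cheap on memory.
func loadItem(ref restoreItemRef) (map[string]interface{}, error) {
	data, err := os.ReadFile(ref.path)
	if err != nil {
		return nil, err
	}
	var obj map[string]interface{}
	if err := json.Unmarshal(data, &obj); err != nil {
		return nil, err
	}
	return obj, nil
}
```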

@jenting (Contributor) left a comment:

LGTM

```go
[...]
for namespace, items := range resourceList.ItemsByNamespace {
[...]
```
Contributor:

During this section of the loop in the current implementation, namespaces are handled differently than other resources. We don't explicitly restore them unless a resource is going to be restored into that namespace: they are created if needed, then the resources are restored into them. Are you intending to capture namespaces in the count of items to restore? Will they be counted and restored as items, or left out of the count and restored as they currently are during the main restore loop?
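For reference, a simplified sketch of the lazy behavior this comment describes (hypothetical names, not Velero's actual code): a namespace is only created when the first item targeting it is about to be restored, which is why it is unclear whether namespaces belong in the item count.

```go
package restore

import "fmt"

// item is a hypothetical collected item; namespace is empty for
// cluster-scoped resources.
type item struct {
	namespace string
	name      string
}

// restoreAll sketches the lazy behavior described above: a namespace is
// only created when the first item targeting it is about to be restored.
func restoreAll(collected []item) error {
	ensured := map[string]bool{}
	for _, it := range collected {
		if it.namespace != "" && !ensured[it.namespace] {
			// The real code checks for and creates the namespace via
			// the Kubernetes API before restoring into it.
			fmt.Printf("ensuring namespace %q exists\n", it.namespace)
			ensured[it.namespace] = true
		}
		fmt.Printf("restoring %q into %q\n", it.name, it.namespace)
	}
	return nil
}
```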

```go
func (ctx *context) getOrderedResourceCollection(...) {
	collectedResources := []restoreResource{}
	for _, resource := range getOrderedResources(...) {
	[...]
```
Contributor:

De-duplication of resources happens in this section of the loop. This will also need to be included in this function.
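A sketch of the kind of de-duplication this refers to, keyed on resource/namespace/name (illustrative only, not the actual restore.go logic):

```go
package restore

// itemKey identifies a collected item; a set of these keys can be used
// to skip duplicates during collection.
type itemKey struct {
	resource  string
	namespace string
	name      string
}

// dedupe drops items that have already been seen, preserving order, so
// the totals used for progress are not inflated by duplicates.
func dedupe(items []itemKey) []itemKey {
	seen := make(map[itemKey]bool, len(items))
	var out []itemKey
	for _, it := range items {
		if seen[it] {
			continue
		}
		seen[it] = true
		out = append(out, it)
	}
	return out
}
```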

We introduce two new structs to hold the collected items:

```go
type restoreResource struct {
	[...]
}
```
Contributor:

Small thing, but both `restoreResource` and `restoreItem` are used as function names in restore.go, so they will need to be updated or different names should be used here.
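One possible resolution of that naming clash, sketched with hypothetical field layouts (the renamed identifiers and fields here are illustrative, not necessarily what was merged):

```go
package restore

// Hypothetical rename to avoid clashing with the existing
// restoreResource()/restoreItem() functions in restore.go.
type restoreableResource struct {
	resource         string
	itemsByNamespace map[string][]restoreableItem
	totalItems       int
}

type restoreableItem struct {
	path            string // path to the item's JSON within the tarball
	targetNamespace string
	name            string
}
```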


As an alternative, we have considered an approach which doesn't divide the restore process into two steps.

With that approach, the total number of items will be read from the Backup CR. We will keep three counters: `totalItems`, `skippedItems`, and `restoredItems`.
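As a sketch, progress under this alternative would be derived from the three counters like so (hypothetical helper, not part of the design):

```go
package restore

// progress sketches the counter-based alternative: totalItems comes
// from the Backup CR's status, while skippedItems and restoredItems are
// incremented as the single restore loop runs.
type progress struct {
	totalItems    int
	skippedItems  int
	restoredItems int
}

// percentComplete treats skipped items as done, since they will never
// be restored; its accuracy depends on being able to count skips at
// all, which is the concern raised in the comment below.
func (p progress) percentComplete() float64 {
	if p.totalItems == 0 {
		return 0
	}
	return 100 * float64(p.restoredItems+p.skippedItems) / float64(p.totalItems)
}
```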
Contributor:

I don't know if this approach would work, as the restore skips over resources/namespaces without iterating through them to find out how many items are being skipped. We could know the number of items available in the backup to be restored, and how many have been restored so far, but without iterating, I don't know if there's a way to determine how many are being skipped.

@nrb (Contributor) left a comment:

This looks pretty reasonable to me for a first approach! I think longer term we'll want to flesh out a more complete directed graph structure, but for progress reporting this is a perfectly workable solution at the moment, and I'd be willing to proceed with it.

@carlisia (Contributor) left a comment:

LGTM. @zubron made a couple of observations worth noting.

@carlisia merged commit a757304 into vmware-tanzu:main on Nov 19, 2020