propose restore progress
Signed-off-by: Pranav Gaikwad <pgaikwad@redhat.com>
pranavgaikwad committed Oct 20, 2020
1 parent 28a46d3 commit cd70ef4
Showing 1 changed file with 220 additions and 0 deletions.
220 changes: 220 additions & 0 deletions design/restore-progress.md
@@ -0,0 +1,220 @@
# Restore progress reporting

The Velero _Backup_ resource reports real-time progress of an ongoing backup through a _Progress_ field in the CR. The Velero _Restore_ resource, on the other hand, only shows the current phase (InProgress, Completed, PartiallyFailed, Failed) of the ongoing restore. In this document, we propose detailed progress reporting for Velero _Restore_. With the proposed _Progress_ field, the Velero _Restore_ CR will look like:

```yml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: test-restore
  namespace: velero
spec:
  [...]
status:
  phase: InProgress
  progress:
    itemsRestored: 100
    totalItems: 140
```

## Goals

- Enable progress reporting for Velero Restore

## Non Goals

- Estimate time to completion

## Background

The current _Restore_ CR lets users know whether a restore is in progress or completed (failed/succeeded). While this basic information is useful to the end user, the user experience can be improved. The _Restore_ CR can show detailed progress in terms of the number of resources restored so far and the total number of resources to be restored. This will be particularly useful for long-running restores. Such progress reporting already exists for Velero _Backup_. This document proposes a similar implementation for Velero _Restore_.

## High-Level Design

We propose to divide the restore process into two steps. The first step will collect all the items to be restored from the backup tarball. It will apply the label selector and the include/exclude rules to the resources and items, and store them (preserving the priority order) in an in-memory data structure. The second step will read the collected items and restore them.

## Detailed Design

### Progress struct

A new struct will be introduced to store progress information:

```go
type RestoreProgress struct {
	TotalItems    int `json:"totalItems,omitempty"`
	ItemsRestored int `json:"itemsRestored,omitempty"`
}
```

`RestoreStatus` will include the above struct:

```go
type RestoreStatus struct {
	[...]

	Progress *RestoreProgress `json:"progress,omitempty"`
}
```
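To illustrate how the optional `Progress` pointer would serialize, here is a minimal sketch. `RestoreStatus` is trimmed to two fields and `Phase` is simplified to a plain `string`; both simplifications, and the `mustJSON` helper, are assumptions made for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RestoreProgress mirrors the struct proposed above.
type RestoreProgress struct {
	TotalItems    int `json:"totalItems,omitempty"`
	ItemsRestored int `json:"itemsRestored,omitempty"`
}

// RestoreStatus is a trimmed-down stand-in for the real status struct.
// Phase is a plain string here purely for illustration.
type RestoreStatus struct {
	Phase    string           `json:"phase,omitempty"`
	Progress *RestoreProgress `json:"progress,omitempty"`
}

// mustJSON is a hypothetical helper that marshals a status or panics.
func mustJSON(s RestoreStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// A nil Progress pointer is omitted from the output entirely.
	fmt.Println(mustJSON(RestoreStatus{Phase: "InProgress"}))

	// A populated Progress appears as a nested object.
	fmt.Println(mustJSON(RestoreStatus{
		Phase:    "InProgress",
		Progress: &RestoreProgress{TotalItems: 140, ItemsRestored: 100},
	}))
}
```

Because the field is a pointer tagged `omitempty`, a status without progress information serializes exactly as it does today.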

### Modifications to restore.go

Currently, the restore process works by looping through the resources in the backup tarball and restoring them one-by-one in the same pass:

```go
func (ctx *context) execute(...) {
	[...]

	for _, resource := range getOrderedResources(...) {
		[...]

		for namespace, items := range resourceList.ItemsByNamespace {
			[...]

			for _, item := range items {
				[...]

				// restore item here
				w, e := restoreItem(...)
			}
		}
	}
}
```

We propose to remove the call to `restoreItem()` from the innermost loop and instead store the item in a data structure. Once all the items are collected, we loop through the slice of collected resources and call `restoreItem()` for each item:

```go
func (ctx *context) getOrderedResourceCollection(...) []restoreResource {
	collectedResources := []restoreResource{}
	for _, resource := range getOrderedResources(...) {
		[...]

		// one restoreResource per group resource; the map must be
		// initialized before items are appended to it
		collectedResource := restoreResource{
			itemsByNamespace: make(map[string][]restoreItem),
		}
		for namespace, items := range resourceList.ItemsByNamespace {
			[...]
			for _, item := range items {
				[...]

				// store item in a data structure and keep the count in sync
				collectedResource.itemsByNamespace[originalNamespace] = append(collectedResource.itemsByNamespace[originalNamespace], item)
				collectedResource.totalItems++
			}
		}
		collectedResources = append(collectedResources, collectedResource)
	}
	return collectedResources
}

func (ctx *context) execute(...) {
	[...]

	// get all items
	resources := ctx.getOrderedResourceCollection(...)

	for _, resource := range resources {
		[...]

		for _, items := range resource.itemsByNamespace {
			[...]

			for _, item := range items {
				[...]

				// restore the item
				w, e := restoreItem(...)
			}
		}
	}

	[...]
}
```

We introduce two new structs to hold the collected items:

```go
type restoreResource struct {
	resource         string
	itemsByNamespace map[string][]restoreItem
	totalItems       int
}

type restoreItem struct {
	targetNamespace string
	name            string
}
```

Each group resource is represented by a `restoreResource`. The map `itemsByNamespace` is indexed by `originalNamespace`, and each value is the list of items in that original namespace. `totalItems` is the count of all items present in the nested map of namespaces and items; it is updated every time an item is added to the map. Each item, represented by a `restoreItem`, has a `name` and the resolved `targetNamespace`.
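The bookkeeping described above can be sketched as follows. The `addItem` method is a hypothetical helper introduced only for this illustration, and the namespaces and item names are made up; the point is that `totalItems` stays in sync with the map whenever an item is added.

```go
package main

import "fmt"

// restoreItem and restoreResource mirror the structs proposed above.
type restoreItem struct {
	targetNamespace string
	name            string
}

type restoreResource struct {
	resource         string
	itemsByNamespace map[string][]restoreItem
	totalItems       int
}

// addItem is a hypothetical helper (not part of the proposal): it files the
// item under its original namespace and bumps the running count.
func (r *restoreResource) addItem(originalNamespace string, item restoreItem) {
	if r.itemsByNamespace == nil {
		r.itemsByNamespace = make(map[string][]restoreItem)
	}
	r.itemsByNamespace[originalNamespace] = append(r.itemsByNamespace[originalNamespace], item)
	r.totalItems++
}

func main() {
	pods := restoreResource{resource: "pods"}

	// Items keyed by original namespace; targetNamespace holds the
	// already-resolved (possibly remapped) namespace.
	pods.addItem("ns-1", restoreItem{targetNamespace: "ns-1-restored", name: "web-0"})
	pods.addItem("ns-1", restoreItem{targetNamespace: "ns-1-restored", name: "web-1"})
	pods.addItem("ns-2", restoreItem{targetNamespace: "ns-2", name: "db-0"})

	fmt.Println(pods.totalItems, len(pods.itemsByNamespace))
}
```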

### Calculating progress

The total number of items can be calculated by summing the `totalItems` count of every collected resource:

```go
totalItems := 0

for _, resource := range collectedResources {
	totalItems += resource.totalItems
}
```

Additional items returned by plugins will still be discovered only at plugin execution time, so `totalItems` must be adjusted to include them. As a result, the total is expected to change whenever plugins execute. After each item is restored, the total is recomputed as the number of items restored so far (which includes any additional items) plus the number of collected items not yet processed:

```go
i := 0
for _, resource := range resources {
	[...]

	for _, items := range resource.itemsByNamespace {
		[...]

		for _, item := range items {
			[...]

			// restore the item
			w, e := restoreItem(...)
			i++
			// calculate the actual count of resources
			actualTotalItems := len(ctx.restoredItems) + (totalItems - i)
		}
	}
}
```
```
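As a worked example of the adjustment, the sketch below isolates the formula in a hypothetical `adjustedTotal` function. Its parameters stand in for `len(ctx.restoredItems)`, `totalItems` and `i` from the loop above; the numbers are illustrative only.

```go
package main

import "fmt"

// adjustedTotal is a hypothetical helper that isolates the formula above.
//   restoredSoFar  ~ len(ctx.restoredItems), including plugin-returned items
//   collectedTotal ~ totalItems computed from the collected resources
//   processed      ~ i, the number of collected items handled so far
func adjustedTotal(restoredSoFar, collectedTotal, processed int) int {
	return restoredSoFar + (collectedTotal - processed)
}

func main() {
	// 140 items collected, 100 processed, plugins returned nothing extra:
	// the total is unchanged.
	fmt.Println(adjustedTotal(100, 140, 100)) // 140

	// Same point in the loop, but plugins returned 5 additional items that
	// were restored as well: the total grows by 5.
	fmt.Println(adjustedTotal(105, 140, 100)) // 145
}
```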

### Updating progress

Updates to the `progress` field in the CR can be sent on a channel as soon as an item is restored. A goroutine receiving updates on that channel can make an `Update()` call to update the _Restore_ CR. This will require us to pass an instance of `RestoresGetter` to the `kubernetesRestorer` struct.
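The channel-based flow described above could look roughly like the sketch below. To keep it self-contained, the `progressSink` interface and the `recordingSink` type are hypothetical stand-ins for the client that would issue the `Update()` call on the _Restore_ CR; `runProgressUpdater` is likewise an illustrative name, not part of the proposal.

```go
package main

import "fmt"

// RestoreProgress mirrors the struct proposed in this document.
type RestoreProgress struct {
	TotalItems    int `json:"totalItems,omitempty"`
	ItemsRestored int `json:"itemsRestored,omitempty"`
}

// progressSink is a hypothetical stand-in for whatever performs the
// Update() call against the Restore CR.
type progressSink interface {
	UpdateProgress(p RestoreProgress) error
}

// recordingSink just remembers every update so the flow can be observed.
type recordingSink struct {
	updates []RestoreProgress
}

func (s *recordingSink) UpdateProgress(p RestoreProgress) error {
	s.updates = append(s.updates, p)
	return nil
}

// runProgressUpdater starts a goroutine that forwards every update received
// on the channel to the sink. The returned channel is closed once the
// updates channel has been closed and drained.
func runProgressUpdater(updates <-chan RestoreProgress, sink progressSink) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for p := range updates {
			if err := sink.UpdateProgress(p); err != nil {
				fmt.Println("progress update failed:", err)
			}
		}
	}()
	return done
}

func main() {
	sink := &recordingSink{}
	updates := make(chan RestoreProgress)
	done := runProgressUpdater(updates, sink)

	total := 3
	for i := 1; i <= total; i++ {
		// An item would be restored here, then its progress reported.
		updates <- RestoreProgress{TotalItems: total, ItemsRestored: i}
	}
	close(updates)
	<-done // wait for the updater to flush the last update

	fmt.Println(len(sink.updates), sink.updates[len(sink.updates)-1].ItemsRestored)
}
```

This sketch sends one update per restored item. Whether a real implementation should batch or rate-limit the `Update()` calls is left open here.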


## Alternatives Considered

As an alternative, we considered an approach that does not divide the restore process into two steps.

With that approach, the total number of items will be read from the Backup CR. We will keep three counters, `totalItems`, `skippedItems` and `restoredItems`:

```yml
status:
  phase: InProgress
  progress:
    totalItems: 100
    skippedItems: 20
    restoredItems: 79
```

This approach doesn't require us to find the number of total items beforehand.
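A minimal sketch of how the three counters might relate, assuming that skipped items count toward completion (the text above does not state this explicitly). The `altProgress` type and its methods are hypothetical names used only for illustration.

```go
package main

import "fmt"

// altProgress is a hypothetical struct holding the three counters of the
// alternative approach.
type altProgress struct {
	TotalItems    int
	SkippedItems  int
	RestoredItems int
}

// remaining assumes every item in the backup ends up either skipped or
// restored, so whatever is left is still pending.
func (p altProgress) remaining() int {
	return p.TotalItems - p.SkippedItems - p.RestoredItems
}

// done reports whether every item has been accounted for.
func (p altProgress) done() bool {
	return p.remaining() == 0
}

func main() {
	// The values from the YAML example above.
	p := altProgress{TotalItems: 100, SkippedItems: 20, RestoredItems: 79}
	fmt.Println(p.remaining(), p.done())
}
```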

## Security Considerations

Omitted

## Compatibility

Omitted

## Implementation

TBD

## Open Issues

https://github.com/vmware-tanzu/velero/issues/21
