Add the PV backup information design document.
Signed-off-by: Xun Jiang <jxun@vmware.com>
Xun Jiang committed Oct 16, 2023
1 parent b4fb2d9 commit 52b3737
Showing 1 changed file with 140 additions and 0 deletions.
140 changes: 140 additions & 0 deletions design/pv_backup_info.md
@@ -0,0 +1,140 @@
# PersistentVolume backup information design

## Abstract
Create a new metadata file in the backup repository directory to store information about the PVCs and PVs included in the backup. The information includes how the PVC and PV data is backed up, the snapshot information, and the snapshot status. Other needed snapshot details, such as the snapshot size, can also be recorded there, but the Velero-native snapshot plugin doesn't provide a way to get the snapshot size from the API, so snapshot size information may not be available for every volume.

This additional metadata file is needed to:
* Get a summary of the backup's PVC and PV information, including how the data in them is backed up, or whether the data in them is skipped from backup.
* Find out how the PVC and PV should be restored in restore process.
* Retrieve the PV's snapshot information for backup.

## Background
There is already a [PR](https://github.com/vmware-tanzu/velero/pull/6496) to track the PVCs skipped in a backup. This design depends on it and goes further: it collects a summary of the PVC and PV information and persists it into a metadata file in the backup repository.
In the restore process, the Velero server needs to decide how a PV resource should be restored according to how the PV was backed up. The current logic checks whether the PV is backed up by a Velero-native snapshot, whether it is backed up by file-system backup, and whether it has `DeletionPolicy` set to `Delete`.
These checks rely on the PVBs or Snapshots generated by the backup. There is no generic way to find this information, and the CSI backup and snapshot data movement backup are not covered.
Another point worth noting is that, when describing a backup, there is no generic way to find the PV's snapshot information.

## Goals
- Generate the backup's PVC and PV information.
- Create a generic way to let the Velero server know how the PV resources are backed up.
- Create a generic way to let the Velero server find the snapshot information corresponding to a PV.

## Non Goals
- Unify how snapshot size information is retrieved across all PV backup methods, or fill in any other PV information that is not currently available.

## High-Level Design
Create a `<backup-name>-volumes-info.json` metadata file in the backup's repository. This file contains the encoded information for all the PVCs and PVs included in the backup. The information covers whether the PV or PVC's data is skipped during backup, how its data is backed up, and the details of the backed-up data.
The `restoreItem` function can decode the `<backup-name>-volumes-info.json` file to determine how to handle the PV resource.

## Detailed Design

### The VolumeInfo structure
The `<backup-name>-volumes-info.json` file is an array of `VolumeInfo` structures.

The `VolumeInfo` definition is:
``` golang
type VolumeInfo struct {
	PVCName      string // The PVC's name. The format should be <namespace-name>/<PVC-name>.
	PVName       string // The PV's name.
	BackupMethod string // How the volume data is backed up. The valid values are `VeleroNativeSnapshot`, `PodVolumeBackup`, `CSISnapshot`, `SnapshotDataMover` and `Skipped`.

	// This section is used to generate the skipped summary.
	SkippedReason string // The reason the volume is skipped in the backup.

	// This section is used for displaying generic volume status.
	SnapshotHandle      string       // The actual snapshot ID. It can be the cloud provider's snapshot ID, or the file-system uploader's snapshot.
	Status              string       // The snapshot's final status.
	Size                int64        // The size of the volume corresponding to the snapshot. Some volume backup methods cannot retrieve this value with the current design, for example, the Velero native snapshot.
	StartTimestamp      *metav1.Time // The time the snapshot started.
	CompletionTimestamp *metav1.Time // The time the snapshot completed.

	// This section is used for displaying the snapshot data mover and PodVolumeBackup snapshot status.
	DataMover string // The name of the data mover that uploads the snapshot data. The valid values are `kopia` and `restic`. It's useful for file-system backup and snapshot data mover.

	// This section is used for displaying the Velero native snapshot status.
	VolumeType string // The cloud provider snapshot volume type.
	VolumeAZ   string // The cloud provider snapshot volume's availability zones.
	IOPS       string // The cloud provider snapshot volume's IOPS.

	// This section is used for displaying the PodVolumeBackup snapshot status.
	VolumeName string // The name of the PVC's corresponding volume as used by the Pod.
	PodName    string // The name of the Pod mounting this PVC. The format should be <namespace-name>/<pod-name>.
}
```
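To make the file format concrete, the following is a minimal sketch of encoding a `VolumeInfo` array to JSON and decoding it back. It uses a trimmed copy of the structure; the JSON tag names and the helper functions `encodeVolumeInfos` and `decodeVolumeInfos` are assumptions made for illustration and are not defined by this design.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VolumeInfo is a trimmed copy of the structure above.
// The JSON tag names are assumptions; the design does not specify them.
type VolumeInfo struct {
	PVCName       string `json:"pvcName,omitempty"`
	PVName        string `json:"pvName,omitempty"`
	BackupMethod  string `json:"backupMethod,omitempty"`
	SkippedReason string `json:"skippedReason,omitempty"`
	Size          int64  `json:"size,omitempty"`
}

// encodeVolumeInfos (hypothetical helper) serializes the array into the
// bytes that could be stored as <backup-name>-volumes-info.json.
func encodeVolumeInfos(infos []*VolumeInfo) ([]byte, error) {
	return json.Marshal(infos)
}

// decodeVolumeInfos (hypothetical helper) parses the file content back
// into a VolumeInfo array.
func decodeVolumeInfos(data []byte) ([]*VolumeInfo, error) {
	var infos []*VolumeInfo
	if err := json.Unmarshal(data, &infos); err != nil {
		return nil, err
	}
	return infos, nil
}

func main() {
	infos := []*VolumeInfo{
		{PVCName: "demo/data", PVName: "pv-1", BackupMethod: "CSISnapshot", Size: 1 << 30},
		{PVCName: "demo/cache", PVName: "pv-2", BackupMethod: "Skipped", SkippedReason: "excluded by volume policy"},
	}
	data, err := encodeVolumeInfos(infos)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Fields that do not apply to a backup method, such as `SkippedReason` for a snapshotted volume, are simply omitted from the output in this sketch.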

### How the VolumeInfo array is generated.

Two new methods are added to BackupStore to upload and download the VolumeInfo metadata file.

``` golang
type BackupStore interface {
	...
	PutVolumeInfos(backup string, volumeInfos io.Reader) error
	GetVolumeInfos(name string) ([]*VolumeInfo, error)
	...
}
```

The VolumeInfo array is handled in two phases.
First, the VolumeInfo array is generated before the backup is persisted by the `persistBackup` function. The VolumeInfo array is then passed to `persistBackup` as a parameter and is also uploaded to the backup repository by the `PutBackup` method. At this phase, the basic information, the skipped information, the Velero native snapshot information, and the PodVolumeBackup information are ready.

Second, in the backup finalize controller, the snapshot data mover and CSI snapshot information is updated according to the BackupItemOperations results. The updated VolumeInfo array is uploaded with the `PutVolumeInfos` method.
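The second phase can be sketched as a merge of asynchronous operation results into the array produced by the first phase. This is only an illustration: the `operationResult` type and the `applyOperationResults` function are hypothetical simplifications, and the real BackupItemOperations result type carries different fields.

```go
package main

import "fmt"

// VolumeInfo is trimmed to the fields this sketch touches.
type VolumeInfo struct {
	PVName         string
	BackupMethod   string
	SnapshotHandle string
	Status         string
}

// operationResult is a hypothetical, simplified view of one finished
// asynchronous backup item operation.
type operationResult struct {
	PVName         string
	SnapshotHandle string
	Phase          string
}

// applyOperationResults fills in snapshot details for volumes whose backup
// completes asynchronously and returns how many entries were updated.
func applyOperationResults(infos []*VolumeInfo, results []operationResult) int {
	byPV := make(map[string]*VolumeInfo, len(infos))
	for _, info := range infos {
		byPV[info.PVName] = info
	}
	updated := 0
	for _, r := range results {
		info, ok := byPV[r.PVName]
		if !ok {
			continue
		}
		// Only CSI snapshots and data mover backups are completed in phase two.
		if info.BackupMethod != "CSISnapshot" && info.BackupMethod != "SnapshotDataMover" {
			continue
		}
		info.SnapshotHandle = r.SnapshotHandle
		info.Status = r.Phase
		updated++
	}
	return updated
}

func main() {
	infos := []*VolumeInfo{
		{PVName: "pv-1", BackupMethod: "SnapshotDataMover"},
		{PVName: "pv-2", BackupMethod: "PodVolumeBackup", Status: "Completed"},
	}
	results := []operationResult{{PVName: "pv-1", SnapshotHandle: "snap-123", Phase: "Completed"}}
	n := applyOperationResults(infos, results)
	fmt.Println(n, infos[0].SnapshotHandle, infos[0].Status)
}
```

Entries that were already complete after the first phase, such as PodVolumeBackup volumes, are left untouched in this sketch.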

### How the VolumeInfo array is used.

#### Generate the PVC backed-up information summary
The upstream tools can use this VolumeInfo array to format and display their volume information. This is in the scope of this feature.
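As a rough sketch of what such a summary could look like, the following groups the array by `BackupMethod` and lists the skipped volumes with their reasons. The `summarize` function and its output format are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// VolumeInfo is trimmed to the fields needed for a summary.
type VolumeInfo struct {
	PVCName, BackupMethod, SkippedReason string
}

// summarize (hypothetical helper) counts volumes per backup method and
// lists every skipped volume with its reason, in a stable order.
func summarize(infos []*VolumeInfo) string {
	counts := map[string]int{}
	var skipped []string
	for _, info := range infos {
		counts[info.BackupMethod]++
		if info.BackupMethod == "Skipped" {
			skipped = append(skipped, info.PVCName+": "+info.SkippedReason)
		}
	}

	methods := make([]string, 0, len(counts))
	for m := range counts {
		methods = append(methods, m)
	}
	sort.Strings(methods)
	sort.Strings(skipped)

	var b strings.Builder
	for _, m := range methods {
		fmt.Fprintf(&b, "%s: %d\n", m, counts[m])
	}
	for _, s := range skipped {
		fmt.Fprintf(&b, "skipped %s\n", s)
	}
	return b.String()
}

func main() {
	infos := []*VolumeInfo{
		{PVCName: "demo/data", BackupMethod: "CSISnapshot"},
		{PVCName: "demo/logs", BackupMethod: "PodVolumeBackup"},
		{PVCName: "demo/cache", BackupMethod: "Skipped", SkippedReason: "excluded by volume policy"},
	}
	fmt.Print(summarize(infos))
}
```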

#### Retrieve volume backed-up information for `velero backup describe` command
The `velero backup describe` command can also use this VolumeInfo array to display the volume information. The snapshot data mover volumes should be the first to use this structure; the Velero native snapshot, CSI snapshot, and PodVolumeBackup volumes can then adopt it as well. The detailed implementation is not in this feature's scope.

#### Let restore know how to restore the PV
The `restoreItem` function currently determines whether to restore the PV resource by checking the Velero native snapshots list, the PodVolumeBackup list, and the PV's DeletionPolicy.

``` golang
if groupResource == kuberesource.PersistentVolumes {
	switch {
	case hasSnapshot(name, ctx.volumeSnapshots):
		...
	case hasPodVolumeBackup(obj, ctx):
		...
	case hasDeleteReclaimPolicy(obj.Object):
		...
	default:
		...
	}
}
```
After introducing the VolumeInfo array, the PV handling logic should be changed to something like the following.
``` golang
if groupResource == kuberesource.PersistentVolumes {
	volumeInfo := GetVolumeInfo(pvName)
	switch volumeInfo.BackupMethod {
	case VeleroNativeSnapshot:
		...
	case PodVolumeBackup:
		...
	case CSISnapshot:
		...
	case SnapshotDataMover:
		...
	case Skipped:
		fallthrough
	default:
		// restore the PV resource.
	}
}
```
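One detail the snippet above leaves open is what happens when no VolumeInfo entry exists for a PV, for example when the backup has no volumes-info file. The following sketch shows one possible lookup that falls through to the `default` branch in that case. The `indexByPVName` and `backupMethodOf` helpers are hypothetical, and the per-method restore actions stay elided as in the design.

```go
package main

import "fmt"

// Backup method values as listed in the VolumeInfo definition.
const (
	VeleroNativeSnapshot = "VeleroNativeSnapshot"
	PodVolumeBackup      = "PodVolumeBackup"
	CSISnapshot          = "CSISnapshot"
	SnapshotDataMover    = "SnapshotDataMover"
	Skipped              = "Skipped"
)

// VolumeInfo is trimmed to the fields used for the lookup.
type VolumeInfo struct {
	PVName       string
	BackupMethod string
}

// indexByPVName (hypothetical helper) builds a lookup table once per restore
// so each PV lookup does not rescan the whole array.
func indexByPVName(infos []*VolumeInfo) map[string]*VolumeInfo {
	index := make(map[string]*VolumeInfo, len(infos))
	for _, info := range infos {
		index[info.PVName] = info
	}
	return index
}

// backupMethodOf (hypothetical helper) returns the recorded backup method,
// or an empty string when the PV has no entry. An empty string matches no
// case in a switch over the methods, so the default branch handles it.
func backupMethodOf(index map[string]*VolumeInfo, pvName string) string {
	info, ok := index[pvName]
	if !ok || info == nil {
		return ""
	}
	return info.BackupMethod
}

func main() {
	index := indexByPVName([]*VolumeInfo{
		{PVName: "pv-1", BackupMethod: CSISnapshot},
		{PVName: "pv-2", BackupMethod: Skipped},
	})
	for _, name := range []string{"pv-1", "pv-2", "pv-unknown"} {
		switch backupMethodOf(index, name) {
		case VeleroNativeSnapshot, PodVolumeBackup, CSISnapshot, SnapshotDataMover:
			fmt.Println(name, "has backed-up data; method-specific handling applies")
		default:
			fmt.Println(name, "restore the PV resource")
		}
	}
}
```

Returning a value instead of a possibly nil pointer avoids a nil dereference in the `switch` expression.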
## Alternatives Considered
The restore process needs more information about how the PVs are backed up to determine whether a PV should be restored. The released branches also need a similar function, but backporting a new feature into previous releases may not be a good idea. Therefore, following [Anshul Ahuja's suggestion](https://github.com/vmware-tanzu/velero/issues/6595#issuecomment-1731081580), an alternative is to add more cases here to support checking PVs backed up by the CSI plugin and the CSI snapshot data mover: https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L1206-L1324.

## Security Considerations
There should be no security impact introduced by this design.
## Compatibility
After this design is implemented, there should be no impact on the existing [skipped PVC summary feature](https://github.com/vmware-tanzu/velero/pull/6496).
## Implementation
This will be implemented in the Velero v1.13 development cycle.
## Open Issues
There are no open issues identified so far.
