Unified Repository Design
Lyndon-Li committed May 19, 2022
1 parent e0a3f83 commit 2869223
Showing 3 changed files with 39 additions and 24 deletions.
Binary file modified design/unified-repo-and-kopia-integration/br-workflow.png
@@ -31,7 +31,7 @@ Moreover, as reflected by user-reported issues, Restic seems to have many perfor

On the other hand, based on a previous analysis and testing, we found that Kopia has better performance, with more features and more suitable to fulfill Velero’s repository targets (Kopia’s architecture divides modules more clearly according to their responsibilities, every module plays a complete role with clear interfaces. This makes it easier to take individual modules to Velero without losing critical functionalities).

## Goal
## Goals

- Define a Unified Repository Interface that various data movers could interact with. This is for below purposes:
- All kinds of data movers acquire the same set of backup repository capabilities very easily
@@ -44,7 +44,7 @@ On the other hand, based on a previous analysis and testing, we found that Kopia
- Use the existing logic or add new logic to manage the unified repository and Kopia uploader
- Preserve the legacy Restic path, this is for the consideration of backward compatibility

## No-Goal
## Non-Goals

- The Unified Repository supports all kinds of data movers to save logic objects into it. How these logic objects are organized for a specific data mover (for example, how a volume’s block data is organized and represented by a unified repository object) should be included in the related data mover design.
- At present, Velero saves Kubernetes resources, backup metadata, and debug logs separately. Eventually, we want to save them in the Unified Repository. How to organize this data in the Unified Repository should be included in a separate design.
@@ -56,10 +56,10 @@ On the other hand, based on a previous analysis and testing, we found that Kopia
Below shows the primary modules and their responsibilities:

- Kopia uploader is used as a generic file system data mover, so it could move all file system data either from the production PV (as Velero’s Pod Volume Backup does), or from any kind of snapshot (e.g., CSI snapshot).
- Kopia uploader, the same as other data movers, calls the Unified Repository Interface – udmrepo.BackupRepo to write/read data to/from the Unified Repository.
- Kopia uploader, the same as other data movers, calls the Unified Repository Interface to write/read data to/from the Unified Repository.
- Kopia repository layers, CAOS and CABS, work as the backup repository and expose the Kopia Repository interface.
- A Kopia Repository Library works as an adapter between udmrepo.BackupRepo and Kopia Repository interface. Specifically, it implements udmrepo.BackupRepo interface and calls Kopia Repository interface.
- At present, there is only one kind of backup repository -- Kopia Repository. If a new backup repository/storage is required, we need to create a new Library as an adapter to udmrepo.BackupRepo
- A Kopia Repository Library works as an adapter between Unified Repository Interface and Kopia Repository interface. Specifically, it implements Unified Repository Interface and calls Kopia Repository interface.
- At present, there is only one kind of backup repository -- Kopia Repository. If a new backup repository/storage is required, we need to create a new Library as an adapter to the Unified Repository Interface
- At present, the Kopia Repository works as a single piece in the same process as the caller. In the future, we may run its CABS in a dedicated process or node.
- At present, we don’t have a requirement to extend the backup repository. If needed, an extra module could be added as an upper layer of the Unified Repository without changing the data movers.

@@ -80,16 +80,16 @@ For Unified Repository Object/Manifest, a brief guidance to data movers are as b
Velero by default uses the Unified Repository for all kinds of data movement, it is also able to integrate with other data movement paths from any party, for any purpose. Details are concluded as below:

- Built-in Data Path: this is the default data movement path, which uses Velero’s built-in data movers to back up/restore workloads; the data is written to/read from the Unified Repository.
- Data Mover Replacement: Any party could write its own data movers and plug them into Velero. Meanwhile, these plugin data movers could also write/read data to/from Velero’s Unified Repository so that these data movers could expose the same capabilities that are provided by the Unified Repository. In order to do this, the data mover providers need to call the udmrepo.BackupRepo interfaces from inside their plugin data movers.
- Data Mover Replacement: Any party could write its own data movers and plug them into Velero. Meanwhile, these plugin data movers could also write/read data to/from Velero’s Unified Repository so that these data movers could expose the same capabilities that are provided by the Unified Repository. In order to do this, the data mover providers need to call the Unified Repository Interface from inside their plugin data movers.
- Data Path Replacement: Some vendors may already have their own data movers and backup repository and they want to replace Velero’s entire data path (including data movers and backup repository). In this case, the providers only need to implement their plugin data movers, all the things downwards are a black box to Velero and managed by providers themselves (including API call, data transport, installation, life cycle management, etc.). Therefore, this case is out of the scope of Unified Repository.
![A Scope](scope.png)

# Detailed Design

## The udmrepo.BackupRepo Interface

## The Unified Repository Interface
Below are the definitions of the Unified Repository Interface
```
///BackupRepoService is used to create or open a backup repository
///BackupRepoService is used to initialize, open or maintain a backup repository
type BackupRepoService interface {
///Create a backup repository or connect to an existing backup repository
///repoOption: option to the backup repository and the underlying backup storage
@@ -103,7 +103,7 @@ type BackupRepoService interface {
///Periodically called to maintain the backup repository to eliminate redundant data and improve performance
///config: options to open the backup repository and the underlying storage
Maintain(config map[string]stringl) error
Maintain(config map[string]string) error
}
///BackupRepo provides the access to the backup repository
@@ -156,9 +156,24 @@ type ObjectWriter interface {
///Wait for the completion of the object write
///Result returns the object's unified identifier after the write completes
Result() (ID, error)
}
```

Some data structure & constants used by the interfaces:
```
type RepoOptions struct {
///A repository specific string to identify a backup storage, i.e., "s3", "filesystem"
StorageType string
///Backup repository password, if any
RepoPassword string
///A custom path to save the repository's configuration, if any
ConfigFilePath string
///Other repository specific options
GeneralOptions map[string]string
///Storage specific options
StorageOptions map[string]string
}
///ObjectWriteOptions defines the options when creating an object for write
type ObjectWriteOptions struct {
FullPath string ///Full logical path of the object
@@ -195,7 +210,7 @@ type RepoManifest struct {
type ManifestFilter struct {
Labels map[string]string
}
```
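As a rough illustration of how a caller might populate these structures, the sketch below redeclares `RepoOptions` and `ManifestFilter` locally so that it compiles on its own. The helper `newS3RepoOptions`, the storage option keys `"bucket"` and `"prefix"`, and the exact-match label semantics of `matches` are assumptions made for this example; the design does not define them.

```go
package main

import "fmt"

// RepoOptions mirrors the struct from the design, redeclared here so the sketch is self-contained.
type RepoOptions struct {
	StorageType    string
	RepoPassword   string
	ConfigFilePath string
	GeneralOptions map[string]string
	StorageOptions map[string]string
}

// ManifestFilter mirrors the struct from the design.
type ManifestFilter struct {
	Labels map[string]string
}

// newS3RepoOptions is a hypothetical helper. The keys "bucket" and "prefix"
// are assumed names, not options defined by the design.
func newS3RepoOptions(bucket, prefix, password string) RepoOptions {
	return RepoOptions{
		StorageType:    "s3",
		RepoPassword:   password,
		GeneralOptions: map[string]string{},
		StorageOptions: map[string]string{"bucket": bucket, "prefix": prefix},
	}
}

// matches reports whether every label in the filter is present with the same
// value in the given manifest labels (an assumed matching rule).
func (f ManifestFilter) matches(labels map[string]string) bool {
	for k, v := range f.Labels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	opts := newS3RepoOptions("velero-backups", "unified-repo/", "example-password")
	fmt.Println(opts.StorageType, opts.StorageOptions["bucket"])

	filter := ManifestFilter{Labels: map[string]string{"type": "snapshot"}}
	fmt.Println(filter.matches(map[string]string{"type": "snapshot", "volume": "pv-1"}))
}
```

Keeping storage-specific settings in a string map lets a new backup storage be added without changing the struct, at the cost of compile-time checking of option names.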

## Workflow

@@ -214,18 +229,18 @@ In the new design, we will have separate and independent modules/logics for back

The Repository Provider and Uploader Provider use an option called “Legacy” to choose the path --- Restic Repository vs. Unified Repository or Restic Uploader vs. Kopia Uploader. Specifically, if Legacy = true, Repository Provider will manage Restic Repository only, otherwise, it manages Unified Repository only; if Legacy = true, Uploader Provider calls Restic to do the BR, otherwise, it calls Kopia to do the BR.
In order to manage Restic Repository, the Repository Provider calls Restic Repository Provider, the latter invokes the existing Restic CLIs.
In order to manage Unified Repository, the Repository Provider calls Unified Repository Provider, the latter calls the Unified Repository module through the udmrepo.BackupRepo interface. It does not need to know how the Unified Repository is implemented.
In order to manage Unified Repository, the Repository Provider calls Unified Repository Provider, the latter calls the Unified Repository module through the udmrepo.BackupRepoService interface. It does not need to know how the Unified Repository is implemented.
In order to use Restic to do BR, the Uploader Provider calls Restic Uploader Provider, the latter invokes the existing Restic CLIs.
In order to use Kopia to do BR, the Uploader Provider calls Kopia Uploader Provider, which does the following:

- Call Unified Repository through the udmrepo.BackupRepo interface to open the unified repository for read/write. Again, it does not need to know how the Unified Repository is implemented. It gets a BackupRepo’s read/write handle after the call succeeds
- Call Unified Repository through the udmrepo.BackupRepoService interface to open the unified repository for read/write. Again, it does not need to know how the Unified Repository is implemented. It gets a BackupRepo’s read/write handle after the call succeeds
- Wrap the BackupRepo handle into a Kopia Shim which implements Kopia Repository interface
- Call the Kopia Uploader. Kopia Uploader is a Kopia module without any change, so it only understands Kopia Repository interface
- Kopia Uploader starts to backup/restore the corresponding PV’s file system data and write/read data to/from the provided Kopia Repository implementation, that is, Kopia Shim here
- When read/write calls go into Kopia Shim, it in turn calls the BackupRepo handle for read/write
- Finally, the read/write calls flow to Unified Repository module
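
The shim step in the list above is an adapter pattern, which the following minimal sketch illustrates. Every name in it is hypothetical: `backupRepo` is a heavily reduced stand-in for udmrepo.BackupRepo, `uploaderRepo` stands in for the repository interface that the unchanged uploader expects, and `memRepo` is an in-memory store used only so the example runs. None of these are the real Velero or Kopia signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// backupRepo is a hypothetical, reduced stand-in for udmrepo.BackupRepo.
type backupRepo interface {
	WriteObject(path string, data []byte) (string, error)
	ReadObject(id string) ([]byte, error)
}

// uploaderRepo is a hypothetical stand-in for the repository interface
// that the unchanged uploader understands.
type uploaderRepo interface {
	Put(path string, data []byte) (string, error)
	Get(id string) ([]byte, error)
}

// memRepo is an in-memory backupRepo used only for this sketch.
type memRepo struct{ objects map[string][]byte }

func newMemRepo() *memRepo { return &memRepo{objects: map[string][]byte{}} }

// WriteObject stores a copy of data and returns a generated object ID.
func (r *memRepo) WriteObject(path string, data []byte) (string, error) {
	id := fmt.Sprintf("obj-%d", len(r.objects)+1)
	r.objects[id] = append([]byte(nil), data...)
	return id, nil
}

// ReadObject returns the stored bytes or an error for an unknown ID.
func (r *memRepo) ReadObject(id string) ([]byte, error) {
	data, ok := r.objects[id]
	if !ok {
		return nil, errors.New("object not found: " + id)
	}
	return data, nil
}

// shim adapts a backupRepo handle to the interface the uploader expects.
type shim struct{ repo backupRepo }

func (s shim) Put(path string, data []byte) (string, error) { return s.repo.WriteObject(path, data) }
func (s shim) Get(id string) ([]byte, error)                { return s.repo.ReadObject(id) }

// upload plays the role of the uploader: it only sees uploaderRepo.
func upload(repo uploaderRepo, path string, data []byte) (string, error) {
	return repo.Put(path, data)
}

func main() {
	var repo uploaderRepo = shim{repo: newMemRepo()}
	id, err := upload(repo, "/data/file.txt", []byte("hello"))
	if err != nil {
		panic(err)
	}
	data, err := repo.Get(id)
	if err != nil {
		panic(err)
	}
	fmt.Println(id, string(data))
}
```

Because the uploader depends only on its own interface, a different repository backend can be substituted behind the shim without touching the uploader code.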

The Unified Repository provides all-in-one functionalities of a Backup Repository and exposes the udmrepo.BackupRepo interface. Inside, Kopia Library is an adapter for Kopia Repository to translate the udmrepo.BackupRepo interface calls to Kopia Repository interface calls.
The Unified Repository provides all-in-one functionalities of a Backup Repository and exposes the Unified Repository Interface. Inside, Kopia Library is an adapter for Kopia Repository to translate the Unified Repository Interface calls to Kopia Repository interface calls.
![A BR Workflow](br-workflow.png)
The modules in blue in the diagram below represent the newly added modules/logics or reorganized logics.
The modules in yellow in the diagram below represent the called Kopia modules without changes.
@@ -244,12 +259,12 @@ Velero already has an existing workflow to call Restic maintenance (it is called
- When a BackupRepository CR (originally called ResticRepository CR) is created by PodVolumeBackup/Restore Controller, the BackupRepository controller checks whether it has reached the Prune Due Time; if so, it calls PruneRepo
- In the new design, the Repository Provider implements PruneRepo call, it uses the same way to switch between Restic Repository Provider and Unified Repository Provider, then:
- For Restic Repository, Restic Repository Provider invokes the existing “Prune” CLI of Restic
- For Unified Repository, Unified Repository Provider calls udmrepo.BackupRepo’s Maintain
- For Unified Repository, Unified Repository Provider calls udmrepo.BackupRepoService’s Maintain function

A special feature of Kopia’s maintenance that needs to be noticed by the caller is that Kopia supports two maintenance modes – the full maintenance and quick maintenance. There are many differences between full and quick mode, but briefly speaking, quick mode only processes the hottest data (primarily, it is the metadata and index data), in this way, the maintenance will finish very fast and make less impact. Therefore, it is better to take this quick maintenance into Velero.
Kopia supports two maintenance modes – the full maintenance and quick maintenance. There are many differences between full and quick mode, but briefly speaking, quick mode only processes the hottest data (primarily, it is the metadata and index data), in this way, the maintenance will finish very fast and make less impact. We will also take this quick maintenance into Velero.
We will add a new Due Time to Velero, so that there are two Prune Due Times:
- Normal Due Time: For Restic, this will invoke Restic Prune; for Unified Repository, this will invoke udmrepo.BackupRepo’s Maintain(full) call and finally call Kopia’s full maintenance
- Quick Due Time: For Restic, this does nothing; for Unified Repository, this will invoke udmrepo.BackupRepo’s Maintain(quick) call and finally call Kopia’s quick maintenance
- Normal Due Time: For Restic, this will invoke Restic Prune; for Unified Repository, this will invoke udmrepo.BackupRepoService’s Maintain(full) call and finally call Kopia’s full maintenance
- Quick Due Time: For Restic, this does nothing; for Unified Repository, this will invoke udmrepo.BackupRepoService’s Maintain(quick) call and finally call Kopia’s quick maintenance

We assign different values to Normal Due Time and Quick Due Time, as a result of which, the quick maintenance happens more frequently than full maintenance.
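
The two due times can be pictured with the small sketch below. It is an assumption-laden illustration, not Velero's scheduling code: the type names, the 24-hour and 1-hour intervals in `defaultSchedule`, and the rule that a due full maintenance takes precedence over a due quick one are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

type maintenanceMode string

const (
	modeNone  maintenanceMode = ""
	modeQuick maintenanceMode = "quick"
	modeFull  maintenanceMode = "full"
)

// schedule holds the two due-time intervals (hypothetical field names).
type schedule struct {
	FullInterval  time.Duration // "Normal Due Time"
	QuickInterval time.Duration // "Quick Due Time"
}

// defaultSchedule uses assumed values chosen only for illustration.
var defaultSchedule = schedule{FullInterval: 24 * time.Hour, QuickInterval: time.Hour}

// hours converts a whole number of hours into a time.Duration.
func hours(n int) time.Duration { return time.Duration(n) * time.Hour }

// dueMode returns which maintenance should run, given the time elapsed since
// the last full and the last quick maintenance. Full maintenance wins when
// both are due (an assumption of this sketch).
func (s schedule) dueMode(sinceFull, sinceQuick time.Duration) maintenanceMode {
	switch {
	case sinceFull >= s.FullInterval:
		return modeFull
	case sinceQuick >= s.QuickInterval:
		return modeQuick
	default:
		return modeNone
	}
}

func main() {
	fmt.Printf("%q\n", defaultSchedule.dueMode(hours(25), hours(2)))
	fmt.Printf("%q\n", defaultSchedule.dueMode(hours(3), hours(2)))
	fmt.Printf("%q\n", defaultSchedule.dueMode(hours(3), hours(0)))
}
```

With a shorter quick interval, quick maintenance is selected more often than full maintenance, which matches the intent described above.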
![A Maintenance Workflow](maintenance-workflow.png)
@@ -264,15 +279,15 @@ In this way, Velero will be able to get the progress as shown in the diagram bel
In the current design, Velero is using two unchanged Kopia modules --- the Kopia Uploader and the Kopia Repository. Both will generate debug logs during their run. Velero will collect these logs to aid debugging.
Kopia’s Uploader and Repository both get the Logger information from the current Go context; therefore, the Kopia Uploader Provider/Kopia Library could set the Logger interface into the current context and pass the context to Kopia Uploader/Kopia Repository.
Velero will set Logger interfaces separately for Kopia Uploader and Kopia Repository. In this way, the Unified Repository could serve other uploaders/data movers without losing the debug log capability; and the Kopia Uploader could write to any repository without losing the debug log capability.
Kopia’s debug logs will be written to the same log file as Velero server or BackupRepository daemonset, so Velero doesn’t need to upload/download these debug logs separately.
Kopia’s debug logs will be written to the same log file as Velero server or PodVolumeBackup daemonset, so Velero doesn’t need to upload/download these debug logs separately.
![A Debug Log for Uploader](debug-log-uploader.png)
![A Debug Log for Repository](debug-log-repository.png)
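
The general idea of carrying a logger in a Go context can be sketched as follows. This is a generic illustration using only the standard `context` package; it does not use Kopia's logging API, and the `logger` interface, `loggerKey`, `newLoggingContext`, and the `runUploader`/`runRepository` functions are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// logger is a hypothetical minimal logging interface.
type logger interface {
	Debugf(format string, args ...interface{})
}

// bufferLogger records prefixed log lines in memory.
type bufferLogger struct {
	prefix string
	lines  []string
}

func (l *bufferLogger) Debugf(format string, args ...interface{}) {
	l.lines = append(l.lines, l.prefix+fmt.Sprintf(format, args...))
}

// nopLogger is used when the context carries no logger.
type nopLogger struct{}

func (nopLogger) Debugf(string, ...interface{}) {}

// loggerKey is an unexported context key type, which avoids key collisions.
type loggerKey struct{}

// newLoggingContext returns a background context carrying the given logger.
func newLoggingContext(l logger) context.Context {
	return context.WithValue(context.Background(), loggerKey{}, l)
}

// loggerFrom extracts the logger from the context, falling back to a no-op.
func loggerFrom(ctx context.Context) logger {
	if l, ok := ctx.Value(loggerKey{}).(logger); ok {
		return l
	}
	return nopLogger{}
}

// runUploader and runRepository stand in for the two modules; each logs
// through whatever logger its own context carries.
func runUploader(ctx context.Context, path string) {
	loggerFrom(ctx).Debugf("uploading %s", path)
}

func runRepository(ctx context.Context, objectID string) {
	loggerFrom(ctx).Debugf("writing object %s", objectID)
}

func main() {
	uploaderLog := &bufferLogger{prefix: "[uploader] "}
	repoLog := &bufferLogger{prefix: "[repository] "}
	runUploader(newLoggingContext(uploaderLog), "/data")
	runRepository(newLoggingContext(repoLog), "obj-1")
	fmt.Println(uploaderLog.lines, repoLog.lines)
}
```

Giving each module its own context is what keeps the two log streams independent, so either module can be paired with a different counterpart without losing its debug output.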

## Path Switch & Coexist
As mentioned above, we will use an option “Legacy” to choose different paths. We don’t pursue a dynamic switch as there is no user requirement.
Instead, we assume the value of this option is set at the time of installation of Velero and never changed unless Velero is uninstalled. This means, if users want to switch the path, they need to uninstall Velero first and reinstall it.
Specifically, we will have the “Legacy” option/mode in two places:
- Add the “Legacy” option as a parameter of the Velero server and PodVolume daemonset. The parameters will be set by the installation. For details of installation, see below Installation section.
- Add the “Legacy” option as a parameter of the Velero server and PodVolumeBackup daemonset. The parameters will be set by the installation. For details of installation, see below Installation section.
- Add a mode value in the BackupRepository CRs and PodVolumeBackup CRs.

The corresponding controllers handle the CRs with the matched mode only, the mismatched ones will be ignored. In this way, the corresponding controllers could handle the switch correctly for both fresh installation and upgrade.
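
A minimal sketch of the mode matching described above is shown here. The type `pathMode`, the mode strings `"legacy"` and `"unified"`, and the `backupRepositoryCR` struct are hypothetical; the design only states that CRs carry a mode value and that controllers ignore mismatched ones.

```go
package main

import "fmt"

// pathMode is a hypothetical representation of the mode stored in a CR.
type pathMode string

const (
	modeLegacy  pathMode = "legacy"  // Restic path (assumed string value)
	modeUnified pathMode = "unified" // Unified Repository path (assumed string value)
)

// modeFromLegacyFlag maps the installation-time "Legacy" option to a mode.
func modeFromLegacyFlag(legacy bool) pathMode {
	if legacy {
		return modeLegacy
	}
	return modeUnified
}

// backupRepositoryCR is a reduced, hypothetical view of a BackupRepository CR.
type backupRepositoryCR struct {
	Name string
	Mode pathMode
}

// filterByMode keeps only the CRs whose mode matches the server's mode;
// mismatched CRs are ignored rather than treated as errors.
func filterByMode(serverMode pathMode, crs []backupRepositoryCR) []backupRepositoryCR {
	var matched []backupRepositoryCR
	for _, cr := range crs {
		if cr.Mode == serverMode {
			matched = append(matched, cr)
		}
	}
	return matched
}

func main() {
	crs := []backupRepositoryCR{
		{Name: "repo-restic", Mode: modeLegacy},
		{Name: "repo-kopia", Mode: modeUnified},
	}
	for _, cr := range filterByMode(modeFromLegacyFlag(false), crs) {
		fmt.Println("reconciling", cr.Name)
	}
}
```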
@@ -292,7 +307,7 @@ As a side effect, when upgrading from an old release, even though the path is no
Therefore, users are recommended to uninstall Velero and delete all the resources in the Velero namespace before installing the new release.

## Installation
The “legacy” flag will be set into Velero server deployment and PodVolume daemonset deployment so that both the RepositoryProvider and UploaderProvider see the flag.
The “legacy” flag will be set into Velero server deployment and PodVolumeBackup daemonset deployment so that both the RepositoryProvider and UploaderProvider see the flag.
The same “legacy” option will be added for Velero’s installation, including CLI installation and Helm Chart Installation. Then we need to transfer users’ selection into the two deployments mentioned above:
- Helm Chart Installation: add a “Legacy” value into its values.yaml and then generate the deployments according to the value. values.yaml is the user-provided configuration file; therefore, users could set this value at the time of installation.
- CLI Installation: add the “Legacy” option into the installation command line, and then set the flag when creating the two deployments accordingly. Users could change the option at the time of installation.
@@ -301,5 +316,5 @@ The same “legacy” option will be added for Velero’s installation, includin
The following user experiences change with this design:
- Installation CLI change: a new option is added to the installation CLI, see the Installation section for details
- CR change: One or more existing CRs have been renamed, see the Velero CR Changes section for details
- Wording Alignment: as the existing situation, many places are using the word of "Restic", for example, "Restic" daemonset, "default-to-restic" option, most of them are not accurate anymore, we will change these words and give a detail list of the changes
- Wording Alignment: as the existing situation, many places are using the word of "Restic", for example, "Restic" daemonset, "default-volume-to-restic" option, most of them are not accurate anymore, we will change these words and give a detailed list of the changes
