Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

design: progress on backup/restore operations by volume snapshotters #2543

Merged
merged 12 commits into from Jul 8, 2020

Conversation

vishnuitta
Copy link
Contributor

This PR contains design proposal using which volume snapshotters can let know its users about the progress on backup/restore operations.

Signed-off-by: Vitta vitta@mayadata.io

Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
@skriss skriss added the Area/Design Design Documents label May 19, 2020
@vishnuitta
Copy link
Contributor Author

just want to make sure that nothing is pending on me to make this PR go to next stage.. please let me know if thats not the case
sorry for asking it..

Copy link
Contributor

@nrb nrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @vishnuitta! Definitely appreciate the though that's gone into it, and it's a feature we want in Velero.

On the whole, I like the proposal. I have some requests I've made on comments inline that I think would make the proposal clearer. @vmware-tanzu/velero-maintainers Please read through when you have a chance.

design/plugin-backup-and-restore-progress-design.md Outdated Show resolved Hide resolved
Though no restrictions are required on the name of CR, as a general practice, volume snapshotter can name this CR with the value same as return value of CreateSnapshot.

After return from `CreateSnapshot` in `takePVSnapshot`, if VolumePluginBackup CR exists for particular backup of the volume, velero adds this CR to `backupRequest`.
During persistBackup call, this CR also will be backed up to backup location.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these would need to be written in a json.gz file, similar to how the snapshots are now?

It may be helpful to create a section in the design for "core" Velero server/client changes. I think the only things that need to be changed are as follows, based on what I've read in this document.

  • Creation of the VolumePluginBackup/VolumePluginRestore CRDs at installation time
  • Persistence of VolumePluginBackup CRs towards the end of the back up operation
  • Sync of VolumePluginBackup/VolumePluginRestore CRDs at the same time as backup synchronization.

Otherwise, volumesnapshotter plugins themselves are responsible for interacting with the VolumePlugin(Backup|Restore) CRs, and Velero's core won't change them.

Then, to see this reported, the velero describe commands would need to be aware of the VolumePlugin(Backup|Restore) CRs associated with a Backup CR in order to display the progress fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. I will add a section @nrb. And, the only changes that are required in "core" are those that you mentioned.
Regarding 'velero describe' support for these CRs, if I'm not wrong, 'velero describe' is not aware of PodVolume(Backup|Restore) CRs as well. To keep this PR simple, I thought - we can take it for these new CRs also when PodVolume(Backup|Restore) CRs support is added to 'velero describe'.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, no, it would not be aware of these new CRDs, but we could make it support them.

Within the backup describer, we have a function for describing PodVolumeBackups, as well as a function for describing VolumeSnapshotContents.

I would think describing these new CRs would work roughly the same way, where they would get queried from the Kube API and then passed to the top level Describe function.

I don't know that all this detail needs to go into the design document, but this is kind of what I envisioned the scope of work to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned on Slack, I think the progress tracking is information that's really only necessary when scoped to an in-progress back up operation. When a Backup is in a Completed or Failed state, we know it's at 100% completion.

Given that, does it make sense to synchronize these CRs across clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronizing VolumePluginBackup CR to another cluster will help user to know the backup size, number of volumes in it even before user triggers restore of the backup.

// Plugin name
Plugin string `json:"plugin"`

// PluginSpecific are a map of key-value pairs that plugin want to provide
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of what values might go in here, and how it might differ from ConfigMaps that we use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't thought any specific, but, added a way so that plugins can provide more info specific to this restore operation to users.
Configmaps in plugin is related to config at plugin level. This field is specific to details regarding this restore operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing which I had in my mind when added this was specific to OpenEBS plugin, but, also thought that every plugin may have their specific things to convey to user about that particular backup/restore operation.
OpenEBS plugin supports incremental snapshots also. So, an entry can be added to this map which provides name of its parent backup.


## Security Considerations

N/A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any sort of RBAC considerations keeping plugins from modifying VolumePlugin(Backup|Restore) CRs that were created by another plugin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good question @nrb. I hadn't thought of this. Let me also think of this.
Do you have any suggestions? and any such considerations already handled in Velero?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently everything runs under the same velero service account so all plugins have broad access, which would include being able to modify CRs created by another plugin.

@nrb
Copy link
Contributor

nrb commented Jun 2, 2020

@dsu-igeek I would appreciate your eyes on this, even a quick glance. Would this proposal work for the vSphere plugin?

Copy link
Contributor

@ashish-amarnath ashish-amarnath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments to explore potential re-use of the existing pod_volume_backup type.

I would also like to understand how we might use something like this in the CSI snapshot APIs.

@nrb
Copy link
Contributor

nrb commented Jun 2, 2020

I would also like to understand how we might use something like this in the CSI snapshot APIs.

I'm unclear how we might be able to get information from the CSI snapshot APIs. I'd have to look at how they work again, as right now I think the CSI drivers are more or less a black box to Velero, and we don't have a way of asking for progress. I don't believe there's more than one field for bytes written, just the final total for restoreSize.

- backward compatibility
- label value fix
- typo fix

Signed-off-by: Vitta <vitta@mayadata.io>
@skriss
Copy link
Member

skriss commented Jun 3, 2020

Couple helpful links from today's call:

Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Copy link
Contributor

@nrb nrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more questions from me around the move to the snapshot struct.

….gz file

Signed-off-by: Vitta <vitta@mayadata.io>
skriss
skriss previously requested changes Jun 9, 2020
Copy link
Member

@skriss skriss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments; I think the general flow and the changes to velero core make sense, we just need to finalize the exact API design.

design/plugin-backup-and-restore-progress-design.md Outdated Show resolved Hide resolved
design/plugin-backup-and-restore-progress-design.md Outdated Show resolved Hide resolved
// PluginSpecific are a map of key-value pairs that plugin want to provide
// to user to identify plugin properties related to this backup
// +optional
PluginSpecific map[string]string `json:"pluginSpecific,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to come up with a different name here...I'll think on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this is supposed to be properties specific to Backup/restore operation, thinking of changing to
BackupProperties and RestoreProperties..
does that make sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the properties be the same on backup and restore? Are they copied from the backup CR to the restore CR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need not be @nrb .. Backup related and Restore related ones are different.
As mentioned in one of the comments, one Backup property can be its parent snapshot name, in case of OpenEBS as it supports incremental backups.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackupProperties or BackupInfo would work.


### Progress of backup operation handled by volume snapshotter

Progress will be updated by volume snapshotter in VolumePluginBackup CR which is specific to that backup operation.
Copy link
Member

@skriss skriss Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just going back and forth on the API design a bit. I think either option is workable for now for this specific use case, just trying to think through other uses/future extensions and whether that should push us in one direction or the other.

  • Approach 1 combines the volume snapshot & volume backup info into a single CRD, and then has a parallel CRD on the restore side
  • Approach 2 models the backup separately from the snapshot (the snapshot remains as an internal struct, not a CRD), but there's a lot of duplication of info between the snapshot & backup structs. On the restore side, there's one struct that models both the restore from durable storage, and the snapshot restore.
  • we could consider fully breaking things out, so 4 CRDs: on the backup side, one for the snapshot, one for the backup to durable storage; on the restore side, one for the restore from durable storage, and one for the snapshot restore

I think I'm still leaning towards approach 1, as it solves the need and is simplest overall, but I'd like other maintainers' inputs on this (including any other options you see).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm leaning towards approach 1, too, with the caveat that it needs to be optional. We can't require it if not all plugins are updating these CRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats true @nrb .. With approach 1, a CR will be created for every backup/restore operation even though plugin doesn't update it.


## Security Considerations

N/A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently everything runs under the same velero service account so all plugins have broad access, which would include being able to modify CRs created by another plugin.

Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
@nrb nrb added the kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes label Jun 22, 2020

At a high level, in this approach, this struct will be converted to a CR by adding new fields (related to Progress tracking) to it, and gets rid of `volume.Snapshot` struct.

Instead of backing up of Go struct, CRs will be backed up to backup location, and it gets synced into other cluster by backupSyncController running in that cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 47 contradicts line 41. Is the plan to sync the CR on a regular basis, or at restore time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First para i.e., lines 41, 42 and 43 are related to current behavior. I will update it.
At line 47 is the proposed solution. So, plan is to sync the CR on regular basis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the text

Copy link
Contributor

@nrb nrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fairly close, but I need to read the backwards compatibility section more closely to be sure I understand.

@ashish-amarnath and @carlisia, please read through this proposal as well

Signed-off-by: Vitta <vitta@mayadata.io>
@carlisia
Copy link
Contributor

carlisia commented Jul 1, 2020

Will review tomorrow.

@vishnuitta
Copy link
Contributor Author

@carlisia or @nrb or @ashish-amarnath .. would you guys prefer a hangout to close the things on this thread?

@stephbman
Copy link
Contributor

@vishnuitta we could do a hangout or zoom meeting to review this - @carlisia since you are going to take a look at reviewing this would it be better for you to provide comments here or to have a short call w/ Vishnu on this?

Copy link
Contributor

@nrb nrb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for my delay. I don't believe I need a call to sync on this, but can if others believe its helpful.

I'm in favor of approach 1, and if others are as well, then I'd simply ask approach 2 be marked as an alternative considered.

Also, this isn't necessarily for the design phase, but upon implementation, please use Kubebuilder to implement the new CRD.

Copy link
Contributor

@carlisia carlisia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is in really good shape and it's good to go! 🎉

I agree that approach 1 is better, and I also like the continuous sync'ing to the storage location.

// PluginSpecific are a map of key-value pairs that plugin want to provide
// to user to identify plugin properties related to this backup
// +optional
PluginSpecific map[string]string `json:"pluginSpecific,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackupProperties or BackupInfo would work.

ProviderSnapshotID string `json:"providerSnapshotID,omitempty"`

// Phase is the current state of the VolumeSnapshot.
Phase SnapshotPhase `json:"phase,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will the phases be for backup/restore?

@carlisia
Copy link
Contributor

carlisia commented Jul 7, 2020

@vishnuitta if you use the scaffold feature of kubebuilder to generate the new CRDs with the Velero project it won't work, because we structure things differently than kubebuilder expects (no main.go for example). The way we do it is to have a dummy project and copy/paste things. Ping me on #velero if I can help you with this.

@carlisia
Copy link
Contributor

carlisia commented Jul 7, 2020

I'm in favor of approach 1, and if others are as well, then I'd simply ask approach 2 be marked as an alternative considered

@vishnuitta please move approach 2 to the "alternatives" section.

Copy link
Contributor

@ashish-amarnath ashish-amarnath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience on this proposal @vishnuitta
Apart from moving the creation of the CRs into the plugins themselves, the other comments are minor one.


### Approach 1

Existing `Snapshot` Go struct from `volume` package have most of the details related to backup operation performed by volumesnapshotters.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest making these as links to the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update with code links.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't have to block the merge of this PR.

This struct also gets backed up to backup location. But, this struct doesn't get synced on other clusters at regular intervals.
It is currently synced only during restore operation, and velero CLI shows few of its contents.

At a high level, in this approach, this struct will be converted to a CR by adding new fields (related to Progress tracking) to it, and gets rid of `volume.Snapshot` struct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean CRD?
Can you please update what this CRD will look like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR have CR struct below (not exact CRD) for better readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look for heading VolumePluginBackup CR

Copy link
Contributor

@ashish-amarnath ashish-amarnath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be good to move forward.
Thank you for your patience on this PR @vishnuitta. Really appreciated!

@carlisia carlisia merged commit c3cac0a into vmware-tanzu:master Jul 8, 2020
@vishnuitta
Copy link
Contributor Author

thanks @carlisia for the merge and thanks to every one for the reviews.
Apologizes for delay in updating the last set of review comments. I will get them incorporated in the implementation related PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Design Design Documents kind/changelog-not-required PR does not require a user changelog. Often for docs, website, or build changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants