New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design: progress on backup/restore operations by volume snapshotters #2543
Conversation
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
just want to make sure that nothing is pending on me to make this PR go to next stage.. please let me know if thats not the case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @vishnuitta! Definitely appreciate the though that's gone into it, and it's a feature we want in Velero.
On the whole, I like the proposal. I have some requests I've made on comments inline that I think would make the proposal clearer. @vmware-tanzu/velero-maintainers Please read through when you have a chance.
Though no restrictions are required on the name of CR, as a general practice, volume snapshotter can name this CR with the value same as return value of CreateSnapshot. | ||
|
||
After return from `CreateSnapshot` in `takePVSnapshot`, if VolumePluginBackup CR exists for particular backup of the volume, velero adds this CR to `backupRequest`. | ||
During persistBackup call, this CR also will be backed up to backup location. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these would need to be written in a json.gz
file, similar to how the snapshots are now?
It may be helpful to create a section in the design for "core" Velero server/client changes. I think the only things that need to be changed are as follows, based on what I've read in this document.
- Creation of the VolumePluginBackup/VolumePluginRestore CRDs at installation time
- Persistence of VolumePluginBackup CRs towards the end of the back up operation
- Sync of VolumePluginBackup/VolumePluginRestore CRDs at the same time as backup synchronization.
Otherwise, volumesnapshotter plugins themselves are responsible for interacting with the VolumePlugin(Backup|Restore) CRs, and Velero's core won't change them.
Then, to see this reported, the velero describe
commands would need to be aware of the VolumePlugin(Backup|Restore) CRs associated with a Backup CR in order to display the progress fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. I will add a section @nrb. And, the only changes that are required in "core" are those that you mentioned.
Regarding 'velero describe' support for these CRs, if I'm not wrong, 'velero describe' is not aware of PodVolume(Backup|Restore) CRs as well. To keep this PR simple, I thought - we can take it for these new CRs also when PodVolume(Backup|Restore) CRs support is added to 'velero describe'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, no, it would not be aware of these new CRDs, but we could make it support them.
Within the backup describer, we have a function for describing PodVolumeBackups, as well as a function for describing VolumeSnapshotContents.
I would think describing these new CRs would work roughly the same way, where they would get queried from the Kube API and then passed to the top level Describe function.
I don't know that all this detail needs to go into the design document, but this is kind of what I envisioned the scope of work to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned on Slack, I think the progress tracking is information that's really only necessary when scoped to an in-progress back up operation. When a Backup
is in a Completed
or Failed
state, we know it's at 100% completion.
Given that, does it make sense to synchronize these CRs across clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synchronizing VolumePluginBackup CR to another cluster will help user to know the backup size, number of volumes in it even before user triggers restore of the backup.
// Plugin name | ||
Plugin string `json:"plugin"` | ||
|
||
// PluginSpecific are a map of key-value pairs that plugin want to provide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of what values might go in here, and how it might differ from ConfigMaps that we use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't thought any specific, but, added a way so that plugins can provide more info specific to this restore operation to users.
Configmaps in plugin is related to config at plugin level. This field is specific to details regarding this restore operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing which I had in my mind when added this was specific to OpenEBS plugin, but, also thought that every plugin may have their specific things to convey to user about that particular backup/restore operation.
OpenEBS plugin supports incremental snapshots also. So, an entry can be added to this map which provides name of its parent backup.
|
||
## Security Considerations | ||
|
||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any sort of RBAC considerations keeping plugins from modifying VolumePlugin(Backup|Restore) CRs that were created by another plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good question @nrb. I hadn't thought of this. Let me also think of this.
Do you have any suggestions? and any such considerations already handled in Velero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently everything runs under the same velero
service account so all plugins have broad access, which would include being able to modify CRs created by another plugin.
@dsu-igeek I would appreciate your eyes on this, even a quick glance. Would this proposal work for the vSphere plugin? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments to explore potential re-use of the existing pod_volume_backup type.
I would also like to understand how we might use something like this in the CSI snapshot APIs.
I'm unclear how we might be able to get information from the CSI snapshot APIs. I'd have to look at how they work again, as right now I think the CSI drivers are more or less a black box to Velero, and we don't have a way of asking for progress. I don't believe there's more than one field for bytes written, just the final total for |
- backward compatibility - label value fix - typo fix Signed-off-by: Vitta <vitta@mayadata.io>
Couple helpful links from today's call:
|
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more questions from me around the move to the snapshot struct.
….gz file Signed-off-by: Vitta <vitta@mayadata.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few more comments; I think the general flow and the changes to velero core make sense, we just need to finalize the exact API design.
// PluginSpecific are a map of key-value pairs that plugin want to provide | ||
// to user to identify plugin properties related to this backup | ||
// +optional | ||
PluginSpecific map[string]string `json:"pluginSpecific,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to come up with a different name here...I'll think on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is supposed to be properties specific to Backup/restore operation, thinking of changing to
BackupProperties
and RestoreProperties
..
does that make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the properties be the same on backup and restore? Are they copied from the backup CR to the restore CR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need not be @nrb .. Backup related and Restore related ones are different.
As mentioned in one of the comments, one Backup property can be its parent snapshot name, in case of OpenEBS as it supports incremental backups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackupProperties
or BackupInfo
would work.
|
||
### Progress of backup operation handled by volume snapshotter | ||
|
||
Progress will be updated by volume snapshotter in VolumePluginBackup CR which is specific to that backup operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just going back and forth on the API design a bit. I think either option is workable for now for this specific use case, just trying to think through other uses/future extensions and whether that should push us in one direction or the other.
- Approach 1 combines the volume snapshot & volume backup info into a single CRD, and then has a parallel CRD on the restore side
- Approach 2 models the backup separately from the snapshot (the snapshot remains as an internal struct, not a CRD), but there's a lot of duplication of info between the snapshot & backup structs. On the restore side, there's one struct that models both the restore from durable storage, and the snapshot restore.
- we could consider fully breaking things out, so 4 CRDs: on the backup side, one for the snapshot, one for the backup to durable storage; on the restore side, one for the restore from durable storage, and one for the snapshot restore
I think I'm still leaning towards approach 1, as it solves the need and is simplest overall, but I'd like other maintainers' inputs on this (including any other options you see).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm leaning towards approach 1, too, with the caveat that it needs to be optional. We can't require it if not all plugins are updating these CRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats true @nrb .. With approach 1, a CR will be created for every backup/restore operation even though plugin doesn't update it.
|
||
## Security Considerations | ||
|
||
N/A |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently everything runs under the same velero
service account so all plugins have broad access, which would include being able to modify CRs created by another plugin.
Signed-off-by: Vitta <vitta@mayadata.io>
Signed-off-by: Vitta <vitta@mayadata.io>
|
||
At a high level, in this approach, this struct will be converted to a CR by adding new fields (related to Progress tracking) to it, and gets rid of `volume.Snapshot` struct. | ||
|
||
Instead of backing up of Go struct, CRs will be backed up to backup location, and it gets synced into other cluster by backupSyncController running in that cluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 47 contradicts line 41. Is the plan to sync the CR on a regular basis, or at restore time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First para i.e., lines 41, 42 and 43 are related to current behavior. I will update it.
At line 47 is the proposed solution. So, plan is to sync the CR on regular basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fairly close, but I need to read the backwards compatibility section more closely to be sure I understand.
@ashish-amarnath and @carlisia, please read through this proposal as well
Signed-off-by: Vitta <vitta@mayadata.io>
Will review tomorrow. |
@carlisia or @nrb or @ashish-amarnath .. would you guys prefer a hangout to close the things on this thread? |
@vishnuitta we could do a hangout or zoom meeting to review this - @carlisia since you are going to take a look at reviewing this would it be better for you to provide comments here or to have a short call w/ Vishnu on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My apologies for my delay. I don't believe I need a call to sync on this, but can if others believe its helpful.
I'm in favor of approach 1, and if others are as well, then I'd simply ask approach 2 be marked as an alternative considered.
Also, this isn't necessarily for the design phase, but upon implementation, please use Kubebuilder to implement the new CRD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in really good shape and it's good to go! 🎉
I agree that approach 1 is better, and I also like the continuous sync'ing to the storage location.
// PluginSpecific are a map of key-value pairs that plugin want to provide | ||
// to user to identify plugin properties related to this backup | ||
// +optional | ||
PluginSpecific map[string]string `json:"pluginSpecific,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BackupProperties
or BackupInfo
would work.
ProviderSnapshotID string `json:"providerSnapshotID,omitempty"` | ||
|
||
// Phase is the current state of the VolumeSnapshot. | ||
Phase SnapshotPhase `json:"phase,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will the phases be for backup/restore?
@vishnuitta if you use the scaffold feature of kubebuilder to generate the new CRDs with the Velero project it won't work, because we structure things differently than kubebuilder expects (no |
@vishnuitta please move approach 2 to the "alternatives" section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your patience on this proposal @vishnuitta
Apart from moving the creation of the CRs into the plugins themselves, the other comments are minor one.
|
||
### Approach 1 | ||
|
||
Existing `Snapshot` Go struct from `volume` package have most of the details related to backup operation performed by volumesnapshotters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest making these as links to the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update with code links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't have to block the merge of this PR.
This struct also gets backed up to backup location. But, this struct doesn't get synced on other clusters at regular intervals. | ||
It is currently synced only during restore operation, and velero CLI shows few of its contents. | ||
|
||
At a high level, in this approach, this struct will be converted to a CR by adding new fields (related to Progress tracking) to it, and gets rid of `volume.Snapshot` struct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean CRD?
Can you please update what this CRD will look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR have CR struct below (not exact CRD) for better readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look for heading VolumePluginBackup CR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be good to move forward.
Thank you for your patience on this PR @vishnuitta. Really appreciated!
thanks @carlisia for the merge and thanks to every one for the reviews. |
This PR contains design proposal using which volume snapshotters can let know its users about the progress on backup/restore operations.
Signed-off-by: Vitta vitta@mayadata.io