
CSI support proposal #1661

Open: nrb wants to merge 14 commits into master from nrb:csi-snapshot-proposal

Conversation

@nrb (Contributor) commented Jul 12, 2019

Remaining:

  • disabling CSI snapshotting, similar to disabling Velero-native snapshots
  • skipping CSI snapshotting in the plugins if the volume is covered by restic
  • mention feature flags/gates
  • use ownerRefs
  • deleting the underlying snapshot when the backup expires (using a Retain deletion policy means it will not be cleaned up by default)

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

nrb added the Design label Jul 16, 2019
nrb requested review from carlisia and prydonius Jul 17, 2019
It was [introduced with dynamic provisioning support][15] in 2016, predating CSI.

In the `BackupItemAction` for PVCs, the associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI`.
Volumes with any other `PersistentVolumeSource` set will use Velero's current VolumeSnapshotter plugin code path.
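
As a rough sketch of that check (assuming a client-go of the era, before the context-taking APIs, is available to the plugin; the function name and wiring are illustrative, not the final plugin code):

```go
package csi

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// isCSIVolume reports whether the PV bound to the given PVC was provisioned
// by a CSI driver. Volumes with any other PersistentVolumeSource fall through
// to Velero's existing VolumeSnapshotter code path.
func isCSIVolume(client kubernetes.Interface, pvc *corev1.PersistentVolumeClaim) (bool, error) {
	if pvc.Spec.VolumeName == "" {
		// Unbound claim; there is no PV to inspect yet.
		return false, nil
	}
	pv, err := client.CoreV1().PersistentVolumes().Get(pvc.Spec.VolumeName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return pv.Spec.PersistentVolumeSource.CSI != nil, nil
}
```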

@prydonius (Member) commented Jul 24, 2019

Is there a case where you'd want to use Velero's current VolumeSnapshotter instead of CSI even if CSI was used to provision the volume? For example, if the CSI driver supported provisioning but not snapshotting?

@nrb (Author, Contributor) replied Jul 25, 2019

Hm, that's a good possibility. I know there are gRPC methods for finding out if the driver supports snapshots, but I'd have to check whether we can query the driver from k8s.

@skriss (Member) commented Aug 12, 2019

Our in-tree VolumeSnapshotters won't work as-is for CSI volumes, even if they're on the right platform, due to how the GetVolumeID/SetVolumeID functions work (and maybe other things too, not sure). We could look at changing that if we did want to be able to support using them to snapshot CSI volumes, but right now it won't work.

@nrb (Author, Contributor) commented Aug 14, 2019

Update: currently there is no way to query what a driver supports from k8s, so we can't really know whether CSI snapshotting is supported via a query to something like the StorageClass at the moment.

I would say to simplify things, if CSI volumes are in use but the driver doesn't have snapshotting, restic would be the best option.

left a comment

My direct comments are all just typos/re-words.
Generally, though, this looks good. In the list of "changes needed in plugins", it'd be good to see a "To support aspect X of feature Y, we need to change Z"; right now you're just including "Z" without the context. IMO, this design doc reads pretty densely to someone not deeply familiar with Velero's implementation.

Member left a comment

I read a bit about CSI and read this closely a few times. No input at the moment, lgtm.

@nrb (Contributor, Author) commented Aug 8, 2019

Per feedback, we need a way of telling Velero not to create CSI snapshots. Maybe use the SnapshotVolumes field on the backup?
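
A minimal sketch of what honoring that field could look like inside the proposed csi-pvc plugin. The `Execute` signature and `Backup.Spec.SnapshotVolumes` are existing Velero API; the `csiPVCAction` type and import paths (from the heptio/velero tree of the time) are illustrative:

```go
package csi

import (
	velerov1 "github.com/heptio/velero/pkg/apis/velero/v1"
	"github.com/heptio/velero/pkg/plugin/velero"
	"k8s.io/apimachinery/pkg/runtime"
)

// csiPVCAction is a placeholder for the proposed csi-pvc BackupItemAction.
type csiPVCAction struct{}

// Execute honors the existing Backup.Spec.SnapshotVolumes field, so that
// `velero backup create --snapshot-volumes=false` would also skip CSI
// snapshot creation. One possible approach, not settled design.
func (p *csiPVCAction) Execute(item runtime.Unstructured, backup *velerov1.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, error) {
	if backup.Spec.SnapshotVolumes != nil && !*backup.Spec.SnapshotVolumes {
		// Snapshots are disabled for this backup; return the item unchanged.
		return item, nil, nil
	}
	// ... otherwise proceed with VolumeSnapshot creation for this PVC ...
	return item, nil, nil
}
```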

@skriss (Member) commented Aug 12, 2019

I know you brought up feature flags recently - I do think we should consider adding them, and adding one for the CSI work, as we're developing here. I imagine we're going to ship this in parts (e.g. first add support for creating CSI snapshots, then add support for restoring them, ...) and it would be nice to allow users to opt-in until they're ready.
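
For illustration, a minimal opt-in gate could look like the sketch below. The comma-separated `--features` server flag, the package shape, and a flag name like "EnableCSI" are all assumptions to be settled in the feature-flag design, not decided API:

```go
package features

import "strings"

// Flags is a minimal opt-in feature-gate set, parsed from a comma-separated
// --features server flag (illustrative; the real design is a separate issue).
type Flags map[string]bool

// NewFlags parses "EnableCSI,SomethingElse" into a Flags set.
func NewFlags(csv string) Flags {
	f := Flags{}
	for _, name := range strings.Split(csv, ",") {
		if name = strings.TrimSpace(name); name != "" {
			f[name] = true
		}
	}
	return f
}

// Enabled reports whether the named feature was opted into; the server would
// only register the CSI plugins when Enabled("EnableCSI") is true.
func (f Flags) Enabled(name string) bool { return f[name] }
```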

@skriss (Member) commented Aug 12, 2019

One other thing to think about - similar to existing code, we'll want to skip creating snapshots if restic is being used to back up a PVC - I'm not exactly sure how we'll do that given that the CSI code is in plugins. Need to think about it.

@nrb (Contributor, Author) commented Aug 12, 2019

I created #1759 for the feature flag work. It likely needs a bit more detail, but at least we have a place to discuss it now.

nrb changed the title from "Initial CSI proposal" to "CSI support proposal" Aug 13, 2019
nrb added 13 commits Jul 12, 2019, each signed off by Nolan Brubaker <brubakern@vmware.com>
nrb force-pushed the nrb:csi-snapshot-proposal branch from 33cf33e to 7d3ab7b Aug 19, 2019
@nrb (Contributor, Author) commented Aug 19, 2019

@skriss @prydonius @carlisia I believe all the items in the checklist are now addressed. I've also done a bit of reorganization of the sections. Please take another look!

Member left a comment

lgtm 👍


To ensure this is the case, the server code must be extended to mark PVCs so that the CSI BackupItemAction plugin is not invoked for PVCs known to be backed up by restic.

The restic backupper's `BackupPodVolumes` method will be extended to get PersistentVolumeClaims where relevant and add a `velero.io/restic-backed: true` label.
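
A sketch of how that label could be consumed via the plugin's `AppliesTo` selector, so `Execute` is never invoked for restic-backed PVCs. `ResourceSelector` and its `LabelSelector` field are existing Velero plugin API; the type name and import path are illustrative, and the review thread below explains why this label-based approach is fragile:

```go
package csi

import "github.com/heptio/velero/pkg/plugin/velero"

type csiPVCAction struct{}

// AppliesTo restricts the csi-pvc plugin to PVCs that do NOT carry the
// proposed velero.io/restic-backed label, so restic-backed claims are
// skipped without even invoking Execute.
func (p *csiPVCAction) AppliesTo() (velero.ResourceSelector, error) {
	return velero.ResourceSelector{
		IncludedResources: []string{"persistentvolumeclaims"},
		// "!key" selects objects that lack the label entirely.
		LabelSelector: "!velero.io/restic-backed",
	}, nil
}
```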

@skriss (Member) commented Aug 19, 2019

I see a couple of challenges with this approach - (1) BackupPodVolumes will actually run after the PVC has been backed up, since the PVC is backed up as an additional item for the pod prior to the restic backups running; and (2) the label value would need to have the name or UID of the backup, so we know if the PVC is being backed up with restic for this backup, not just any previous backup.

One alternative approach for consideration - you could have the csi-pvc backup item action list all pods in the namespace that the PVC is in, find the pod that is mounting the PVC, and check if it's annotated for restic backup. If we do something like that, it means we don't need to add a new label/annotation which I think is desirable.
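
A sketch of that alternative, assuming a client-go of the era; the helper name is illustrative, while the `backup.velero.io/backup-volumes` annotation is the existing restic opt-in:

```go
package csi

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pvcIsResticBacked reports whether some pod mounts the given PVC as a
// volume that is opted into restic backup via the pod's
// backup.velero.io/backup-volumes annotation.
func pvcIsResticBacked(client kubernetes.Interface, pvc *corev1.PersistentVolumeClaim) (bool, error) {
	pods, err := client.CoreV1().Pods(pvc.Namespace).List(metav1.ListOptions{})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim == nil || vol.PersistentVolumeClaim.ClaimName != pvc.Name {
				continue
			}
			// The annotation value is a comma-separated list of pod volume names.
			for _, name := range strings.Split(pod.Annotations["backup.velero.io/backup-volumes"], ",") {
				if strings.TrimSpace(name) == vol.Name {
					return true, nil
				}
			}
		}
	}
	return false, nil
}
```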

@skriss (Member) commented Aug 19, 2019

Longer-term, I wonder if we should pass a richer "context" object to our plugins that includes information like this (i.e. the resticSnapshotTracker). Right now one of the drawbacks of implementing logic in plugins is that you don't get access to some of those internal data structures used by pkg/backup or pkg/restore to track what's happening. If we could expose those to plugins, things would be a lot easier.

@nrb (Author, Contributor) replied Aug 19, 2019

Yeah, I really wanted to use the resticSnapshotTracker, but opted for the label since that could be used with the AppliesTo function, thereby not even invoking the Execute method for csi-pvc when a PVC should be skipped. Was hoping to avoid too many extra API server calls to determine a no-op, but it may be unavoidable for now.

I'm not sure I want to completely redo the plugin interface in this proposal, as I'd like for CSI support to be a 1.x feature. However, I think providing some sort of context object would be very useful for plugins in the future.

Thanks for the call out on the PVCs being backed up during the Pod's process; I had forgotten that the PVCs were backed up immediately instead of being added to the PVC group's loop.

Your alternative sounds doable, I think.

Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
@nrb (Contributor, Author) commented Aug 27, 2019

I'm going to hold off on updating this pending decisions made in kubernetes/kubernetes#81792 and kubernetes/enhancements#989


## Alternatives Considered

* Implementing similar logic in a Velero VolumeSnapshotter plugin was considered.

@nrb (Author, Contributor) commented Oct 9, 2019

@betta1 Antony, is this summary of your experiments with making a VolumeSnapshotter plugin accurate? Is there anything I missed here that you encountered?

@betta1 (Contributor) replied Oct 9, 2019

@nrb yes, this is accurate and in line with our initial experiments with a CSI VolumeSnapshotter plugin. It's been a while, but I remember our biggest issue was that the VolumeSnapshotter interface didn't provide some of the metadata, like PVC/namespace/StorageClass, that is required for CSI snapshots. We preferred the approach you took in your CSI prototype, using backup and restore item actions, since item actions give us access to that metadata. BTW, I tested your prototype and filed an issue (vmware-tanzu/velero-plugin-for-csi#1): backup worked OK, but I ran into some issues with restore.
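
For context, a paraphrase of the relevant slice of the `VolumeSnapshotter` plugin interface of the time (method list abridged), showing the missing metadata:

```go
// A snapshot request carries only a volume ID, an availability zone, and
// tags; there is no PVC name, namespace, or StorageClass, which is exactly
// the metadata a CSI VolumeSnapshot object needs. A BackupItemAction, which
// sees the whole PVC object, is therefore a better fit.
type VolumeSnapshotter interface {
	CreateSnapshot(volumeID, volumeAZ string, tags map[string]string) (snapshotID string, err error)
	DeleteSnapshot(snapshotID string) error
	// Init, GetVolumeID/SetVolumeID, and volume create/info methods elided.
}
```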
