Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Epic] Two stage API for the snapshot and backup/upload snapshot #1519

Closed
satyamz opened this issue May 23, 2019 · 18 comments
Closed

[Epic] Two stage API for the snapshot and backup/upload snapshot #1519

satyamz opened this issue May 23, 2019 · 18 comments
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/User End-User Enhancement to Velero Epic Needs Product Blocked needing input or feedback from Product Reviewed Q2 2021

Comments

@satyamz
Copy link

satyamz commented May 23, 2019

Hi Velero community 馃憢

We have written a velero plugin which implements all the BlockStore APIs aka VolumeSnapshotter APIs. Today, we only have a CreateSnapshot API where we take a snapshot and upload it in one go because, openebs velero plugin implements an API CreateSnapshot which takes the snapshot and uploads data.

When we have an application like mysql, we need to apply pre-hooks and post-hooks to freeze the io's coming to the application, so that, we can take an application consistent snapshot. However, in today's velero implementation, we apply the pre-hook on the Pod, then find PVC, PV and once PV is found, take the snapshot using plugin and apply post-hook.

In most of the blockStore plugin cases, plugins do the upload data, as a part of the CreateSnapshot API execution. This creates an unecessary down time for an application. Application io's will be frozen for the time equivalent to the time requires to upload the data. Velero should provide a separate API to upload the backup which will be executed as soon as snapshot is taken and post-hooks are executed.

Thanks!!

@skriss skriss added Enhancement/Dev Internal or Developer-focused Enhancement to Velero Area/Plugins Issues related to plugin infra/internal plugins Enhancement/User End-User Enhancement to Velero and removed Enhancement/Dev Internal or Developer-focused Enhancement to Velero labels May 23, 2019
@skriss
Copy link
Member

skriss commented May 23, 2019

@satyamz thanks for filing this issue! really helpful to get this kind of feedback from plugin authors. we have definitely already had some thoughts and discussions along these lines.

@satyamz
Copy link
Author

satyamz commented May 23, 2019

Hi @skriss ,

We are very much interested in discussing how this can be solved. We tried to resolve this by adding an additional method to the VolumeSnapshotter interface called UploadSnapshot which would be called just after the Post hook of the Pod is executed. We actually got it working for taking snapshot using CreateSnapshot and uploading the backup / snapshot data using UploadSnapshot. I want to validate this approach.

@skriss
Copy link
Member

skriss commented Jun 26, 2019

@satyamz sorry for the delay in getting back here. At first glance this makes a lot of sense, and I think would help not just for your plugin but for others as well.

In your POC, what did the signature for UploadSnapshot look like, and were there any changes necessary for CreateSnapshot?

@skriss
Copy link
Member

skriss commented Jun 26, 2019

Also, in terms of execution flow - could UploadSnapshot execute in the background while other resources are being backed up? I'm guessing yes, and we'd just want to wait for all of the UploadSnapshot calls to complete before completing the overall backup.

@skriss
Copy link
Member

skriss commented Jun 26, 2019

cc @VMmore

@satyamz
Copy link
Author

satyamz commented Jun 27, 2019

Hi @skriss, Thanks for considering this. 馃檪

At first glance this makes a lot of sense, and I think would help not just for your plugin but for others as well.

Yes! It can be useful for all the storage providers.

In your POC, what did the signature for UploadSnapshot look like, and were there any changes necessary for CreateSnapshot?

func UploadSnapshot(volumeID string) error {
...
}

AFAIK, There are two kind of storage snapshots. first one where snapshot is present as a file (copy of the data) on the disk (Like restic does for the localPV) and second type where snapshot is CoW which is logical entity altogether. In both the cases, volumeID can help us identify the volume, in our case i.e. openebs/velero-plugin we need backupName as well as volumeId. We find the backupName with help of volumeID. Need of having backupName for upload method differs from the provider to provider. There's no change in CreateSnapshot method.

Also, in terms of execution flow - could UploadSnapshot execute in the background while other resources are being backed up?

Yes, that's correct. Also, we should call UploadSnapshot as soon as post hook on Pod is released. We might also need to create multiple go routines calling UploadSnapshot method for multiple volumes. Implementation for this is not quite clear to me. We tried it for the single volume. And it worked!!

we'd just want to wait for all of the UploadSnapshot calls to complete before completing the overall backup.

Yes! We should mark backup as completed only when all UploadSnapshot methods are returned.

@skriss skriss added Needs info Waiting for information Needs Product Blocked needing input or feedback from Product labels Jun 27, 2019
@skriss
Copy link
Member

skriss commented Jul 2, 2019

馃憤 thanks for the info. One thing we'll need to think about is whether adding this function to the interface would be considered a breaking change/necessitate a 2.0, or whether we can gracefully handle plugins that don't implement the function. This isn't really something we've tackled in Velero before.

@satyamz
Copy link
Author

satyamz commented Jul 3, 2019

One thing we'll need to think about is whether adding this function to the interface would be considered a breaking change/necessitate a 2.0, or whether we can gracefully handle plugins that don't implement the function.

Good point 馃憤

We need to add a method to the volumesnapshotter interface, hence there'll be changes in the volumesnapshotter.proto which means we need to generate the gRPC code and update GRPCServer and GRPCClient implementation for the UploadSnapshot method in the Velero. These steps need to be followed for the Plugin too. So yes, this is something which is required on both the side Velero and plugin.
I had figured this out a few months back. Let me check if I can find some time get it to work for velero 1.0 .

@skriss
Copy link
Member

skriss commented Jul 8, 2019

I played around with this some, and it looks like if we add a new method to the VolumeSnapshotter interface, existing plugin implementations that don't implement it will return a grpc error with a code of Unimplemented when the call is attempted. So we could check for this type of error in the Velero code and ignore it if encountered, which would maintain our backwards-compatibility.

@satyamz
Copy link
Author

satyamz commented Jul 9, 2019

Sounds good! How can we proceed further?

@skriss
Copy link
Member

skriss commented Jul 9, 2019

I'd like to have a brief design doc written up containing the proposal, that we can review with the community and especially other plugin authors before moving on to execution. I just opened a PR this morning with a proposed design doc template (#1636) - the idea is that you'd copy the template, fill out the details, then open a PR against this repo with your design doc for review/comment. Specifically I'd like to lock down the signature of the new function and how the backup execution flow changes (scope of hooks, any phase changes, etc).

Once the design doc is created, it'd be awesome if you could present it at a Velero community meeting.

Does that sound reasonable?

@skriss
Copy link
Member

skriss commented Jul 9, 2019

cc @nrb @carlisia @prydonius

@carlisia
Copy link
Contributor

+1. @satyamz and we can do a review prior to you presenting this if you'd like to present, which would be super useful.

@carlisia
Copy link
Contributor

So we could check for this type of error in the Velero code and ignore it

I think this is completely harmless and since it's temporary until we can make a breaking release, I'm on board. Maybe logging should be done alongside ignoring the error.

@satyamz
Copy link
Author

satyamz commented Jul 11, 2019

Hi @skriss

Once the design doc is created, it'd be awesome if you could present it at a Velero community meeting.
Does that sound reasonable?

Yes, our overall approach is remarkable. I will start writing the design document and open a PR for the same with the problem statement and approach to solving the problem by covering topics discussed here.

Thank you!

@skriss
Copy link
Member

skriss commented Aug 19, 2019

hi @satyamz - just wondering if you were able to start work on a design doc here? let me know if I can provide any input!

@satyamz
Copy link
Author

satyamz commented Aug 20, 2019

Hi @skriss, Apologies. I was dragged into something else. I should send PR this week.

satyamz added a commit to satyamz/velero that referenced this issue Aug 26, 2019
More discussion on this can be found in issue:
vmware-tanzu#1519

Signed-off-by: Satyam Zode <satyamzode@gmail.com>
@skriss skriss added the Epic label Mar 19, 2020
@skriss skriss changed the title Two stage API for the snapshot and backup/upload snapshot [Epic] Two stage API for the snapshot and backup/upload snapshot Mar 19, 2020
@dsu-igeek dsu-igeek removed the Needs info Waiting for information label Oct 22, 2020
@eleanor-millman
Copy link
Contributor

Closing because covered by #3533

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/User End-User Enhancement to Velero Epic Needs Product Blocked needing input or feedback from Product Reviewed Q2 2021
Projects
None yet
Development

No branches or pull requests

5 participants