Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request to enhance plugin with Timeout #2922

Open
phuongatemc opened this issue Sep 10, 2020 · 8 comments
Open

Request to enhance plugin with Timeout #2922

phuongatemc opened this issue Sep 10, 2020 · 8 comments
Assignees
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/Dev Internal or Developer-focused Enhancement to Velero Reviewed Q2 2021

Comments

@phuongatemc
Copy link
Contributor

For App Consistent backup, the database application pod will be quiesce during the time Velero execute plugin (on PVC) to make snapshot of the PVC. If for some reason, the plugin takes a long time to finish (i.e. fail to create snapshot and retry, network failure), the application pod will be in quiesce state which will block user from using the application. This is undesirable condition.

We would like to enhance the plugin to have a timeout so that the plugin will fail if it takes more than specified timeout to execute.

  • Velero version: v1.3.2
  • Kubernetes version: v1.17.2 and v1.18.8
@nrb
Copy link
Contributor

nrb commented Sep 15, 2020

Which plugin type are you referring to? VolumeSnapshotter?

@nrb nrb added the Area/Plugins Issues related to plugin infra/internal plugins label Sep 15, 2020
@phuongatemc
Copy link
Contributor Author

Yes, we use VMWare VolumeSnapshotter for First Class Disk (FCD) PV.

@phuongatemc
Copy link
Contributor Author

We also plan to use BackupItem plugin in the near future.

@nrb nrb added the Enhancement/Dev Internal or Developer-focused Enhancement to Velero label Sep 24, 2020
@phuongatemc
Copy link
Contributor Author

I propose we enhance the BackupItemActionPlugin struct to contain a timeout (in seconds) and enhance BackupItemActionGRPCClient.Execute as following:
if timeout is not specify, call grpcClient.Execute directly
res, err := c.grpcClient.Execute(context.Background(), req)
if timeout is specify, execute it in a goroutine (using context.WithCancel to terminate the goroutine if timeout occurs) and select timeout. Roughly like the code below

ctx, cancel := context.WithCancel(context.Background())
defer cancel()

ch := make(chan Response, 1)

go func() {
res, err := c.grpcClient.Execute(context.Background(), req)
select {
default:
ch <- res
case <-ctx.Done():
fmt.Println("Canceled by timeout")
return
}
}()

select {
case res <-ch:
//handle res
case <-time.After(timeout):
// print timeout to log
}

@nrb
Copy link
Contributor

nrb commented Oct 13, 2020

This may also be relevant to #2867

@phuongatemc
Copy link
Contributor Author

The Restore datapath is much more complicated because of behavior described in that #2867 I think we should consider it separately. This should focus on the Backup datapath.

@phuongatemc
Copy link
Contributor Author

I hold off this one until the plugin versioning is done. We now will first implement adding context into plugin interface using plugin versioning design (by @zubron) then enhance the Backup with the timeout.

@eleanor-millman
Copy link
Contributor

@phuongatemc Ah, I see, thanks for the added context. That sounds like a good plan.

cc @reasonerjt @yanji09

@eleanor-millman eleanor-millman added this to the v1.8.0 milestone Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/Dev Internal or Developer-focused Enhancement to Velero Reviewed Q2 2021
Projects
None yet
Development

No branches or pull requests

3 participants