Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition when passing AdditionalItems from restore item action #1350

Closed
sseago opened this issue Apr 4, 2019 · 16 comments · Fixed by #5933
Closed

Race condition when passing AdditionalItems from restore item action #1350

sseago opened this issue Apr 4, 2019 · 16 comments · Fixed by #5933
Assignees
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/User End-User Enhancement to Velero
Milestone

Comments

@sseago
Copy link
Collaborator

sseago commented Apr 4, 2019

What steps did you take and what happened:
We have a restore plugin which returns resources via AdditionalItems when those items must be restored before the current item. This happens when there is a dependency relationship between two items of the same resource type. In our particular case, for the imagestreamtags.image.openshift.io resource, in some cases some imagestream tags reference other imagestreamtags, and a restore for imageStreamTagA will fail if it references imageStreamTagB which does not exist. The plugin returns imageStreamTagB in the AdditionalItems slice.

In Restore.go, after running Execute on the plugin, for any AdditionalItems returned, restoreItem is called on each one.

From looking at the logs, everything seems to be happening in the right order. The plugin action runs for the resource that references the other imagestream tag, it passes the AdditionalItem reference, and then restore is called on that resource, including calling its restore plugins. However, when I look at the target namespace in the cluster after restore, I see the returned AdditionalItem resource, but I am not seeing the first resource restored at all. I know that if we attempt to restore an imagestreamtag with a non-existent reference, it will fail to restore, which is the expected behavior (i.e. if we have an alias tag for alpine:3.x which is supposed to track alpine 3.2, if 3.2 does not exist, the tracking tag will not be restored).

In this case, however, what seems to be happening is that restoreItem for the referenced resource returns before the resource is fully created in the cluster, so when the resourceClient.Create call is made, the referenced resource does not yet exist, according to the cluster.

I just modified my local velero checkout to add time.Sleep(10 * time.Second) immediately following the restoreItem call (linked below), and everything works as expected.
https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L938

What did you expect to happen:
My expectation was that both of the resources would be restored correctly.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

(I don't really think anything in my logs will give any more useful information than is already in the description above)

Anything else you would like to add:
I'm not sure exactly how to resolve this. If there was some "after restore plugin action", then for cases like this, I could have a plugin that blocked until the resource appeared in the cluster. In this case, on the returned AdditionalItems resource, the after restore plugin would wait until it was created to return, that way the initial imagestream tag which references it wouldn't actually have its resourceClient.Create call made until the cluster initialization of the resource was done.

Environment:

  • Velero version (use velero version): Server version is a build off the master branch
  • Kubernetes version (use kubectl version): 1.12+
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration: openshift 4 on aws
  • OS (e.g. from /etc/os-release):
@nrb
Copy link
Contributor

nrb commented Apr 5, 2019

Instead of a sleep, could a watch happen on the resource? We do something similar when checking if a PV or related PVC are deleted (code), and I'm working on similar logic in the install command. Similar behavior also happens in Helm when instances of CRDs are made before the definitions are ready, like in #1195.

The problem is that plugins don't receive a client (though they could generate one, I suppose), and the ready conditions for resources are different.

Would it make sense for the restore item action plugin interface to include some sort of Ready function? I'm not sure if this would be per item or not. @skriss @carlisia PTAL at the issue and share any ideas you might have.

@carlisia
Copy link
Contributor

carlisia commented Apr 5, 2019

Not sure yet about the how (let's discuss) but initially I'm thinking we might want the behavior of one resource waiting (watching?) multiple resources.

@sseago
Copy link
Collaborator Author

sseago commented Apr 5, 2019

@nrb Sure -- I never intended sleeping as a solution -- I just used it to confirm that the only reason for the failure was that the newly-created resource returned in the AdditionalItems slice wasn't there yet. Sleeping after calling restoreItem on AdditionalItems made the failure go away. Watching until a resource is ready could work, but it can't be done in the restore item action plugin, since that plugin is called before the restore happens in the cluster. If there were some plugin that got called after the resourceClient.Create call, that plugin could watch the resource. It's probably something we'd want to do within a plugin action which would only wait if it was required, since we don't want to always wait for one item to finish before going onto the next across all resource types. Also, it would need to be available per item, since in the case I'm hitting, the two dependent items are of the same resource type -- one imagestreamtag is dependent on another imagestreamtag.

@nrb
Copy link
Contributor

nrb commented Apr 5, 2019

If there were some plugin that got called after the resourceClient.Create call, that plugin could watch the resource.

That was kind of my thinking with a Ready function - it might be part of the same plugin, but the Ready function would be invoked after the Create call, effectively pausing restore til Ready returns true or a timeout is reached. If the functions not defined, move on as we do now.

And yeah, we'd need to think about how it might link between items like you said.

@skriss
Copy link
Member

skriss commented Jun 6, 2019

This looks like a specific instance of #964

@skriss
Copy link
Member

skriss commented Feb 13, 2020

@nrb @sseago I'm pretty sure we can close this out, now that we've closed #964 via #1937 - I'm going to close, but please reopen if you disagree with my assessment.

@skriss skriss closed this as completed Feb 13, 2020
@sseago
Copy link
Collaborator Author

sseago commented Feb 24, 2020

@skriss Circling back to this. I don't think it's an example of #964 since this applies to waiting for the creation of resources other than CRDs. The concern here is that AdditionalItems needed by the restore aren't necessarily ready when this item's restore is processed. For example, say you're restoring an ScheduledAppointment resource that requires the existence of the User first. Creating a ScheduledAppointment resource with a reference to a non-existent user will fail, so the restore item action plugin passes the appropriate User in the AdditionalItems list, but if this User resource gets restored but isn't ready yet, the ScheduledAppointment will still fail. This is why in my original investigation, adding a sleep call made the race condition go away (even though, as I mentioned, that's not the right solution here).

@skriss skriss reopened this Feb 24, 2020
@nrb
Copy link
Contributor

nrb commented Feb 26, 2020

Revisiting this, I think while they're generally the same problem (race conditions waiting for something to be ready), this is waiting at a different point in the code - namely for arbitrary AdditionalItems like Scott pointed out, not just CRDs.

I still think designing some sort of mechanism in to say that there needs to be a readiness check on the returned AdditionalItems makes sense. Very roughly, it could be a boolean added to the field on RestoreItemActionExecuteOutput, similar to SkipRestore, and if it's true, then invoke the plugin's Wait method, polling for readiness on the AdditionalItems.

@sseago
Copy link
Collaborator Author

sseago commented Feb 27, 2020

This makes sense. I think you almost have to do this via the plugin itself, since there's no way for Velero to know how to determine whether some arbitrary resource is ready, especially considering that the resource in question may have custom actions taken on it in a plugin. One consequence of this is you may have a situation where a plugin is returning AdditionalItems of a resource type that doesn't have its own plugin today, since there may be no need for custom actions on it, but since we need to know when that non-customized resource restore is done, we may need to add a plugin for that resource that has an empty/no-op Execute method but has a Wait method which determines when it's done. The Wait approach would also work for the use case I've run into, where in some cases we just need the Velero-restored item to show up in the cluster (so waiting for client.... to return the appropriate item), and in other cases we need the action carried out by the resource's plugin to complete successfully (perhaps a docker copy, or some other operation).

@sseago
Copy link
Collaborator Author

sseago commented Mar 20, 2020

@nrb If I'm understanding you correctly, you're proposing adding a new method (Wait())to the RestoreItemAction interface, in conjunction with adding a Wait field to RestoreItemActionExecuteOutput. If Wait is false, then current behavior is unchanged. If Wait is true, then after restoring the resource, call Wait() and don't move on to the next action (either the next additionalItems element, if we're there, or the resource item on the list if we're in a normal resource restore) until Wait() returns. As for the new plugin func, I'm assuming the same AppliesTo would be used to determine which Wait methods to call (perhaps more than one). If there's a generic plugin that runs on all resources (for setting annotations, etc.), then that plugin will probably have an empty Wait method, while a plugin which does processing specific to that resource type will implement the actual readiness check. For a resource that we don't currently have a plugin but which we may have other plugins which pull resources in, we may need to add a new plugin which would basically only return the original item with the Wait bool set to true, and then would have a Wait() method implemented which makes sure the resource is created.

@sseago
Copy link
Collaborator Author

sseago commented Mar 20, 2020

The other issue is what about timeouts? Would velero call Wait() with a global timeout setting? Would it be per restore, or would the plugin itself define it, so different resources might have different timeout values?

@nrb
Copy link
Contributor

nrb commented Mar 24, 2020

@sseago Yes, there would be a field on RestoreItemActionExecuteOutput and a method on the RestoreItemAction interface, operating in tandem as you described it.

I'm not sure about using AppliesTo on the Wait() function; I was envisioning it to be the exact same plugin, so I suppose it would be with same AppliesTo criteria that had already been applied...but if there's a chain of plugins, I need to think a little bit more about how things might get selected. Right now, I believe AdditionalItems are excluded from any kind of filtering logic, so including a PV from a PVC, which came from a Pod will get restore even if you explicitly excluded cluster-scoped resources.

The timeouts are tricky too - there is a global timeout now, but we actually have different numbers for waiting for PVs & associated objects to delete (in case someone deletes something and wants to immediately restore it) and for CRDs to be ready. Loads of command line flags certainly per type isn't an option. We could use ConfigMaps for plugin settings, and a core plugin that already does it.

@sseago
Copy link
Collaborator Author

sseago commented Apr 9, 2020

@nrb Right. We're using additional items to pull in cluster-scoped resources in some cases. In the case where this race condition has hit us, though, the AdditionalItems are actually resources of the same type but which this resource references. I guess the question is whether we want to call wait on the plugin that returned the AdditionalItems (even if it's a different resource type), or whether we'd want to call Wait on the chain of plugins that normally applies to the additional item's resource type. If it's a PVC calling out a PV, for example, would we want the PVC's plugin to call wait (being passed an ObjectReference for the PV), which would have to deal with the full range of resource types that it is able to pass via AdditionalItems? Alternatively, we might want to call Wait on each AdditionalItem's normal plugins, based on the AppliesTo logic. In this case, the normal PV plugin would implement wait for itself -- or if we don't already have a PV plugin, we'd need to write one which had a no-op Execute method and a filled-in Wait method.

In our ImageStreamTag use case, the distinction won't matter so much, since the case here is in the process of restoring tag1 we recognize that this tag contains an object reference to tag2, so we have to have tag2 restored before tag1, so we pass tag2 as an additional item. It gets restored, then wait is called on this plugin again with a tag2 reference -- same plugin whether it's via the appliesTo chain or because velero remembered that this plugin was the one that returned the additional item.

@sseago
Copy link
Collaborator Author

sseago commented Aug 26, 2020

I've created a PR for a design for this here: #2867

@eleanor-millman
Copy link
Contributor

Closing because the design was merged.

@sseago
Copy link
Collaborator Author

sseago commented Jun 29, 2022

Reopening because only the design was merged, not the implementation, and we're hoping to tackle this in the 1.10 time frame once plugin API versioning lands.

@sseago sseago reopened this Jun 29, 2022
@sseago sseago self-assigned this Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area/Plugins Issues related to plugin infra/internal plugins Enhancement/User End-User Enhancement to Velero
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants