New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition when passing AdditionalItems from restore item action #1350
Comments
Instead of a sleep, could a watch happen on the resource? We do something similar when checking if a PV or related PVC are deleted (code), and I'm working on similar logic in the install command. Similar behavior also happens in Helm when instances of CRDs are made before the definitions are ready, like in #1195. The problem is that plugins don't receive a client (though they could generate one, I suppose), and the ready conditions for resources are different. Would it make sense for the restore item action plugin interface to include some sort of Ready function? I'm not sure if this would be per item or not. @skriss @carlisia PTAL at the issue and share any ideas you might have. |
Not sure yet about the how (let's discuss) but initially I'm thinking we might want the behavior of one resource waiting (watching?) multiple resources. |
@nrb Sure -- I never intended sleeping as a solution -- I just used it to confirm that the only reason for the failure was that the newly-created resource returned in the AdditionalItems slice wasn't there yet. Sleeping after calling restoreItem on AdditionalItems made the failure go away. Watching until a resource is ready could work, but it can't be done in the restore item action plugin, since that plugin is called before the restore happens in the cluster. If there were some plugin that got called after the resourceClient.Create call, that plugin could watch the resource. It's probably something we'd want to do within a plugin action which would only wait if it was required, since we don't want to always wait for one item to finish before going onto the next across all resource types. Also, it would need to be available per item, since in the case I'm hitting, the two dependent items are of the same resource type -- one imagestreamtag is dependent on another imagestreamtag. |
That was kind of my thinking with a And yeah, we'd need to think about how it might link between items like you said. |
This looks like a specific instance of #964 |
@skriss Circling back to this. I don't think it's an example of #964 since this applies to waiting for the creation of resources other than CRDs. The concern here is that AdditionalItems needed by the restore aren't necessarily ready when this item's restore is processed. For example, say you're restoring an ScheduledAppointment resource that requires the existence of the User first. Creating a ScheduledAppointment resource with a reference to a non-existent user will fail, so the restore item action plugin passes the appropriate User in the AdditionalItems list, but if this User resource gets restored but isn't ready yet, the ScheduledAppointment will still fail. This is why in my original investigation, adding a sleep call made the race condition go away (even though, as I mentioned, that's not the right solution here). |
Revisiting this, I think while they're generally the same problem (race conditions waiting for something to be ready), this is waiting at a different point in the code - namely for arbitrary AdditionalItems like Scott pointed out, not just CRDs. I still think designing some sort of mechanism in to say that there needs to be a readiness check on the returned AdditionalItems makes sense. Very roughly, it could be a boolean added to the field on |
This makes sense. I think you almost have to do this via the plugin itself, since there's no way for Velero to know how to determine whether some arbitrary resource is ready, especially considering that the resource in question may have custom actions taken on it in a plugin. One consequence of this is you may have a situation where a plugin is returning AdditionalItems of a resource type that doesn't have its own plugin today, since there may be no need for custom actions on it, but since we need to know when that non-customized resource restore is done, we may need to add a plugin for that resource that has an empty/no-op Execute method but has a Wait method which determines when it's done. The Wait approach would also work for the use case I've run into, where in some cases we just need the Velero-restored item to show up in the cluster (so waiting for client.... to return the appropriate item), and in other cases we need the action carried out by the resource's plugin to complete successfully (perhaps a docker copy, or some other operation). |
@nrb If I'm understanding you correctly, you're proposing adding a new method ( |
The other issue is what about timeouts? Would velero call Wait() with a global timeout setting? Would it be per restore, or would the plugin itself define it, so different resources might have different timeout values? |
@sseago Yes, there would be a field on I'm not sure about using The timeouts are tricky too - there is a global timeout now, but we actually have different numbers for waiting for PVs & associated objects to delete (in case someone deletes something and wants to immediately restore it) and for CRDs to be ready. Loads of command line flags certainly per type isn't an option. We could use ConfigMaps for plugin settings, and a core plugin that already does it. |
@nrb Right. We're using additional items to pull in cluster-scoped resources in some cases. In the case where this race condition has hit us, though, the AdditionalItems are actually resources of the same type but which this resource references. I guess the question is whether we want to call wait on the plugin that returned the AdditionalItems (even if it's a different resource type), or whether we'd want to call Wait on the chain of plugins that normally applies to the additional item's resource type. If it's a PVC calling out a PV, for example, would we want the PVC's plugin to call wait (being passed an ObjectReference for the PV), which would have to deal with the full range of resource types that it is able to pass via AdditionalItems? Alternatively, we might want to call Wait on each AdditionalItem's normal plugins, based on the AppliesTo logic. In this case, the normal PV plugin would implement wait for itself -- or if we don't already have a PV plugin, we'd need to write one which had a no-op Execute method and a filled-in Wait method. In our ImageStreamTag use case, the distinction won't matter so much, since the case here is in the process of restoring tag1 we recognize that this tag contains an object reference to tag2, so we have to have tag2 restored before tag1, so we pass tag2 as an additional item. It gets restored, then wait is called on this plugin again with a tag2 reference -- same plugin whether it's via the appliesTo chain or because velero remembered that this plugin was the one that returned the additional item. |
I've created a PR for a design for this here: #2867 |
Closing because the design was merged. |
Reopening because only the design was merged, not the implementation, and we're hoping to tackle this in the 1.10 time frame once plugin API versioning lands. |
What steps did you take and what happened:
We have a restore plugin which returns resources via
AdditionalItems
when those items must be restored before the current item. This happens when there is a dependency relationship between two items of the same resource type. In our particular case, for theimagestreamtags.image.openshift.io
resource, in some cases some imagestream tags reference other imagestreamtags, and a restore forimageStreamTagA
will fail if it referencesimageStreamTagB
which does not exist. The plugin returnsimageStreamTagB
in theAdditionalItems
slice.In Restore.go, after running Execute on the plugin, for any
AdditionalItems
returned,restoreItem
is called on each one.From looking at the logs, everything seems to be happening in the right order. The plugin action runs for the resource that references the other imagestream tag, it passes the AdditionalItem reference, and then restore is called on that resource, including calling its restore plugins. However, when I look at the target namespace in the cluster after restore, I see the returned AdditionalItem resource, but I am not seeing the first resource restored at all. I know that if we attempt to restore an imagestreamtag with a non-existent reference, it will fail to restore, which is the expected behavior (i.e. if we have an alias tag for alpine:3.x which is supposed to track alpine 3.2, if 3.2 does not exist, the tracking tag will not be restored).
In this case, however, what seems to be happening is that
restoreItem
for the referenced resource returns before the resource is fully created in the cluster, so when theresourceClient.Create
call is made, the referenced resource does not yet exist, according to the cluster.I just modified my local velero checkout to add
time.Sleep(10 * time.Second)
immediately following the restoreItem call (linked below), and everything works as expected.https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L938
What did you expect to happen:
My expectation was that both of the resources would be restored correctly.
The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)
(I don't really think anything in my logs will give any more useful information than is already in the description above)
Anything else you would like to add:
I'm not sure exactly how to resolve this. If there was some "after restore plugin action", then for cases like this, I could have a plugin that blocked until the resource appeared in the cluster. In this case, on the returned
AdditionalItems
resource, the after restore plugin would wait until it was created to return, that way the initial imagestream tag which references it wouldn't actually have itsresourceClient.Create
call made until the cluster initialization of the resource was done.Environment:
velero version
): Server version is a build off the master branchkubectl version
): 1.12+/etc/os-release
):The text was updated successfully, but these errors were encountered: