Race condition when passing AdditionalItems from restore item action #1350

sseago · 2019-04-04T20:20:30Z

What steps did you take and what happened:
We have a restore plugin which returns resources via AdditionalItems when those items must be restored before the current item. This happens when there is a dependency relationship between two items of the same resource type. In our particular case, for the imagestreamtags.image.openshift.io resource, in some cases some imagestream tags reference other imagestreamtags, and a restore for imageStreamTagA will fail if it references imageStreamTagB which does not exist. The plugin returns imageStreamTagB in the AdditionalItems slice.

In Restore.go, after running Execute on the plugin, for any AdditionalItems returned, restoreItem is called on each one.

From looking at the logs, everything seems to be happening in the right order. The plugin action runs for the resource that references the other imagestream tag, it passes the AdditionalItem reference, and then restore is called on that resource, including calling its restore plugins. However, when I look at the target namespace in the cluster after restore, I see the returned AdditionalItem resource, but I am not seeing the first resource restored at all. I know that if we attempt to restore an imagestreamtag with a non-existent reference, it will fail to restore, which is the expected behavior (i.e. if we have an alias tag for alpine:3.x which is supposed to track alpine 3.2, if 3.2 does not exist, the tracking tag will not be restored).

In this case, however, what seems to be happening is that restoreItem for the referenced resource returns before the resource is fully created in the cluster, so when the resourceClient.Create call is made, the referenced resource does not yet exist, according to the cluster.

I just modified my local velero checkout to add time.Sleep(10 * time.Second) immediately following the restoreItem call (linked below), and everything works as expected.
https://github.com/heptio/velero/blob/master/pkg/restore/restore.go#L938

What did you expect to happen:
My expectation was that both of the resources would be restored correctly.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

(I don't really think anything in my logs will give any more useful information than is already in the description above)

Anything else you would like to add:
I'm not sure exactly how to resolve this. If there was some "after restore plugin action", then for cases like this, I could have a plugin that blocked until the resource appeared in the cluster. In this case, on the returned AdditionalItems resource, the after restore plugin would wait until it was created to return, that way the initial imagestream tag which references it wouldn't actually have its resourceClient.Create call made until the cluster initialization of the resource was done.

Environment:

Velero version (use velero version): Server version is a build off the master branch
Kubernetes version (use kubectl version): 1.12+
Kubernetes installer & version:
Cloud provider or hardware configuration: openshift 4 on aws
OS (e.g. from /etc/os-release):

The text was updated successfully, but these errors were encountered:

nrb · 2019-04-05T18:50:35Z

Instead of a sleep, could a watch happen on the resource? We do something similar when checking if a PV or related PVC are deleted (code), and I'm working on similar logic in the install command. Similar behavior also happens in Helm when instances of CRDs are made before the definitions are ready, like in #1195.

The problem is that plugins don't receive a client (though they could generate one, I suppose), and the ready conditions for resources are different.

Would it make sense for the restore item action plugin interface to include some sort of Ready function? I'm not sure if this would be per item or not. @skriss @carlisia PTAL at the issue and share any ideas you might have.

carlisia · 2019-04-05T19:05:12Z

Not sure yet about the how (let's discuss) but initially I'm thinking we might want the behavior of one resource waiting (watching?) multiple resources.

sseago · 2019-04-05T19:24:11Z

@nrb Sure -- I never intended sleeping as a solution -- I just used it to confirm that the only reason for the failure was that the newly-created resource returned in the AdditionalItems slice wasn't there yet. Sleeping after calling restoreItem on AdditionalItems made the failure go away. Watching until a resource is ready could work, but it can't be done in the restore item action plugin, since that plugin is called before the restore happens in the cluster. If there were some plugin that got called after the resourceClient.Create call, that plugin could watch the resource. It's probably something we'd want to do within a plugin action which would only wait if it was required, since we don't want to always wait for one item to finish before going onto the next across all resource types. Also, it would need to be available per item, since in the case I'm hitting, the two dependent items are of the same resource type -- one imagestreamtag is dependent on another imagestreamtag.

nrb · 2019-04-05T20:08:56Z

If there were some plugin that got called after the resourceClient.Create call, that plugin could watch the resource.

That was kind of my thinking with a Ready function - it might be part of the same plugin, but the Ready function would be invoked after the Create call, effectively pausing restore til Ready returns true or a timeout is reached. If the functions not defined, move on as we do now.

And yeah, we'd need to think about how it might link between items like you said.

skriss · 2019-06-06T19:59:36Z

This looks like a specific instance of #964

skriss · 2020-02-13T21:33:55Z

@nrb @sseago I'm pretty sure we can close this out, now that we've closed #964 via #1937 - I'm going to close, but please reopen if you disagree with my assessment.

sseago · 2020-02-24T15:40:31Z

@skriss Circling back to this. I don't think it's an example of #964 since this applies to waiting for the creation of resources other than CRDs. The concern here is that AdditionalItems needed by the restore aren't necessarily ready when this item's restore is processed. For example, say you're restoring an ScheduledAppointment resource that requires the existence of the User first. Creating a ScheduledAppointment resource with a reference to a non-existent user will fail, so the restore item action plugin passes the appropriate User in the AdditionalItems list, but if this User resource gets restored but isn't ready yet, the ScheduledAppointment will still fail. This is why in my original investigation, adding a sleep call made the race condition go away (even though, as I mentioned, that's not the right solution here).

nrb · 2020-02-26T22:01:02Z

Revisiting this, I think while they're generally the same problem (race conditions waiting for something to be ready), this is waiting at a different point in the code - namely for arbitrary AdditionalItems like Scott pointed out, not just CRDs.

I still think designing some sort of mechanism in to say that there needs to be a readiness check on the returned AdditionalItems makes sense. Very roughly, it could be a boolean added to the field on RestoreItemActionExecuteOutput, similar to SkipRestore, and if it's true, then invoke the plugin's Wait method, polling for readiness on the AdditionalItems.

sseago · 2020-02-27T01:48:39Z

This makes sense. I think you almost have to do this via the plugin itself, since there's no way for Velero to know how to determine whether some arbitrary resource is ready, especially considering that the resource in question may have custom actions taken on it in a plugin. One consequence of this is you may have a situation where a plugin is returning AdditionalItems of a resource type that doesn't have its own plugin today, since there may be no need for custom actions on it, but since we need to know when that non-customized resource restore is done, we may need to add a plugin for that resource that has an empty/no-op Execute method but has a Wait method which determines when it's done. The Wait approach would also work for the use case I've run into, where in some cases we just need the Velero-restored item to show up in the cluster (so waiting for client.... to return the appropriate item), and in other cases we need the action carried out by the resource's plugin to complete successfully (perhaps a docker copy, or some other operation).

sseago · 2020-03-20T15:57:35Z

@nrb If I'm understanding you correctly, you're proposing adding a new method (Wait())to the RestoreItemAction interface, in conjunction with adding a Wait field to RestoreItemActionExecuteOutput. If Wait is false, then current behavior is unchanged. If Wait is true, then after restoring the resource, call Wait() and don't move on to the next action (either the next additionalItems element, if we're there, or the resource item on the list if we're in a normal resource restore) until Wait() returns. As for the new plugin func, I'm assuming the same AppliesTo would be used to determine which Wait methods to call (perhaps more than one). If there's a generic plugin that runs on all resources (for setting annotations, etc.), then that plugin will probably have an empty Wait method, while a plugin which does processing specific to that resource type will implement the actual readiness check. For a resource that we don't currently have a plugin but which we may have other plugins which pull resources in, we may need to add a new plugin which would basically only return the original item with the Wait bool set to true, and then would have a Wait() method implemented which makes sure the resource is created.

sseago · 2020-03-20T15:58:43Z

The other issue is what about timeouts? Would velero call Wait() with a global timeout setting? Would it be per restore, or would the plugin itself define it, so different resources might have different timeout values?

nrb · 2020-03-24T21:45:53Z

@sseago Yes, there would be a field on RestoreItemActionExecuteOutput and a method on the RestoreItemAction interface, operating in tandem as you described it.

I'm not sure about using AppliesTo on the Wait() function; I was envisioning it to be the exact same plugin, so I suppose it would be with same AppliesTo criteria that had already been applied...but if there's a chain of plugins, I need to think a little bit more about how things might get selected. Right now, I believe AdditionalItems are excluded from any kind of filtering logic, so including a PV from a PVC, which came from a Pod will get restore even if you explicitly excluded cluster-scoped resources.

The timeouts are tricky too - there is a global timeout now, but we actually have different numbers for waiting for PVs & associated objects to delete (in case someone deletes something and wants to immediately restore it) and for CRDs to be ready. Loads of command line flags certainly per type isn't an option. We could use ConfigMaps for plugin settings, and a core plugin that already does it.

sseago · 2020-04-09T16:01:27Z

@nrb Right. We're using additional items to pull in cluster-scoped resources in some cases. In the case where this race condition has hit us, though, the AdditionalItems are actually resources of the same type but which this resource references. I guess the question is whether we want to call wait on the plugin that returned the AdditionalItems (even if it's a different resource type), or whether we'd want to call Wait on the chain of plugins that normally applies to the additional item's resource type. If it's a PVC calling out a PV, for example, would we want the PVC's plugin to call wait (being passed an ObjectReference for the PV), which would have to deal with the full range of resource types that it is able to pass via AdditionalItems? Alternatively, we might want to call Wait on each AdditionalItem's normal plugins, based on the AppliesTo logic. In this case, the normal PV plugin would implement wait for itself -- or if we don't already have a PV plugin, we'd need to write one which had a no-op Execute method and a filled-in Wait method.

In our ImageStreamTag use case, the distinction won't matter so much, since the case here is in the process of restoring tag1 we recognize that this tag contains an object reference to tag2, so we have to have tag2 restored before tag1, so we pass tag2 as an additional item. It gets restored, then wait is called on this plugin again with a tag2 reference -- same plugin whether it's via the appliesTo chain or because velero remembered that this plugin was the one that returned the additional item.

sseago · 2020-08-26T14:04:46Z

I've created a PR for a design for this here: #2867

eleanor-millman · 2021-05-03T21:20:52Z

Closing because the design was merged.

sseago · 2022-06-29T20:12:24Z

Reopening because only the design was merged, not the implementation, and we're hoping to tackle this in the 1.10 time frame once plugin API versioning lands.

skriss mentioned this issue Jul 9, 2019

Custom Resources are not restored #1633

Closed

skriss closed this as completed Feb 13, 2020

skriss reopened this Feb 24, 2020

nrb mentioned this issue Mar 24, 2020

Test and document behavior when backing up applications deployed by Helm #2351

Closed

carlisia added the Enhancement/User End-User Enhancement to Velero label Jun 24, 2020

nrb added the Area/Plugins Issues related to plugin infra/internal plugins label Jun 24, 2020

djwhatle mentioned this issue Aug 10, 2020

Handle ImageStreamTag restore failures related to resource dependencies and race conditions migtools/mig-controller#621

Closed

sseago mentioned this issue Aug 26, 2020

Design doc for RestoreItemAction wait for AdditionalItems to be ready #2867

Merged

sseago mentioned this issue Feb 16, 2021

Updated design to take into account protocol buffer limitations. #3464

Merged

eleanor-millman closed this as completed May 3, 2021

sseago reopened this Jun 29, 2022

sseago mentioned this issue Jun 29, 2022

RestoreItemAction v2 API #5063

Closed

sseago self-assigned this Oct 25, 2022

sseago added the 1.11-candidate label Oct 25, 2022

reasonerjt removed the 1.11-candidate label Jan 10, 2023

reasonerjt added this to the v1.11 milestone Jan 10, 2023

sseago mentioned this issue Feb 28, 2023

Wait for additional items to be ready before restoring current item #5933

Merged

1 task

Lyndon-Li closed this as completed in #5933 Mar 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition when passing AdditionalItems from restore item action #1350

Race condition when passing AdditionalItems from restore item action #1350

sseago commented Apr 4, 2019

nrb commented Apr 5, 2019

carlisia commented Apr 5, 2019

sseago commented Apr 5, 2019

nrb commented Apr 5, 2019

skriss commented Jun 6, 2019

skriss commented Feb 13, 2020

sseago commented Feb 24, 2020

nrb commented Feb 26, 2020

sseago commented Feb 27, 2020

sseago commented Mar 20, 2020

sseago commented Mar 20, 2020

nrb commented Mar 24, 2020

sseago commented Apr 9, 2020

sseago commented Aug 26, 2020

eleanor-millman commented May 3, 2021

sseago commented Jun 29, 2022

Race condition when passing AdditionalItems from restore item action #1350

Race condition when passing AdditionalItems from restore item action #1350

Comments

sseago commented Apr 4, 2019

nrb commented Apr 5, 2019

carlisia commented Apr 5, 2019

sseago commented Apr 5, 2019

nrb commented Apr 5, 2019

skriss commented Jun 6, 2019

skriss commented Feb 13, 2020

sseago commented Feb 24, 2020

nrb commented Feb 26, 2020

sseago commented Feb 27, 2020

sseago commented Mar 20, 2020

sseago commented Mar 20, 2020

nrb commented Mar 24, 2020

sseago commented Apr 9, 2020

sseago commented Aug 26, 2020

eleanor-millman commented May 3, 2021

sseago commented Jun 29, 2022