Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify BackupStoreGetter to avoid BSL spec changes #5122

Merged
merged 1 commit into from Jul 19, 2022

Conversation

sseago
Copy link
Collaborator

@sseago sseago commented Jul 14, 2022

Signed-off-by: Scott Seago sseago@redhat.com

Thank you for contributing to Velero!

Modify BackupStoreGetter to avoid BSL spec changes

Pass in a new copy of the map of config values rather than
modifying the BSL Spec.Config and then pass in that field.

Those extra fields in spec.config are only needed for the in-memory copy passed into plugins. Without this change to the BackupStoreGetter, the BSL spec is modified on disk and in the OADP case, the OADP controller then noticed the change and removed the fields, causing constant BSL spec updates, causing constant validation reconciles.

Does your change fix a particular issue?

Fixes #(issue)

Please indicate you've done the following:

  • [x ] Accepted the DCO. Commits without the DCO will delay acceptance.
  • [x ] Created a changelog file or added /kind changelog-not-required as a comment on this pull request.
  • Updated the corresponding documentation in site/content/docs/main.

@codecov-commenter
Copy link

codecov-commenter commented Jul 14, 2022

Codecov Report

Merging #5122 (4a21825) into main (eaf97e7) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #5122   +/-   ##
=======================================
  Coverage   41.33%   41.33%           
=======================================
  Files         211      211           
  Lines       18443    18443           
=======================================
  Hits         7624     7624           
  Misses      10247    10247           
  Partials      572      572           
Impacted Files Coverage Δ
...g/controller/backup_storage_location_controller.go 64.40% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaf97e7...4a21825. Read the comment docs.

@blackpiglet
Copy link
Contributor

@sseago
#5101 is similar to this issue. Not sure whether this PR also fix your case.

@kaovilai
Copy link
Contributor

kaovilai commented Jul 15, 2022

@blackpiglet #5101 is what caused issue with OADP.
#5101 causes BSL to revalidate when spec changes.

Each time revalidation is called objectStoreGetter.Get is called inside Reconcile function

backupStore, err := r.BackupStoreGetter.Get(&location, pluginManager, log)

location's spec.Config is modified

location.Spec.Config["bucket"] = bucket
location.Spec.Config["prefix"] = prefix

OADP will try to undo any unexpected spec changes with BSL inside our custom resource.

If BSL inside OADP custom resource do not contain bsl.spec.Config["bucket"], we will remove any added configs from above reconcile which causes spec change and revalidation to run again and this causes infinite and high frequency execution competition between two controllers.

This PR has to be looked at carefully since it can break some plugins that may expect to read spec.Config["bucket"] and spec.Config["prefix"]. It may be a third party plugin written by non velero authors.

@kaovilai
Copy link
Contributor

kaovilai commented Jul 15, 2022

@sseago @shubham-pampattiwar a better fix might be from OADP side where we put the following logic into OADP itself. This would not break anything upstream or third party plugins and fixes our problem. Since the location.Spec.Config code is from 17 months ago I would expect there to be some niche plugins relying on it already.

if location.Spec.ObjectStorage != nil {
if location.Spec.Config == nil {
location.Spec.Config = make(map[string]string)
}
location.Spec.Config["bucket"] = bucket
location.Spec.Config["prefix"] = prefix
// Only include a CACert if it's specified in order to maintain compatibility with plugins that don't expect it.
if location.Spec.ObjectStorage.CACert != nil {
location.Spec.Config["caCert"] = string(location.Spec.ObjectStorage.CACert)
}
}

@sseago
Copy link
Collaborator Author

sseago commented Jul 16, 2022

@kaovilai It's not really only an OADP issue -- this will happen any time a BSL is managed by an external controller/operator, since the Velero BSL validation code was actually modifying spec and updating the resource. The fix in this PR is to validate a copy of the BSL resource, not the same in-memory object that we're going to update status on and make a patch API call. This PR shouldn't affect actually calling the plugin, since I only modified the call to BackupStoreGetter used for validation here. But yes, we should confirm that the plugin only ever uses in-memory BSL modified by this method rather than making a separate Get call for it later, but if the latter were the case, then OADP would already be broken with respect to BSLs, since any time this is done, the OADP controller will un-do the change.

Passing a deep copy of the BSL into the getter feels like the right thing to do here, as Velero shouldn't need to modify the spec like that -- if it did, then everyone who manages BSLs automatically would have to hack around this like you suggested OADP do. That feels like an awful hack.

@sseago
Copy link
Collaborator Author

sseago commented Jul 16, 2022

I could be completely wrong here and prior versions of Velero did end up modifying the BSL in-cluster, and prior versions of OADP ended up in a continual modify-the-BSL battle between OADP operator and Velero controller. It would be easy enough to verify with OADP 1.0.3 by doing a watch on the BSL resource yaml.

@sseago
Copy link
Collaborator Author

sseago commented Jul 16, 2022

@kaovilai ok, so basically every time we need an objectstore to work with (backup controller, restore controller, etc.) we call ObjectBackupStoreGetter.Get() which modifies the passed-in BSL object and then uses that modified config to initialize the plugin. Of note is that the Init call takes a map with the contents of config in it (so it comes from memory, not the cluster), and that's used to initialize the plugin. The plugin doesn't work with the BSL kube resource directly, only the map of config values passed in.

Also of note is that in every other controller that calls ObjectBackupStoreGetter.Get() does not issue a patch or update API call on the BSL afterwards. Only this one does that. This one also only uses the initialize object store for validation, so I do think that not modifying the resource on disk is the original intent of the code here, since none of the controllers that use the object store for real work modify the in-cluster resource -- and the one that does is only initializing the object store for validation, so it seems like we shouldn't be putting this change into the cluster.

There is one remaining part of the BSL controller reconcile that could still trip OADP up here -- the controller also updates location.Spec.Default, so if velero ever ends up changing that, we'll hit the same continual refresh problem with OADP, but I think in that case the fix would be on the user side -- for the OADP user to update spec.Default in the DPA spec to whatever Velero thinks it should be.

@@ -122,7 +122,7 @@ func (r *BackupStorageLocationReconciler) Reconcile(ctx context.Context, req ctr
}
}()

backupStore, err := r.BackupStoreGetter.Get(&location, pluginManager, log)
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment on why we are doing DeepCopy() here. Alternatively do deep copy in the objectStoreGetter Get() function and add a comment there instead. Fixing Get function (so that it actually just get something and not setting anything else) rather than fixing Reconcile function which innocently called Get not expecting change.

Suggested change
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log)
// Using DeepCopy to avoid modifying in-cluster BackupStorageLocation.
// Get function modifies the location.Spec.Config which we don't want updated to the in-cluster version
backupStore, err := r.BackupStoreGetter.Get(location.DeepCopy(), pluginManager, log)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaovilai Yeah, I'm leaning towards the latter. The only aspect of the BSL that's passed into the ObjectStore plugin is actually the map of config values. Lets copy that map instead of deep copying the whole BSL, and pass it in. No side effect on the kube object. As for 3rd party plugins relying on the BSL in-cluster mod, while I don't think any are doing that now, if they are, it's a dangerous assumption even with the released 1.9 code, since as we've seen with OADP, the change isn't reliably made. Better to ensure it's never made so that breakage is obvious rather than nondeterministic.

Copy link
Collaborator Author

@sseago sseago Jul 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "Get" function should not generally have side effects. If it does have side effects, they had better be well-documented and made with good reason. In this case, they are neither documented nor made with any reason to do so.

@blackpiglet
Copy link
Contributor

I agree with using deepcopy to avoid overwritten, but BackupStoreGetter.Get is called in several other places too.
Is it possible to have a generic fix for all the called places in this PR?

@blackpiglet
Copy link
Contributor

blackpiglet commented Jul 18, 2022

May take more time to evaluate how to create a modification for all places, so agree with @kaovilai, adding more comments will be good enough for this case.

@sseago sseago changed the title Pass in copy of BSL to BackupStoreGetter to avoid spec changes Modify BackupStoreGetter to avoid BSL spec changes Jul 18, 2022
@sseago
Copy link
Collaborator Author

sseago commented Jul 18, 2022

@kaovilai Update PR leaves controller alone and modifies BackupStoreGetter

@blackpiglet So in the original PR, we didn't need the DeepCopy elsewhere because the BSL passed in was never saved after modification. Still, it's confusing, so the updated PR simply makes a deep copy of BSL.Spec.Config and passes that into the object store, so there's no need for anyone to DeepCopy the entire BSL object before passing it into BackupStoreGetter.

Pass in a new copy of the map of config values rather than
modifying the BSL Spec.Config and then pass in that field.

Signed-off-by: Scott Seago <sseago@redhat.com>
@@ -131,19 +131,25 @@ func (b *objectBackupStoreGetter) Get(location *velerov1api.BackupStorageLocatio
return nil, errors.Errorf("backup storage location's bucket name %q must not contain a '/' (if using a prefix, put it in the 'Prefix' field instead)", location.Spec.ObjectStorage.Bucket)
}

// Pass a new map into the object store rather than modifying the passed-in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blackpiglet blackpiglet merged commit 68730cb into vmware-tanzu:main Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants