This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Make all Helm releases through the operator #1906

Merged · 6 commits · Apr 16, 2019

Conversation

hiddeco
Member

@hiddeco hiddeco commented Apr 5, 2019

Fixes #1873, fixes #1820

This PR changes a couple of things in the Helm operator (details are commented in each commit), but in one sentence: it ensures all releases are made by the operator.

(If someone has a better one-sentence recap of this PR, be my guest to change it.)

@squaremo
Member

squaremo commented Apr 8, 2019

  • Trigger HelmRelease update on repository change to trigger an update through the operator, instead of calling the blocking chs.reconcileReleaseDef(fhr). This can probably be done by adding, editing or removing an annotation on the HelmRelease, as those count as updates.

Per your comment, if we can do this via the controller/informer/thingdoer, that would be preferable I reckon .. IIRC though, it is tricky to get at the individual bits of a controller once you've constructed it, so it might require some contortions :-/

@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch 5 times, most recently from 3f60c90 to 1ed7486 Compare April 9, 2019 14:30
Before this change there was a ticker in the ChartChangeSync loop
which 'rereleased' all HelmReleases on the set interval, to prevent
mutations due to e.g. manual chart releases. This commit moves this
logic to the operator by utilizing the resync interval that can be set
on the shared informer.

On every resync interval, all objects known at that time to the
informer are redelivered as 'updates', causing the same effect as the
ticker process described above, but with several improvements.

1. The syncHandler validates that a HelmRelease still exists before
   attempting to run the release. This is an improvement over the blind
   run we did before, where, in case a user had many HelmReleases, a
   HelmRelease could already have been removed by the time the release
   attempt was made.
2. We now act on information from a single source (except for the
   `reconcileReleaseDef()` call on repository changes; see the added
   comment).
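The existence check in point 1 can be sketched with a toy cache standing in for the informer's lister. The real operator uses a client-go shared informer and work queue; all names and types below are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// cache stands in for the informer's local store of HelmReleases,
// keyed by "namespace/name".
type cache struct {
	mu    sync.Mutex
	items map[string]string // key -> chart revision
}

func (c *cache) get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	rev, ok := c.items[key]
	return rev, ok
}

// syncHandler runs for every key taken from the work queue. It first
// checks whether the HelmRelease still exists, so keys enqueued for
// since-deleted resources are dropped instead of blindly released.
func syncHandler(c *cache, key string) bool {
	rev, ok := c.get(key)
	if !ok {
		return false // HelmRelease was deleted after being enqueued
	}
	fmt.Printf("releasing %s at revision %s\n", key, rev)
	return true
}

func main() {
	c := &cache{items: map[string]string{"default/podinfo": "1.2.0"}}
	fmt.Println(syncHandler(c, "default/podinfo")) // still exists: released
	fmt.Println(syncHandler(c, "default/gone"))    // deleted: dropped
}
```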
@hiddeco hiddeco force-pushed the helm/refactor-hr-sync branch 3 times, most recently from 05cc815 to 0e503ed Compare April 9, 2019 16:16
@hiddeco hiddeco changed the title Refactor Helm operator sync loops Make all Helm releases through the controller Apr 9, 2019
@hiddeco hiddeco changed the title Make all Helm releases through the controller Make all Helm releases through the operator Apr 9, 2019
@hiddeco
Member Author

hiddeco commented Apr 9, 2019

Do not trigger update event on status changes

From my observations, patching the subresource still triggers an update event @stefanprodan; from the brief discussion we had I thought this would not be the case. Is there something wrong with how I apply the patch, or did we have a misunderstanding?


Besides that, PTAL @squaremo @2opremio, I think you will like it.

@squaremo
Member

squaremo commented Apr 9, 2019

I think you will like it.

At +192 −209 that seems a near certainty :-)

Before this change, and because we no longer compare the spec of the
given HelmRelease against the spec of the old one, every status and
condition update would enqueue a new release job.

By updating the subresource instead of the resource, we no longer
trigger the update event, preventing unwanted release triggers.
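Updating through the `/status` subresource means the write goes to a different endpoint than a spec update, and cannot touch the spec. A rough sketch of the request a client would build; the API group/version and server URL here are illustrative assumptions, not the operator's actual values:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// statusUpdateRequest builds the PUT a client would issue to change only
// the status subresource of a HelmRelease. Because writes to this
// endpoint cannot modify the spec, they are safe to keep out of the
// release-triggering update path.
func statusUpdateRequest(server, ns, name string, body []byte) (*http.Request, error) {
	path := fmt.Sprintf("/apis/flux.example/v1beta1/namespaces/%s/helmreleases/%s/status", ns, name)
	req, err := http.NewRequest(http.MethodPut, server+path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := statusUpdateRequest("https://10.0.0.1", "default", "podinfo", []byte(`{}`))
	fmt.Println(req.URL.Path)
}
```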
@hiddeco hiddeco marked this pull request as ready for review April 11, 2019 12:56
@2opremio
Contributor

@hiddeco Have you reviewed the large comment at the beginning of chartsync.go? I would guess it requires updating.

@hiddeco
Member Author

hiddeco commented Apr 15, 2019

@2opremio err, no, that caught my eye but got lost in my mind. Will rewrite it, good catch.

This commit changes a couple of things in how the operator processes
and acts on mirror changes.

1. On mirror changes, instead of first requesting all namespaces and
   then collecting all HelmReleases (per namespace) from the API, all
   HelmReleases are now collected from the informer (which caches all
   HelmReleases the operator has seen), and which already takes the
   namespace restriction into account.
2. Instead of looping over the set of _all_ HelmReleases, determining
   whether they make use of one of the changed mirrors, and calculating
   what to do based on the changed mirror state, we now loop over the
   changed mirrors we received and request the HelmReleases for each of
   them. We then calculate what to do once per mirror and execute it
   for each HelmRelease requested earlier. This greatly reduces the
   number of calculations we make without changing the outcome.
3. We no longer call the blocking `chs.reconcileReleaseDef()` for each
   HelmRelease, but instead enqueue the HelmReleases in the work queue
   the operator consumes from. As the operator validates that a
   HelmRelease still exists just before executing the release, this
   prevents cases where we would attempt to release a HelmRelease we
   still had in memory (due to collecting all HelmReleases at once)
   while it had already been removed.
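Points 2 and 3 amount to inverting the loop and deferring the work to the queue. A stripped-down sketch, with illustrative data structures rather than the operator's actual types:

```go
package main

import "fmt"

// enqueueForMirrors looks up the HelmReleases bound to each changed
// mirror and pushes their "namespace/name" keys onto the work queue,
// instead of scanning every HelmRelease on every mirror change. It
// returns the number of keys enqueued.
func enqueueForMirrors(changed []string, byMirror map[string][]string, enqueue func(string)) int {
	n := 0
	for _, mirror := range changed {
		for _, key := range byMirror[mirror] {
			enqueue(key) // the queue consumer re-checks existence before releasing
			n++
		}
	}
	return n
}

func main() {
	byMirror := map[string][]string{
		"stable":  {"default/podinfo", "team-a/redis"},
		"private": {"team-b/app"},
	}
	var queue []string
	n := enqueueForMirrors([]string{"stable"}, byMirror, func(k string) { queue = append(queue, k) })
	fmt.Println(n, queue) // only releases using the changed mirror are enqueued
}
```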
This is an optimistic lock that prevents the operator from potentially
releasing an old revision of a `HelmRelease`, as there is a (small)
chance the spec changes between the time we start preparing and
calculating the upgrade and the actual upgrade itself.
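One common way to implement such an optimistic lock is to record the object's resourceVersion when preparing the upgrade and verify it just before applying. A hypothetical sketch; the PR's actual check may be structured differently:

```go
package main

import (
	"errors"
	"fmt"
)

// guardRevision aborts an upgrade when the HelmRelease changed between
// preparation and execution, forcing a re-run against the fresh spec
// instead of applying a stale one.
func guardRevision(preparedRV, currentRV string) error {
	if preparedRV != currentRV {
		return errors.New("HelmRelease changed since upgrade was prepared; aborting stale upgrade")
	}
	return nil
}

func main() {
	fmt.Println(guardRevision("1001", "1001")) // unchanged: safe to proceed
	fmt.Println(guardRevision("1001", "1002")) // changed: abort and requeue
}
```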
Broadcast only that the chart was synced, without implying success,
as the event always fires regardless of the sync outcome.
@hiddeco
Member Author

hiddeco commented Apr 16, 2019

@2opremio I addressed the comment block in chartsync.go by rewriting history (5893fa0).

@2opremio
Contributor

It LGTM, but I am still no expert on the helm operator so I may be missing problems.

@stefanprodan mind taking a look?

Member

@stefanprodan stefanprodan left a comment


LGTM

great job @hiddeco

@hiddeco hiddeco merged commit ff93ae0 into master Apr 16, 2019
@hiddeco hiddeco deleted the helm/refactor-hr-sync branch April 16, 2019 11:22
hiddeco added a commit that referenced this pull request May 23, 2019
During the refactor of synchronization mechanics in #1906, this got
accidentally moved outside the condition that checks if we have a clone
or it needs updating, with the result that a release is always enqueued
on a signal from the mirror it is bound to.

Move it back inside the condition so we only enqueue releases we
actually have updates from git for.
hiddeco added a commit that referenced this pull request May 29, 2019
Before this change the operator would hang at random on "waiting for
informer caches to sync", without anything else happening or it timing
out.

After an extensive search through recently merged PRs, this seems to be
related to a change of order in #1906, where the informer would be
initialized before the operator had registered its event handlers.

In addition to reverting this change and ensuring the informer cache is
synced before it is used, reorder everything so that the operator is
started before anything else that may depend on it, so it can start
processing changes from the queue as soon as possible.
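The ordering hazard described in this fix can be illustrated with a toy informer: handlers registered after events start flowing simply miss them. This is an illustrative model only, not client-go's actual mechanics (where the symptom was `WaitForCacheSync` hanging):

```go
package main

import "fmt"

// toyInformer delivers each event to the handlers registered at the time
// the event fires; handlers registered too late silently miss events.
type toyInformer struct {
	handlers []func(string)
}

func (i *toyInformer) addHandler(h func(string)) { i.handlers = append(i.handlers, h) }

func (i *toyInformer) run(events []string) {
	for _, e := range events {
		for _, h := range i.handlers {
			h(e)
		}
	}
}

func main() {
	var seen []string
	inf := &toyInformer{}

	// Wrong order: run before registering -> the handler sees nothing.
	inf.run([]string{"default/podinfo"})
	inf.addHandler(func(key string) { seen = append(seen, key) })
	fmt.Println(len(seen)) // 0

	// Right order: register, then run -> the event is processed.
	inf.run([]string{"default/podinfo"})
	fmt.Println(seen) // [default/podinfo]
}
```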