
Rewrite operator to use chartsync #1254

Merged: squaremo merged 5 commits into master from rewrite/run-operator-events-via-chartsync on Aug 20, 2018

Conversation

@squaremo (Member)

No description provided.

@stefanprodan (Member) commented on Aug 7, 2018

@squaremo I've found a problem: if I do `kubectl -n dev delete fluxhelmreleases/podinfo-dev`, the helm operator will not be able to reinstall it anymore:

ts=2018-08-07T13:06:48.710764001Z caller=operator.go:314 component=operator info="DELETING release"
ts=2018-08-07T13:06:48.710848148Z caller=operator.go:315 component=operator info="Custom Resource driven release deletion"
ts=2018-08-07T13:06:48.805145698Z caller=release.go:123 component=release info="Release [podinfo-dev] status: DEPLOYED"
ts=2018-08-07T13:06:48.805257812Z caller=release.go:126 component=release info="Deleting release (podinfo-dev)"
ts=2018-08-07T13:06:50.335015465Z caller=release.go:236 component=release info="Release deleted: [podinfo-dev]"
ts=2018-08-07T13:06:50.335080302Z caller=operator.go:105 component=operator info="CREATING release"
ts=2018-08-07T13:06:50.335091671Z caller=operator.go:106 component=operator info="Custom Resource driven release install"
ts=2018-08-07T13:06:50.340301738Z caller=operator.go:173 component=operator debug="PROCESSING item [\"dev/podinfo-dev\"]"
ts=2018-08-07T13:06:50.340360019Z caller=operator.go:230 component=operator debug="Starting to sync cache key dev/podinfo-dev"
ts=2018-08-07T13:06:50.340462079Z caller=operator.go:215 component=operator info="Successfully synced 'dev/podinfo-dev'"
ts=2018-08-07T13:06:50.340473249Z caller=operator.go:170 component=operator debug="Processing next work queue job ..."
I0807 13:06:50.340922       7 event.go:221] Event(v1.ObjectReference{Kind:"FluxHelmRelease", Namespace:"dev", Name:"podinfo-dev", UID:"be68a4c2-9a42-11e8-b837-42010a9c0267", APIVersion:"helm.integrations.flux.weave.works/v1alpha2", ResourceVersion:"2715292", FieldPath:""}): type: 'Normal' reason: 'ChartSynced' Chart managed by FluxHelmRelease processed successfully
ts=2018-08-07T13:06:50.5326949Z caller=chartsync.go:239 component=chartsync warning="failed to get release from Tiller" release=podinfo-dev error="rpc error: code = Unknown desc = release: \"podinfo-dev\" not found"
ts=2018-08-07T13:06:52.309333935Z caller=status.go:75 component=annotator err="rpc error: code = Unknown desc = release: \"podinfo-dev\" not found"

Kubernetes' client-go package uses glog, which obnoxiously installs
its own flags into flag.CommandLine, then complains if you don't call
`flag.Parse()`.

Since we use pflag instead of flag, to shut glog up we have to give it
what it wants by supplying a dummy set of args (just the one that
makes it log only to stderr).
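
The workaround looks roughly like this minimal standalone sketch (in the real code glog arrives transitively via client-go, and the surrounding wiring may differ):

```go
package main

import (
	"flag"

	_ "github.com/golang/glog" // registers glog's flags on flag.CommandLine, as client-go does transitively
	"github.com/spf13/pflag"
)

func main() {
	// glog logs an error if flag.Parse() is never called, so feed it
	// a dummy argument list. "-logtostderr" is the one flag we want:
	// it makes glog write only to stderr rather than to files.
	flag.CommandLine.Parse([]string{"-logtostderr"})

	// The program's real command-line flags go through pflag.
	pflag.Parse()
	// ... set up and run the operator ...
}
```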
Both the chartsync package and the operator package need to install or
upgrade helm releases depending on the definitions given in
FluxHelmRelease resources. To deduplicate, drive these operations
through the chartsync, by factoring out the bit that compares a
FluxHelmRelease to the state of a release and takes action if
necessary.

The operator package also deletes releases, when it's told a
FluxHelmRelease has been deleted. So that we only have one thing
through which we operate on the cluster, I've added that to chartsync
as well.

When starting, the operator is told about all existing FluxHelmRelease
resources as though they had been added; previously this meant that it
would (often redundantly) upgrade all the associated releases. Since
the reconciliation compares what would be released with what _is_
released, before issuing an upgrade, a side-effect of this factoring
is that the redundant upgrades aren't done.
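
A minimal sketch of what that factoring amounts to; `ReleaseDef`, `Release` and `ReleaseClient` are hypothetical stand-ins for the real FluxHelmRelease and Tiller client types, not the actual API:

```go
package chartsync

import "reflect"

// Hypothetical stand-ins for the real types.
type ReleaseDef struct {
	Name   string
	Values map[string]interface{}
}

type Release struct {
	Values map[string]interface{}
}

type ReleaseClient interface {
	Get(name string) (*Release, error)
	Install(def ReleaseDef) error
	Upgrade(def ReleaseDef) error
	Delete(name string) error
}

// reconcileReleaseDef is the factored-out operation: compare the
// desired state given by a FluxHelmRelease with the actual state of
// the release, and act only when they differ. Both the operator's
// event handlers and the chart sync loop call through here.
func reconcileReleaseDef(client ReleaseClient, def ReleaseDef) error {
	rel, _ := client.Get(def.Name) // a "not found" surfaces as rel == nil
	if rel == nil {
		return client.Install(def)
	}
	if reflect.DeepEqual(rel.Values, def.Values) {
		return nil // already up to date: skip the redundant upgrade
	}
	return client.Upgrade(def)
}
```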
If we just clone the repo at the branch HEAD whenever the operator
wants to apply a change, we may end up undoing, or redoing, work,
since the ChartChangeSync only advances the revision it looks at when
it's ready to.

To avoid that, pass the desired reconciliations to the ChartChangeSync
and have it do the work, at the revision it's currently working from.

The mechanism is to send the resource in question on an unbuffered
channel, and receive it in the Run loop of ChartChangeSync (see the
sketch after this list). This has a couple of flaws:

 - the event handlers have to wait for the Run loop to get around to
   doing the work (or time out and retry);

 - there's not much point in running more than one worker to process
   events, since they all get serialised through the ChartChangeSync
   anyway (not quite true -- deletes are done straight away because
   they don't need to consult the git repo)

But it is good enough for now.
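
A cut-down sketch of that mechanism, reusing the hypothetical types from the sketch above (the method names and the timeout are assumptions, not the actual API):

```go
package chartsync

import (
	"errors"
	"time"
)

type ChartChangeSync struct {
	client       ReleaseClient
	releaseQueue chan ReleaseDef // unbuffered: senders block until Run accepts the work
}

// ReconcileReleaseDef is what the operator's event handlers call; it
// blocks until the Run loop picks the work up, or gives up.
func (chs *ChartChangeSync) ReconcileReleaseDef(def ReleaseDef, timeout time.Duration) error {
	select {
	case chs.releaseQueue <- def:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for chart sync to accept work")
	}
}

// Run serialises all reconciliations, each done against the git
// revision the ChartChangeSync is currently working from.
func (chs *ChartChangeSync) Run(stop <-chan struct{}) {
	for {
		select {
		case def := <-chs.releaseQueue:
			_ = reconcileReleaseDef(chs.client, def)
		case <-stop:
			return
		}
	}
}
```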
squaremo force-pushed the rewrite/run-operator-events-via-chartsync branch from 077939a to 20ce84a on August 17, 2018
Before I rewrote things to drive all the syncing through chartsync,
syncing used `release.Exists(...)` to determine whether to Install or
Upgrade a given release. That worked by coincidence since any error,
including but not only a "not found" error, was treated as a negative
result.

However, it was a useful coincidence: checking _just_ the error
return value means a "not found" will exit early rather than go on to
install the (possibly) missing release. So, ignore the error value,
and choose whether to install or upgrade based on whether a release
value was returned or not, as `Exists` did.
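
In terms of the hypothetical types above, the fix amounts to something like:

```go
// installOrUpgrade decides the way Exists effectively did: ignore
// the error from Get (any failure, not only "not found", leaves
// rel == nil) and branch on whether a release value came back at all.
func installOrUpgrade(client ReleaseClient, def ReleaseDef) error {
	rel, _ := client.Get(def.Name) // error deliberately ignored
	if rel == nil {
		return client.Install(def)
	}
	return client.Upgrade(def)
}
```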
Using a RW lock means that we only have to serialise access when we
swap the clone for another. It's also a bit easier to grok than
running requests through a channel, in my opinion.
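
A sketch of that change, again with assumed field names: the clone sits behind a sync.RWMutex, so reconciliations only contend with each other when the sync loop swaps in a fresh clone at a new revision:

```go
package chartsync

import "sync"

type ChartChangeSync struct {
	cloneMu sync.RWMutex
	clone   string // path to the current export of the charts repo
}

// Readers (reconciliations) share the current clone under the read lock...
func (chs *ChartChangeSync) currentClone() string {
	chs.cloneMu.RLock()
	defer chs.cloneMu.RUnlock()
	return chs.clone
}

// ...and only swapping in a freshly cloned revision takes the
// exclusive write lock.
func (chs *ChartChangeSync) swapClone(newClone string) {
	chs.cloneMu.Lock()
	defer chs.cloneMu.Unlock()
	chs.clone = newClone
}
```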
squaremo changed the title from "[WIP] Rewrite operator to use chartsync" to "Rewrite operator to use chartsync" on Aug 17, 2018
@squaremo (Member, Author)

@stefanprodan Thanks for finding that problem -- I reproduced it, and fixed it in subsequent commits.

squaremo merged commit 2967072 into master on Aug 20, 2018
squaremo deleted the rewrite/run-operator-events-via-chartsync branch on August 20, 2018
@squaremo (Member, Author)

(merged using my "Get out of alpha free" card)
