
Fix when Helm provider ignores FAILED release state #161

Merged
merged 5 commits into hashicorp:master on Jan 11, 2019

Conversation

@stepanstipl (Contributor) commented Nov 26, 2018

This moves the status field out of the computed metadata into a top-level non-computed field and uses CustomizeDiff to track the state of the Helm release and enforce that the state is DEPLOYED.

This fixes the behavior where Terraform would ignore a previously created release in a FAILED state and make no attempt to bring it back to the desired state (it assumed the state would always be DEPLOYED).

See #159 for more detail (the issue has detailed steps on how to reproduce it).
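The mechanism, as a minimal sketch: getRelease and the meta plumbing here are hypothetical stand-ins for this illustration, not the PR's literal code; only the CustomizeDiff/SetNew pattern and the hapi/release status codes are the real Terraform SDK and Helm v2 APIs.

```go
package helm

import (
	"github.com/hashicorp/terraform/helper/schema"
	"k8s.io/helm/pkg/proto/hapi/release"
)

// getRelease is a hypothetical stand-in for the provider's lookup of the
// live release through the Helm client; the real PR wires this differently.
func getRelease(meta interface{}, name string) (*release.Release, error) {
	// ... real code would query Tiller via the Helm client here ...
	return nil, nil
}

// resourceDiff sketches the CustomizeDiff hook: if the live release is not
// DEPLOYED, declare the desired status so Terraform plans an update instead
// of reporting "no changes" for a FAILED release.
func resourceDiff(d *schema.ResourceDiff, meta interface{}) error {
	r, err := getRelease(meta, d.Get("name").(string))
	if err != nil {
		// The release may not exist yet (first apply); nothing to enforce.
		return nil
	}
	if r.GetInfo().GetStatus().GetCode() != release.Status_DEPLOYED {
		// Setting the new value creates a diff, which triggers an update.
		return d.SetNew("status", release.Status_DEPLOYED.String())
	}
	return nil
}
```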

This PR adds a test for this issue. To run the integration test(s):

  • you need a Kubernetes cluster set up and kubectl configured (verify by running kubectl get pods)
  • you need Helm installed and the Helm client configured (verify by running helm list)
  • set the env variable TF_ACC to true to enable acceptance tests (export TF_ACC=true)
  • you need Go 1.10+ installed and this project checked out at the path Go conventions expect ($GOPATH/src/github.com/terraform-providers/terraform-provider-helm)
  • you can run all the tests with
    go test -v helm/*
    
    but expect some of them to fail (I believe the failures are unrelated to this change; they occurred for me both with and without it). To run just the test for this issue:
    go test -v -run "TestAccResourceRelease_updateExistingFailed" helm/*
    

Update: After rebasing on the latest master and fixing the 2 remaining tests (some fail only when running against GKE, not against minikube), all the tests are expected to pass against both minikube and GKE.

@ghost added the size/M label Nov 26, 2018
@ianks commented Nov 30, 2018

We desperately need this. This bug is causing our state to become unrecoverable in some cases.

@legal90 (Contributor) commented Nov 30, 2018

@stepanstipl Thank you! That's really needed.

But I just looked at the trick with a default value and verification. Maybe we can make the "status" attribute Computed and assign its desired value within CustomizeDiff, like we do with the "version" attribute? I think the resource UX would be a bit clearer, since the attribute would be computed-only, so the user would not be able to set it via configuration.
Check it out if you are interested: legal90@5ceda8d
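Schema-wise, the suggestion amounts to roughly this sketch, following the Terraform SDK's map-of-schema convention (the description string is illustrative, not quoted from the provider):

```go
package helm

import "github.com/hashicorp/terraform/helper/schema"

// Fragment of the helm_release schema (sketch): "status" becomes a
// computed-only attribute, so the provider tracks it and CustomizeDiff
// assigns its desired value, while users cannot set it in configuration.
var releaseStatusSchema = map[string]*schema.Schema{
	"status": {
		Type:        schema.TypeString,
		Computed:    true,
		Description: "Status of the release.",
	},
}
```

With the attribute Computed rather than Optional, the d.SetNew("status", ...) call in CustomizeDiff still forces a diff for a FAILED release, but any attempt to set status in configuration is rejected at plan time.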

@stepanstipl (Contributor, Author) commented

@legal90 thanks, that sounds good. I'll have a closer look and test whether it works as expected in our case, but I like it! I originally tried to keep the status nested under the computed metadata, but I wasn't able to figure that out, so I gave up and moved it one level up as an Optional field.

One other thing I noticed in your code - r.GetInfo().GetStatus().GetCode().String() vs r.Info.Status.Code.String(). I agree that the 2nd one is more readable, but at the same time I had the impression it's better to access objects via the accessor Get* methods if they're available. Being quite new to Go - can you elaborate on your preference?

@legal90 (Contributor) commented Dec 3, 2018

r.GetInfo().GetStatus().GetCode().String() vs r.Info.Status.Code.String() <...> Being quite new to Go - can you elaborate on your preference?

I was just keeping the same approach as in the rest of the code here. I believe this approach could be changed in a separate pull request if needed.
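One concrete difference between the two styles is worth noting: the Get* accessors that protoc generates for Go are nil-safe (they return the zero value when the receiver is nil), while direct field access panics on a nil pointer, so the accessor chain is the defensive choice. A minimal illustration using Helm v2's generated hapi/release types:

```go
package main

import (
	"fmt"

	"k8s.io/helm/pkg/proto/hapi/release"
)

func main() {
	var r *release.Release // nil, e.g. a release that was never fetched

	// Generated getters short-circuit on nil receivers and return zero
	// values, so this safely prints the zero status code ("UNKNOWN"):
	fmt.Println(r.GetInfo().GetStatus().GetCode().String())

	// Direct field access would panic here:
	// fmt.Println(r.Info.Status.Code.String()) // nil pointer dereference
}
```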

@stepanstipl (Contributor, Author) commented

I have refactored the code as per @legal90's suggestion (thanks, it's better this way) - the status is now a Computed field. The resulting functionality stays the same. I've tested that it still fixes the issue in our use case, it works for the steps described in #159, and the test code added originally stayed the same and passes.

@meyskens (Contributor) commented

@stepanstipl can you rebase on master? We fixed a lot of things in the acceptance tests.

@bluemalkin commented
Which scheduled release can we expect this to be in, please? Keep up the good work!

@stepanstipl changed the title from "Fix when Helm provider ignores FAILED release state" to "[WIP] Fix when Helm provider ignores FAILED release state" Dec 14, 2018
This moves the status field out of the computed metadata into a top-level
non-computed field and uses `CustomizeDiff` in order to track the state of
the Helm release and enforce that the state is `DEPLOYED`.

This fixes the behavior where Terraform would ignore a previously created
release in a `FAILED` state and make no attempt to bring it back to the
desired state (it assumed the state would always be `DEPLOYED`).

See
hashicorp#159
for more details (the issue has detailed steps on how to reproduce it).
@stepanstipl (Contributor, Author) commented

@meyskens done, nice job with the tests!

I've rebased on master, cleaned up my commits, and also fixed a bug introduced by the earlier refactoring.

TestAccResourceRelease_updateAfterFail was still failing for me (even on the original master, but I think this is related to running the tests on minikube, as a service of type LoadBalancer will never become ready), so I've fixed that one as well.

Finally, TestAccResourceRelease_repository was failing as well when running against a GKE cluster, with the message chart "telegraf" matching 1.0.2 not found in stable-repo index. I believe this has to do with a combination of the newer helm (0.12) and the fact that the telegraf chart is marked as deprecated everywhere (see https://github.com/helm/charts/tree/master/stable/telegraf).

Now the tests pass against both minikube and real GKE clusters.

@stepanstipl changed the title from "[WIP] Fix when Helm provider ignores FAILED release state" to "Fix when Helm provider ignores FAILED release state" Dec 14, 2018
When running against a GKE cluster, the test with the `telegraf` chart would
fail with: chart "telegraf" matching 1.0.2 not found in stable-repo index.
I believe it has to do with the combination of the new helm version (0.12)
and the fact that all telegraf charts are marked as deprecated.
@stepanstipl (Contributor, Author) commented

Hi @meyskens, is there anything else that's needed? Sorry for rushing - I'm just wondering whether it looks like we can get this in :)

@meyskens (Contributor) commented

@stepanstipl sorry for the delay, the code looks great! Just going to run the tests again, as it seems Travis timed out but still flagged it green.

@meyskens merged commit 1524e6f into hashicorp:master Jan 11, 2019
@legal90 mentioned this pull request Feb 4, 2019
Review thread on the getChart error handling in resourceDiff:

```go
if err == nil {
	return d.SetNew("version", c.Metadata.Version)
}
return nil
```
@legal90 (Contributor) commented
@stepanstipl Could you remind me - why did you make the call to getChart tolerant to errors?

As I see it, if the helm client (library) fails to fetch the chart, that should be a valid reason to raise an error, as happens in the other places where getChart is called (see the resourceReleaseCreate and resourceReleaseUpdate functions). Otherwise we can miss the diff and the update won't be triggered.
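For contrast, the strict handling in the create/update paths looks roughly like this - a sketch with getChart's signature assumed from the discussion, not a quote of the provider's code:

```go
// In resourceReleaseCreate/resourceReleaseUpdate (sketch): a chart that
// cannot be fetched is a hard error that aborts the whole operation.
c, _, err := getChart(d, meta)
if err != nil {
	return err // fail fast: no chart, no create/update
}
// ... proceed to install/upgrade using the resolved chart c ...
```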

@stepanstipl (Contributor, Author) commented

Hah, it took me a while to remember and re-figure out why I did that - it's been a long time :).

So CustomizeDiff, and therefore resourceDiff, is in some cases executed before the chart is available. In that scenario getChart would fail and thus fail the whole plan/apply run, even though the run would otherwise complete just fine.

A good example of this is the TestAccResourceRelease_repository test case, where the chart only becomes available after the repo is added via the helm_repository resource.
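Putting that together, the tolerant variant inside resourceDiff reads roughly like this (same caveat as above: getChart's exact signature is assumed from the discussion):

```go
// In resourceDiff (sketch): the chart's repo may only be registered later
// in the same run (e.g. by a helm_repository resource), so a resolution
// failure at plan time must not fail the whole plan/apply.
c, _, err := getChart(d, meta)
if err != nil {
	return nil // chart not resolvable yet; skip the version diff for now
}
return d.SetNew("version", c.Metadata.Version)
```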

@legal90 (Contributor) commented Feb 10, 2019

@stepanstipl Thank you for the explanation! Now I understand. Your test case failed because the CustomizeDiff functions of all resources are called before those resources are created/updated.
I asked just because I ran into the opposite situation - my custom repo was unavailable at the network level, but terraform apply didn't show any diff or error when one was expected.

It seems that downloading the chart at the diff evaluation stage is not a good idea, and we need to find a way to store everything needed in the Terraform state.
/cc @rporres

@stepanstipl (Contributor, Author) commented

but terraform apply didn't show any diff or error when one was expected

@legal90 do you mean that even the apply step, with the helm_release resource trying to install a chart from an unavailable repo, did not fail? I would expect TF to fail there.

@legal90 (Contributor) commented

@stepanstipl My case was pretty basic - the repository URL was unavailable and the HTTP connection to it was timing out with a very long delay (~3-4 minutes), which just caused terraform apply to run for a very long time without any error being thrown.
Though there was no diff expected in my case. You're right - if there were a diff, then terraform would eventually throw an error, because it downloads the chart once again in resourceReleaseUpdate.

@alexsomesan added this to the v0.8.0 milestone Feb 11, 2019
@hashicorp locked and limited conversation to collaborators Apr 27, 2020