Fix when Helm provider ignores FAILED release state #161
Conversation
We desperately need this. This bug is causing our state to become unrecoverable in some cases.
@stepanstipl Thank you! That's really needed. But I just looked at the trick with a default value and verification. Maybe we can make the "status" attribute
@legal90 thanks, that sounds good. I'll have a closer look and try to test whether it works as expected in our case, but I like it! I originally tried to keep the status nested under the computed One other thing I noticed in your code -
I was just keeping the same approach as in the rest of the code here. I believe this approach could be changed in a separate pull request if needed.
I have refactored the code as per @legal90's suggestion (thanks, it's better this way) - the
Force-pushed from c9048d6 to c74d30f
@stepanstipl can you rebase on master? We fixed a lot of things in the acceptance tests.
Which scheduled release can we expect this to be in, please? Keep up the good work!
This moves the `status` field out of `metadata` into a non-computed field and uses `CustomizeDiff` in order to track the state of the Helm release and enforce that the state is `DEPLOYED`. This fixes the behavior where Terraform would ignore a previously created release that is in a `FAILED` state and would not try to bring it to the desired state (assuming it would always be `DEPLOYED`). See hashicorp#159 for more details (the issue has detailed steps on how to reproduce the issue).
Force-pushed from 5b0400a to e12727b
@meyskens done, nice job with tests! I've rebased on master and cleaned up my commits, also fixed a bug introduced by the earlier refactoring.
Finally! Now the tests pass both against minikube and real GKE clusters.
When running against a GKE cluster, the test with the `telegraf` chart would fail with `chart "telegraf" matching 1.0.2 not found in stable-repo index`. I believe it has to do with the combination of the new Helm version (0.12) and the fact that all telegraf charts are marked as deprecated.
Force-pushed from e2a4224 to c959468
Hi @meyskens, is there anything else that's needed? Sorry for rushing, I'm just wondering if it looks like we can get this in :)
@stepanstipl sorry for the delay, the code looks great! Just going to run the tests again as it seems Travis timed out but still flagged it green.
    if err == nil {
        return d.SetNew("version", c.Metadata.Version)
    }
    if err != nil {
        return nil
    }
@stepanstipl Could you please remind me why you made the call to `getChart` tolerant to errors? As I see it, if the helm client (library) fails to fetch the chart, that should be a valid reason to throw an error, as happens in the other places where `getChart` is called (see the `resourceReleaseCreate` and `resourceReleaseUpdate` functions). Otherwise we can miss the diff and the update won't be triggered.
Hah, it took me a while to remember/re-figure out why I did that - it's been a long time :).
So the `CustomizeDiff` function, and therefore `resourceDiff`, is executed before the chart is available in some cases. In that scenario `getChart` would fail and therefore fail the whole `plan`/`apply` run, while in fact the run would complete just fine.
A good example of this is the `TestAccResourceRelease_repository` test case, where the chart is only available after the repo is added via the `helm_repository` resource.
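For illustration, here is a minimal, hedged sketch of a `CustomizeDiff` function with that error-tolerant behaviour. The `chartInfo`, `Meta`, and `getChart` declarations below are simplified stand-ins for the provider's real helpers (their exact shapes are assumptions made for this sketch), and the literal string `"DEPLOYED"` stands in for the Helm status constant:

```go
package helm

import (
	"github.com/hashicorp/terraform/helper/schema"
)

// chartInfo, Meta and getChart are simplified stand-ins for the provider's
// real chart lookup; their shapes are assumptions made only for this sketch.
type chartInfo struct {
	Metadata struct{ Version string }
}

type Meta struct{} // placeholder for the provider's configured client

func getChart(d *schema.ResourceDiff, m *Meta) (*chartInfo, string, error) {
	return &chartInfo{}, "", nil // the real helper resolves and loads the chart
}

// resourceDiffSketch illustrates the behaviour discussed above: it pins the
// desired status to DEPLOYED and deliberately tolerates getChart errors,
// because the chart may not be resolvable yet (e.g. the helm_repository
// resource has not been applied) and failing here would abort an otherwise
// valid plan/apply run.
func resourceDiffSketch(d *schema.ResourceDiff, meta interface{}) error {
	// Enforce the desired release state so a FAILED release produces a diff.
	if err := d.SetNew("status", "DEPLOYED"); err != nil {
		return err
	}

	// Try to resolve the chart; swallow errors instead of failing the run.
	c, _, err := getChart(d, meta.(*Meta))
	if err != nil {
		return nil
	}

	// When the chart is available, surface its version in the diff.
	return d.SetNew("version", c.Metadata.Version)
}
```

As the discussion below notes, the trade-off of swallowing the error here is that a chart that is genuinely unreachable will not surface a diff or an error at the plan stage.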
@stepanstipl Thank you for the explanation! Now I understand. Your test case failed because the `CustomizeDiff` functions of all resources are called before these resources are created/updated.
I asked because I ran into the opposite situation - my custom repo was unavailable at the network level, but `terraform apply` didn't show any diff or error when one was expected.
It seems that downloading the chart at the diff-evaluation stage is not a good idea, and we need to find a way to store everything needed in the Terraform state.
/cc @rporres
> but the terraform apply didn't show any diff or error, when it was expected

@legal90 do you mean that even running the `apply` step, with the `helm_release` resource trying to install a chart from an unavailable repo, did not fail? I would expect TF to fail there.
@stepanstipl My case was pretty basic - the repository URL was unavailable and the HTTP connection to it was timing out with a very long delay (~3-4 minutes), which just caused `terraform apply` to run for a very long time without any error being thrown.
Though no diff was expected in my case. You're right - if there were a diff, then Terraform would eventually throw an error, because it downloads the chart once again in `resourceReleaseUpdate`.
This moves the `status` field out of `metadata` into a non-computed field and uses `CustomizeDiff` in order to track the state of the Helm release and enforce that the state is `DEPLOYED`. This fixes the behavior where Terraform would ignore a previously created release that is in a `FAILED` state and would not try to bring it to the desired state (assuming it would always be `DEPLOYED`). See #159 for more details (the issue has detailed steps on how to reproduce the issue).
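For readers unfamiliar with the provider internals, the following is a minimal sketch of what this change amounts to at the resource-schema level. The attribute set is heavily trimmed, the names other than `status` are illustrative only, and the literal `"DEPLOYED"` stands in for the Helm status constant; it is not the exact code from this PR:

```go
package helm

import (
	"github.com/hashicorp/terraform/helper/schema"
)

// resourceReleaseSketch shows the shape of the change described above:
// "status" is a regular (non-computed) top-level attribute that defaults to
// DEPLOYED, and CustomizeDiff enforces that desired state, so a release whose
// recorded status is FAILED produces a diff instead of being ignored.
func resourceReleaseSketch() *schema.Resource {
	return &schema.Resource{
		Schema: map[string]*schema.Schema{
			"name": {
				Type:     schema.TypeString,
				Required: true,
			},
			"chart": {
				Type:     schema.TypeString,
				Required: true,
			},
			"status": {
				Type:     schema.TypeString,
				Optional: true,
				Default:  "DEPLOYED",
			},
		},
		CustomizeDiff: func(d *schema.ResourceDiff, meta interface{}) error {
			// Always require the release to converge back to DEPLOYED.
			return d.SetNew("status", "DEPLOYED")
		},
	}
}
```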
This PR adds a test for this issue. To run the integration test(s):
- Have `kubectl` configured (verify by running `kubectl get pods`)
- Have Helm configured (verify by running `helm list`)
- Set `TF_ACC` to `true` to enable acceptance tests (`export TF_ACC=true`)
- Run the tests from the provider source directory (`$GOPATH/src/github.com/terraform-providers/terraform-provider-helm`), but expect some of them to fail (I believe it's not related to this change; these were failing for me both with and without this code change)
- To run just the test for this issue:
Update: After rebasing on the latest master and fixing 2 remaining tests (some fail only when running against GKE and not when running against minikube), all the tests are expected to pass, both against minikube & GKE.