-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The controller should not perform drift detection if it's triggered by a planning request #890
Comments
Hi @tao-zhang-shell btw, we have a new planning system called the Branch Planner which has been released recently. |
Thank you for your response @chanwit ! I downloaded Actually, after the Custom Resource
The plan is good, just want to know how to apply this update. As I said, there is no new plan ID for this update. |
Hi @chanwit, I did more tests and found some interesting behaviours. Two scenarios:
Here is my guess. The new plan ID is re-generated only when the referenced TF module has some change. This of course makes sense. But if the values of input variables change, the new plan ID should be re-generated as well. Then it might not be good to use the TF module commit hash to form the plan ID. |
In the current design, a plan ID is computed from a Git commit, yes. |
Hi @chanwit, I can apply the example at https://github.com/tf-controller/branch-planner-demo successfully when I manually approve the plan with the plan ID. But when I change the value of I also tried Branch Planner today. Apparently, it only watches the changes (PR in this case) of the TF module git repo. It doesn't watch the input variable value change either. Do you have any plans to fix this? Thanks!
|
A workaround would be putting Terraform CR above in a Git repo too. |
Hi @chanwit, I tried the workaround moving the Terraform CR to the same git repo as the Terraform module. You can find the forked git repo at https://github.com/tao-zhang-shell/branch-planner-demo. However, it didn't work either when I pushed a new commit to change the input variable value. You can find the Terraform CR status after the commit is pushed. I also tried to patch the CR with the new commit SHA
|
It worked without problem in my setup. I'll write down a quick tutorial to use the manual approval mode based on the above setup. |
@chanwit I really appreciate your help, but still no luck for me. The following are the TF controller logs, no plan ID like By the way, I also used the v0.16.0-rc.2 version.
|
Found the problem.
|
Do you mean the TF controller doesn't support TF resource Just like the message shows, the controller actually made the correct plan. |
We have the concept of "drift detection" implemented. When a drift on a live system occurs, it means that a resource is changed by something else, and not by what we described in our desired states. The use of your random_id demonstrated the same behavior as a live system having drifts. |
There is no any other resource than the random_id in my test TF project. As you can understand, nothing can change this random_id from live system. I know what |
Also one suggestion, can you please add one resource to your test project and try it again? I see currently in your test project, there is no TF resource created. Only one output is defined. That's why I added this random_id resource. Maye for any resource update, TF controller mistakenly regards it as a drift? |
Yep, I confirmed that it's definitely a bug in the manual approval mode. |
I narrowed down the problem and will tackle this issue as a bug fix. |
A fix can be done by adding a criterion in the above code to:
|
Thank you very much for the confirmation @chanwit, I will be looking forward to the bug fix! |
@chanwit @LappleApple Do you have any estimation of when this bug will be fixed? Thanks! |
I disabled 51 and 52 tests as they are failing 100% of the time, it's not flaky, but we don't know what causes the issue. With manual approval, it stuck in a loop between "drift detected" and "in progress". This is a known issue [^1]. Closes #905 Related to #890 [^1]: #890 Signed-off-by: Balazs Nadasdi <balazs@weave.works>
Related tests were disabled because they were flaky, now I tried to fix and enable them. Two related tests are still disabled1, please enable them as part of the fix for this issue. Footnotes |
As it was flaky and not failing all the time when it was disabled, we can assume it was introduced after we disabled those tests, I'll try to |
|
After a bit of poking, if I remove all changed from the commit, except the If I keep everything from the commit, but move the I assume it some weird issue how the whole codebase tosses around the |
After a bit of poking, if I remove all changed from the commit, except the `patcher := ...` and the `patcher.Patch` call (so the `finalizeStatus` does literally nothing, but patches the "expected nothing"), it still causes an issue. If I keep everything from the commit, but move the `patcher :=` line into the `defer` function, tests are happy. I assume it some weird issue how the whole codebase tosses around the `&terraform` reference and made Patch calls all around the codebase. Fixes #890 References: * #890 Signed-off-by: Balazs Nadasdi <balazs@weave.works>
After a bit of poking, if I remove all changed from the commit, except the `patcher := ...` and the `patcher.Patch` call (so the `finalizeStatus` does literally nothing, but patches the "expected nothing"), it still causes an issue. If I keep everything from the commit, but move the `patcher :=` line into the `defer` function, tests are happy. We already have status patching everywhere (without this commit everything worked). The Patcher in this commit should just patch the diff calculated in finalizeStatus. I assume it a weird issue how the whole codebase tosses around the `&terraform` reference and makes Patch calls somewhere else in the State Machine. Fixes #890 References: * #890 Signed-off-by: Balazs Nadasdi <balazs@weave.works>
Hey! |
@chanwit after updating to approvePlan: ""
message: Plan no changes
reason: TerraformPlannedNoChanges
lastAppliedByDriftDetectionAt: "2023-09-20T18:14:33Z"
lastAppliedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
lastAttemptedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
lastDriftDetectedAt: "2023-09-26T15:30:50Z"
lastHandledReconcileAt: "2023-09-21T16:53:32.303215+03:00"
lastPlanAt: "2023-09-21T13:54:08Z"
lastPlannedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
plan:
lastApplied: plan-0.38.0-862fc85490 UPD (2 Oct): I observe the same behavior for sourceRef kind |
@gberenice I haven't tested the new RC yet. |
Reopened in light of @gberenice's new comments. |
@chanwit @luizbafilho @yiannistri could you please provide update on this on-going issue for avoiding drift detection when using manual mode? Does this PR - #1076 solve the issue? Could you please review it and merge to main? |
User story
As a TF-Controller user, I want to be able to run planning requests in manual approval mode without the controller triggering drift detection, so that my process does not get stuck in a loop or provide misleading feedback.
ACs
Manual approval is enabled, and the initial plan is approved and applied successfully.
Now the terraform code changes a little bit, it requires a small update. But I cannot find a new plan ID. Then how to approve the update after the initial plan is applied? I don't find this in the documentation either.
The text was updated successfully, but these errors were encountered: