Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The controller should not perform drift detection if it's triggered by a planning request #890

Open
5 tasks
tao-zhang-shell opened this issue Aug 22, 2023 · 29 comments · Fixed by #943
Open
5 tasks
Assignees
Labels

Comments

@tao-zhang-shell
Copy link

tao-zhang-shell commented Aug 22, 2023

User story
As a TF-Controller user, I want to be able to run planning requests in manual approval mode without the controller triggering drift detection, so that my process does not get stuck in a loop or provide misleading feedback.

ACs

  • Disables drift detection during manual approval process
  • Resources are applied
  • The TF-Controller process in the manual mode does not get stuck in a loop
  • Testing is in place to ensure reliability
  • User documentation is updated

Manual approval is enabled, and the initial plan is approved and applied successfully.

Now the terraform code changes a little bit, it requires a small update. But I cannot find a new plan ID. Then how to approve the update after the initial plan is applied? I don't find this in the documentation either.

@chanwit
Copy link
Collaborator

chanwit commented Aug 22, 2023

Hi @tao-zhang-shell
Have you tried tfctl replan to help triggering the re-plan process?
Then, you would see the message in the object telling you how to approve the plan, similar to the previous apply process.

btw, we have a new planning system called the Branch Planner which has been released recently.
Please feel free to give it a try.

@tao-zhang-shell
Copy link
Author

tao-zhang-shell commented Aug 23, 2023

Thank you for your response @chanwit !

I downloaded tfctl, unfortunately, MacOS blocks it because it is not from an identified developer. Even if I allow it anyway, it's still not runnable.

Actually, after the Custom Resource Terraform object is updated (only input vars are changed, no change to the Terraform module), the correct TF update message is shown up in the object. For example:

    - lastTransitionTime: '2023-08-22T15:18:36Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:
          ~ update in-place

        Terraform will perform the following actions:

          # module.kafka_topic["test"].confluent_kafka_topic.kafka_topic will be updated in-place
          ~ resource "confluent_kafka_topic" "kafka_topic" {
              ~ config           = {
                  + "cleanup.policy" = "compact"
                  + "segment.ms"     = "604900000"
                }
                id               = "Kafka-id"
                # (2 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }

        Plan: 0 to add, 1 to change, 0 to destroy. 

The plan is good, just want to know how to apply this update. As I said, there is no new plan ID for this update.

@tao-zhang-shell
Copy link
Author

Hi @chanwit, I did more tests and found some interesting behaviours. Two scenarios:

  • No change to the TF module which is referenced at sourceRef field in Terraform object, only change to the values of input variables

    As I commented above, no new plan ID is created. But the message shows the correct update. In the meantime, the plan output is not updated in Kubernetes secret or config map (if human readable plan is enabled)

  • A dummy new commit (no actual change) to TF module, and also change the input variables values in Terraform object

    Now I see the new plan ID in the message of Terraform object, just like the first initial plan. A new plan output is generated as a Kubernetes secret or config map.

    If I patch the Terraform object with the new plan ID, everything works as expected.

      conditions:
        - lastTransitionTime: '2023-08-23T09:20:05Z'
          message: >-
            Plan generated: set approvePlan:
            "plan-main@sha1:7b59ac3655fd9376ffbf7445b64e2a76cd9a1403" to approve
            this plan.
          reason: TerraformPlannedWithChanges
          status: Unknown
          type: Ready
    

    Also, you can see the plan ID changed, which is based on the TF module commit SHA.

      plan:
        isDriftDetectionPlan: true
        lastApplied: plan-main@sha1:025855416738fee530ea8a733e48c686fff0e525
        pending: plan-main@sha1:7b59ac3655fd9376ffbf7445b64e2a76cd9a1403
    

Here is my guess. The new plan ID is re-generated only when the referenced TF module has some change. This of course makes sense. But if the values of input variables change, the new plan ID should be re-generated as well. Then it might not be good to use the TF module commit hash to form the plan ID.

@chanwit
Copy link
Collaborator

chanwit commented Aug 23, 2023

But if the values of input variables change, the new plan ID should be re-generated as well. Then it might not be good to use the TF module commit hash to form the plan ID.

In the current design, a plan ID is computed from a Git commit, yes.
Maybe we need to come up with other ways to compute plan ids. For example, adding object generation as suffixes, etc.

@chanwit chanwit changed the title How to approve an updated TF plan Plan Id computation from commit hash causes a bug when the Terraform CR is changed (was: How to approve an updated TF plan) Aug 23, 2023
@tao-zhang-shell
Copy link
Author

Hi @chanwit,

I can apply the example at https://github.com/tf-controller/branch-planner-demo successfully when I manually approve the plan with the plan ID. But when I change the value of subject from default World to new-world and redeploy this Terraform object(via kubectl apply -f ), there is no new plan ID generated. Then I have no way to apply this change.

I also tried Branch Planner today. Apparently, it only watches the changes (PR in this case) of the TF module git repo. It doesn't watch the input variable value change either.

Do you have any plans to fix this? Thanks!

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: branch-planner-demo
  namespace: flux-system
spec:

  vars:
    - name: subject
      value: new-world

@chanwit
Copy link
Collaborator

chanwit commented Aug 23, 2023

A workaround would be putting Terraform CR above in a Git repo too.
It's a common pattern to bootstrap Flux and having both Terraform CR and Terraform files in the same Git repo (but in different directories).

@tao-zhang-shell
Copy link
Author

tao-zhang-shell commented Aug 24, 2023

Hi @chanwit, I tried the workaround moving the Terraform CR to the same git repo as the Terraform module. You can find the forked git repo at https://github.com/tao-zhang-shell/branch-planner-demo.

However, it didn't work either when I pushed a new commit to change the input variable value. You can find the Terraform CR status after the commit is pushed.

I also tried to patch the CR with the new commit SHA approvePlan: plan-main-d55857071e, but it didn't work. So the workaround has the same behaviour as applying the change via kubectl apply -f.

status:
  availableOutputs:
    - hello_world
  conditions:
    - lastTransitionTime: '2023-08-24T12:40:12Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:

        -/+ destroy and then create replacement


        Terraform will perform the following actions:

          # random_id.id must be replaced
        -/+ resource "random_id" "id" {
              ~ b64_std     = "lvErSw==" -> (known after apply)
              ~ b64_url     = "lvErSw" -> (known after apply)
              ~ dec         = "2532387659" -> (known after apply)
              ~ hex         = "96f12b4b" -> (known after apply)
              ~ id          = "lvErSw" -> (known after apply)
              ~ keepers     = { # forces replacement
                  ~ "trigger" = "New-world" -> "change-world"
                }
                # (1 unchanged attribute hidden)
            }

        Plan: 1 to add, 0 to change, 1 to destroy.


        Changes to Outputs:
          ~ hello_world = "Hello TF Controller v0.16.0-rc.2, New-world!" -> "Hello TF Controller v0.16.0-rc.2, change-world!"
      reason: DriftDetected
      status: 'False'
      type: Ready
    - lastTransitionTime: '2023-08-24T12:00:54Z'
      message: Plan no changes
      reason: TerraformPlannedNoChanges
      status: 'False'
      type: Plan
    - lastTransitionTime: '2023-08-24T11:58:27Z'
      message: Applied successfully
      reason: TerraformAppliedSucceed
      status: 'True'
      type: Apply
    - lastTransitionTime: '2023-08-24T11:58:27Z'
      message: Outputs available
      reason: TerraformOutputsAvailable
      status: 'True'
      type: Output
  lastAppliedRevision: main@sha1:5cbf6f02c736e008b03ceef03755de1185ee530e
  lastAttemptedRevision: main@sha1:d55857071e4a679c45a5bfa58a3de19a3ee516f0
  lastDriftDetectedAt: '2023-08-24T12:40:12Z'
  lastPlanAt: '2023-08-24T12:00:54Z'
  lastPlannedRevision: main@sha1:d55857071e4a679c45a5bfa58a3de19a3ee516f0
  observedGeneration: 4
  plan:
    lastApplied: plan-main-5cbf6f02c7

@LappleApple LappleApple added terraform kind/docs Improvements or additions to documentation labels Aug 24, 2023
@chanwit
Copy link
Collaborator

chanwit commented Aug 24, 2023

It worked without problem in my setup.
Here's the repo I tested: https://github.com/tf-controller/manual-approve-test/commits/main, where you could examine the commit logs.
TF-Controller v0.16.0-rc.2 was used in the test.

I'll write down a quick tutorial to use the manual approval mode based on the above setup.

@tao-zhang-shell
Copy link
Author

tao-zhang-shell commented Aug 24, 2023

@chanwit I really appreciate your help, but still no luck for me. The following are the TF controller logs, no plan ID like plan-main-xxxx is found. I have seen it in the logs if it's the initial plan/apply.

By the way, I also used the v0.16.0-rc.2 version.

{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":">> Started Generation: 3","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"getting source","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"before lookup runner: checking ready condition","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:False,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:17 +0000 UTC,Reason:DriftDetected,Message:\nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n,}"}
{"level":"info","ts":"2023-08-24T15:05:22.782Z","msg":"before lookup runner: updating status","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:False,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:17 +0000 UTC,Reason:DriftDetected,Message:\nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n,}"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"before lookup runner: updated status","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","ready":"&Condition{Type:Ready,Status:Unknown,ObservedGeneration:0,LastTransitionTime:2023-08-24 15:05:22.782292929 +0000 UTC m=+3878.063992515,Reason:Progressing,Message:Reconciliation in progress,}"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"trigger namespace tls secret generation","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","logger":"cert-rotation","msg":"TLS already generated for ","namespace":"flux-system"}
{"level":"info","ts":"2023-08-24T15:05:22.788Z","msg":"show runner pod state: ","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","name":"branch-planner-demo","state":"not-found"}
{"level":"info","ts":"2023-08-24T15:05:37.839Z","msg":"runner is running","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.839Z","msg":"setting up terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.853Z","msg":"write backend config: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.853Z","msg":"new terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","workingDir":"/tmp/flux-system-branch-planner-demo"}
{"level":"info","ts":"2023-08-24T15:05:37.857Z","msg":"generate vars from tf: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generated var files from spec","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generate template: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:37.858Z","msg":"generated template","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"init reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"tfexec initialized terraform","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"workspace select reply: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.415Z","msg":"calling detectDrift ...","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.667Z","msg":"plan for drift: ok found drift: true","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"info","ts":"2023-08-24T15:05:39.742Z","msg":"show plan: \nTerraform used the selected providers to generate the following execution\nplan. Resource actions are indicated with the following symbols:\n-/+ destroy and then create replacement\n\nTerraform will perform the following actions:\n\n  # random_id.id must be replaced\n-/+ resource \"random_id\" \"id\" {\n      ~ b64_std     = \"oG0rKA==\" -> (known after apply)\n      ~ b64_url     = \"oG0rKA\" -> (known after apply)\n      ~ dec         = \"2691509032\" -> (known after apply)\n      ~ hex         = \"a06d2b28\" -> (known after apply)\n      ~ id          = \"oG0rKA\" -> (known after apply)\n      ~ keepers     = { # forces replacement\n          ~ \"trigger\" = \"New-world\" -> \"Another-world\"\n        }\n        # (1 unchanged attribute hidden)\n    }\n\nPlan: 1 to add, 0 to change, 1 to destroy.\n\nChanges to Outputs:\n  ~ hello_world = \"Hello TF Controller v0.16.0-rc.2, New-world!\" -> \"Hello TF Controller v0.16.0-rc.2, Another-world!\"\n","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"error","ts":"2023-08-24T15:05:39.744Z","msg":"will not force / auto apply detected drift","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","error":"DriftDetected"}
{"level":"info","ts":"2023-08-24T15:05:39.748Z","msg":"clean up dir: ok","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z"}
{"level":"error","ts":"2023-08-24T15:05:39.754Z","msg":"Drift detected after 16.971997007s, next try in 15s","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"d5f0bb7f-3319-486a-95fb-57faa1653cdb","reconciliation-loop-id":"ed3e364f-c9c5-44f2-a5ed-36625dc2e70c","start-time":"2023-08-24T15:05:22.782Z","revision":"main@sha1:9c5447cfa09ee401b2e848127d51e43a3be79048","error":"DriftDetected"}
{"level":"info","ts":"2023-08-24T15:05:44.758Z","msg":">> Started Generation: 3","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"b49c534b-4b99-4124-9284-abdb02d1148f","reconciliation-loop-id":"9d84ca3d-fc50-4e56-b70a-be94b8c67c8f","start-time":"2023-08-24T15:05:44.758Z"}
{"level":"info","ts":"2023-08-24T15:05:44.758Z","msg":"getting source","controller":"terraform","controllerGroup":"infra.contrib.fluxcd.io","controllerKind":"Terraform","Terraform":{"name":"branch-planner-demo","namespace":"flux-system"},"namespace":"flux-system","name":"branch-planner-demo","reconcileID":"b49c534b-4b99-4124-9284-abdb02d1148f","reconciliation-loop-id":"9d84ca3d-fc50-4e56-b70a-be94b8c67c8f","start-time":"2023-08-24T15:05:44.758Z"}

@chanwit
Copy link
Collaborator

chanwit commented Aug 24, 2023

Found the problem.
You used the random ID generator, which caused a drift (reason: DriftDetected). As a result, the controller never reached the planning step of the second round.

    - lastTransitionTime: '2023-08-24T12:40:12Z'
      message: >

        Terraform used the selected providers to generate the following
        execution

        plan. Resource actions are indicated with the following symbols:

        -/+ destroy and then create replacement


        Terraform will perform the following actions:

          # random_id.id must be replaced
        -/+ resource "random_id" "id" {
              ~ b64_std     = "lvErSw==" -> (known after apply)
              ~ b64_url     = "lvErSw" -> (known after apply)
              ~ dec         = "2532387659" -> (known after apply)
              ~ hex         = "96f12b4b" -> (known after apply)
              ~ id          = "lvErSw" -> (known after apply)
              ~ keepers     = { # forces replacement
                  ~ "trigger" = "New-world" -> "change-world"
                }
                # (1 unchanged attribute hidden)
            }

        Plan: 1 to add, 0 to change, 1 to destroy.


        Changes to Outputs:
          ~ hello_world = "Hello TF Controller v0.16.0-rc.2, New-world!" -> "Hello TF Controller v0.16.0-rc.2, change-world!"
      reason: DriftDetected
      status: 'False'
      type: Ready

@tao-zhang-shell
Copy link
Author

tao-zhang-shell commented Aug 24, 2023

Do you mean the TF controller doesn't support TF resource random_id? Or something else? I don't quite get your point. @chanwit

Just like the message shows, the controller actually made the correct plan.

@chanwit
Copy link
Collaborator

chanwit commented Aug 24, 2023

We have the concept of "drift detection" implemented.

When a drift on a live system occurs, it means that a resource is changed by something else, and not by what we described in our desired states.

The use of your random_id demonstrated the same behavior as a live system having drifts.

@tao-zhang-shell
Copy link
Author

There is no any other resource than the random_id in my test TF project. As you can understand, nothing can change this random_id from live system.

I know what drift means in Terraform. But there is nothing that can change this random_id, which drifts from the backend state. Actually, before I changed and pushed the variable value to the git repo, the Terraform CR showed no drift at all.

@tao-zhang-shell
Copy link
Author

Also one suggestion, can you please add one resource to your test project and try it again? I see currently in your test project, there is no TF resource created. Only one output is defined. That's why I added this random_id resource.

Maye for any resource update, TF controller mistakenly regards it as a drift?

@chanwit
Copy link
Collaborator

chanwit commented Aug 24, 2023

Yep, I confirmed that it's definitely a bug in the manual approval mode.
It's working fine for me with approvePlan: auto, but causing the same error you're facing with the manual approve.

@chanwit chanwit changed the title Plan Id computation from commit hash causes a bug when the Terraform CR is changed (was: How to approve an updated TF plan) The controller should not perform drift detection if it's triggered by a planning request Aug 25, 2023
@chanwit
Copy link
Collaborator

chanwit commented Aug 25, 2023

I narrowed down the problem and will tackle this issue as a bug fix.

@chanwit chanwit added the kind/bug Something isn't working label Aug 25, 2023
@chanwit
Copy link
Collaborator

chanwit commented Aug 25, 2023

https://github.com/weaveworks/tf-controller/blob/eb4c3adb49f9cd8b2e369b42d65dd516ec20d9fd/controllers/tf_controller_drift_detect.go#L15-L53

A fix can be done by adding a criterion in the above code to:

  1. detect if we are in the manual mode
  2. if we are in the manual mode and the intent is to plan + apply, do not perform drift detection.

@tao-zhang-shell
Copy link
Author

Thank you very much for the confirmation @chanwit, I will be looking forward to the bug fix!

@LappleApple LappleApple self-assigned this Aug 30, 2023
@tao-zhang-shell
Copy link
Author

@chanwit @LappleApple Do you have any estimation of when this bug will be fixed? Thanks!

yitsushi added a commit that referenced this issue Sep 11, 2023
I disabled 51 and 52 tests as they are failing 100% of the time, it's
not flaky, but we don't know what causes the issue. With manual
approval, it stuck in a loop between "drift detected" and "in progress".
This is a known issue [^1].

Closes #905

Related to #890

[^1]: #890

Signed-off-by: Balazs Nadasdi <balazs@weave.works>
@yitsushi
Copy link
Collaborator

Related tests were disabled because they were flaky, now I tried to fix and enable them. Two related tests are still disabled1, please enable them as part of the fix for this issue.

Footnotes

  1. https://github.com/weaveworks/tf-controller/pull/943

@yitsushi
Copy link
Collaborator

As it was flaky and not failing all the time when it was disabled, we can assume it was introduced after we disabled those tests, I'll try to git bisect and find the commit that introduced this bug.

@yitsushi
Copy link
Collaborator

yitsushi commented Sep 11, 2023

git bisect says this commit is the first bad: 374232a (but I don't see why would it cause this)

@yitsushi
Copy link
Collaborator

After a bit of poking, if I remove all changed from the commit, except the patcher := ... and the patcher.Patch call (so the finalizeStatus does literally nothing, but patches the "expected nothing"), it still causes an issue.

If I keep everything from the commit, but move the patcher := line into the defer function, tests are happy.

I assume it some weird issue how the whole codebase tosses around the &terraform reference and made Patch calls all around the codebase.

yitsushi added a commit that referenced this issue Sep 11, 2023
After a bit of poking, if I remove all changed from the commit, except
the `patcher := ...` and the `patcher.Patch` call (so the
`finalizeStatus` does literally nothing, but patches the "expected
nothing"), it still causes an issue.

If I keep everything from the commit, but move the `patcher :=` line
into the `defer` function, tests are happy.

I assume it some weird issue how the whole codebase tosses around the
`&terraform` reference and made Patch calls all around the codebase.

Fixes #890

References:
* #890

Signed-off-by: Balazs Nadasdi <balazs@weave.works>
yitsushi added a commit that referenced this issue Sep 12, 2023
After a bit of poking, if I remove all changed from the commit, except
the `patcher := ...` and the `patcher.Patch` call (so the
`finalizeStatus` does literally nothing, but patches the "expected
nothing"), it still causes an issue.

If I keep everything from the commit, but move the `patcher :=` line
into the `defer` function, tests are happy.

We already have status patching everywhere (without this commit
everything worked). The Patcher in this commit should just patch the
diff calculated in finalizeStatus.

I assume it a weird issue how the whole codebase tosses around the
`&terraform` reference and makes Patch calls somewhere else in the State
Machine.

Fixes #890

References:
* #890

Signed-off-by: Balazs Nadasdi <balazs@weave.works>
@gberenice
Copy link

gberenice commented Sep 25, 2023

Hey!
@tao-zhang-shell could you please confirm that the issue is fixed for you? I'm facing a similar problem on v0.16.0-rc.3 now.
UPD: my issue probably relates to another root cause, but still would be great to confirm that this one is fixed.

@gberenice
Copy link

gberenice commented Sep 26, 2023

@chanwit after updating to v0.16.0-rc.3, I still see the similar behavior that is described in the issue. I use GitRepo source for my Terraform object, and this is a tag for a community module (terraform-aws-ecr v0.38.0). When I change spec.vars for the corresponding Terraform object, there is no plan ID generated. Some related details from the object:

  approvePlan: ""
    message: Plan no changes
    reason: TerraformPlannedNoChanges
  lastAppliedByDriftDetectionAt: "2023-09-20T18:14:33Z"
  lastAppliedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  lastAttemptedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  lastDriftDetectedAt: "2023-09-26T15:30:50Z"
  lastHandledReconcileAt: "2023-09-21T16:53:32.303215+03:00"
  lastPlanAt: "2023-09-21T13:54:08Z"
  lastPlannedRevision: 0.38.0@sha1:862fc85490c01d3c0f394e64c3cb025c444bdada
  plan:
    lastApplied: plan-0.38.0-862fc85490

UPD (2 Oct): I observe the same behavior for sourceRef kind OCIRepository.

@chanwit
Copy link
Collaborator

chanwit commented Sep 27, 2023

@LappleApple @yitsushi

@tao-zhang-shell
Copy link
Author

@gberenice I haven't tested the new RC yet.

@LappleApple LappleApple reopened this Sep 27, 2023
@LappleApple
Copy link
Contributor

Reopened in light of @gberenice's new comments.

@RishuSinghS
Copy link

@chanwit @luizbafilho @yiannistri could you please provide update on this on-going issue for avoiding drift detection when using manual mode?

Does this PR - #1076 solve the issue? Could you please review it and merge to main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment