beta update-variant module isn't idempotent #326

Closed
ideasculptor opened this issue Nov 15, 2019 · 4 comments · Fixed by #327
Comments

@ideasculptor (Contributor)

When the same terraform apply command is run twice in a row, the first run succeeds and the second produces the plan and eventual error below. I could really use some help with this one: it blocks making any changes to a cluster without first running terragrunt destroy and re-applying, or at least targeting module.gke.google_container_node_pool.pools[0] for destroy and re-apply.

  # module.gke.google_container_node_pool.pools[0] will be updated in-place
  ~ resource "google_container_node_pool" "pools" {
        cluster             = "gke-dev"
        id                  = "us-central1-a/gke-dev/default-node-pool-6593"
        initial_node_count  = 1
        instance_group_urls = [
            "https://www.googleapis.com/compute/v1/projects/refarch-dev-svc-c006/zones/us-central1-a/instanceGroupManagers/gke-gke-dev-default-node-pool-6593-f23dc4de-grp",
        ]
        location            = "us-central1-a"
        max_pods_per_node   = 110
        name                = "default-node-pool-6593"
        node_count          = 2
      ~ node_locations      = [
          - "us-central1-a",
        ]
        project             = "refarch-dev-svc-c006"
        version             = "1.14.8-gke.12"
        zone                = "us-central1-a"

        autoscaling {
            max_node_count = 100
            min_node_count = 1
        }

        management {
            auto_repair  = true
            auto_upgrade = false
        }

        node_config {
            disk_size_gb      = 50
            disk_type         = "pd-standard"
            guest_accelerator = []
            image_type        = "COS"
            labels            = {
                "cluster_name" = "gke-dev"
                "node_pool"    = "default-node-pool"
            }
            local_ssd_count   = 0
            machine_type      = "n1-standard-2"
            metadata          = {
                "cluster_name"             = "gke-dev"
                "disable-legacy-endpoints" = "false"
                "node_pool"                = "default-node-pool"
            }
            oauth_scopes      = [
                "https://www.googleapis.com/auth/devstorage.read_only",
                "https://www.googleapis.com/auth/logging.write",
                "https://www.googleapis.com/auth/monitoring",
                "https://www.googleapis.com/auth/service.management.readonly",
                "https://www.googleapis.com/auth/servicecontrol",
                "https://www.googleapis.com/auth/trace.append",
            ]
            preemptible       = false
            service_account   = "tf-gke-gke-dev-05j3@refarch-dev-svc-c006.iam.gserviceaccount.com"
            tags              = [
                "gke-gke-dev",
                "gke-gke-dev-default-node-pool",
                "ssh",
                "https",
            ]

            shielded_instance_config {
                enable_integrity_monitoring = true
                enable_secure_boot          = false
            }

            workload_metadata_config {
                node_metadata = "SECURE"
            }
        }

        timeouts {
            create = "30m"
            delete = "30m"
            update = "30m"
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.gke.google_container_node_pool.pools[0]: Modifying... [id=us-central1-a/gke-dev/default-node-pool-6593]

Error: googleapi: Error 400: At least one of ['node_version', 'image_type', 'updated_node_pool', 'locations', 'workload_metadata_config', 'upgrade_settings'] must be specified., badRequest

  on .terraform/modules/gke/modules/beta-private-cluster-update-variant/cluster.tf line 281, in resource "google_container_node_pool" "pools":
 281: resource "google_container_node_pool" "pools" {

@ideasculptor changed the title from "update-variant module isn't idempotent" to "beta update-variant module isn't idempotent" on Nov 15, 2019
@ideasculptor (Contributor, Author)

Looking at the plan, it seems likely that specifying node_locations instead of location (or in addition to it) would correct the problem, as would telling Terraform to ignore that difference. I don't have time to experiment right now, unfortunately.
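
For reference, a minimal sketch of the "ignore that difference" option, assuming you are editing the module's own cluster.tf (a caller cannot add a lifecycle block from outside the module); this is a hypothetical patch, not the fix that was eventually merged:

  resource "google_container_node_pool" "pools" {
    # ... existing arguments unchanged ...

    lifecycle {
      # Stop Terraform from trying to reconcile node_locations, whose
      # computed default comes back empty on the second apply.
      ignore_changes = [node_locations]
    }
  }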

@ideasculptor (Contributor, Author) commented Nov 15, 2019

Once I dug into the code, it became clear that including the following in any node_pools declaration fixes the problem:

node_pools = [
    {
      ...
      node_locations     = "us-central1-a"
      ...
    },
  ]

However, it would be much better if this were handled correctly whether or not an explicit node_locations is provided, since the attribute is coded as optional. The default value comes up empty at creation time but is populated when the provider reads the resource back and compares it to current state. That looks like a provider issue, though it could be worked around with a kludge that computes the correct default when the value comes back empty.

In the meantime, if you run into this problem, setting an explicit node_locations value in the node_pools declaration will fix it.
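
For context, an abridged module call with the workaround in place might look roughly like this; everything other than node_locations is a placeholder for your own settings:

  module "gke" {
    source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
    # ... project, network, and cluster settings elided ...

    node_pools = [
      {
        name           = "default-node-pool"
        machine_type   = "n1-standard-2"
        min_count      = 1
        max_count      = 100
        node_locations = "us-central1-a"  # explicit value avoids the empty diff on re-apply
      },
    ]
  }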

@morgante (Contributor) commented Nov 15, 2019

Thanks, I think #327 might fix this. There's probably no reason for us to touch the node_locations property on an individual pool unless you explicitly want to.
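
If that is the direction #327 takes, the relevant line in the module's cluster.tf might end up looking something like the following (a guess at the shape of the change, not the actual diff):

  # Only pass node_locations when the pool explicitly sets it; otherwise
  # leave it null so the API keeps its computed default.
  node_locations = lookup(var.node_pools[count.index], "node_locations", "") != "" ? split(",", var.node_pools[count.index]["node_locations"]) : null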

@ideasculptor (Contributor, Author)

I assumed that for a regional cluster, node_locations might contain several zones while location would contain just the region, and I wasn't sure how that would play out in the plan comparison. It seemed like an opportunity for the existing resource to appear out of sync with the new plan, and I didn't want to take the time to confirm.
