Restart policy kills konlet-startup container #79

Closed
jakevc opened this issue Nov 17, 2021 · 8 comments
Labels
bug (Something isn't working), good first issue (Good for newcomers), P2 (high priority issues), triaged (Scoped and ready for work)

Comments

@jakevc

jakevc commented Nov 17, 2021

TL;DR

When the restart policy "No" is supplied in the Terraform configuration, the konlet-startup container fails to start.

Expected behavior

I expect the konlet-startup container to run correctly.

Observed behavior

The konlet startup container fails with:

Nov 17 18:49:15 gcs-inventory konlet-startup[3383]: 2021/11/17 18:49:15 Error: Failed to start container: Invalid container declaration: Unsupported container restart policy 'No'

Terraform Configuration

# cloud runner service for cloud storage inventory
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3.80"
    }
  }
}

provider "google" {
  project = var.project.name
  zone    = var.project.zone
}

resource "google_compute_firewall" "http-traffic" {
  name    = "allow-http"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["80"]
  }

  target_tags = ["http-traffic"]
}

resource "google_compute_firewall" "http-ssh" {
  name    = "allow-ssh"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22"]
  }

  target_tags = ["ssh-traffic"]
}

data "terraform_remote_state" "setup" {
  backend = "local"

  config = {
    path = "setup/terraform.tfstate"
  }
}

module "gce-container" {
  source = "github.com/terraform-google-modules/terraform-google-container-vm"

  cos_image_name = var.instance.image

  container = {
    image = data.terraform_remote_state.setup.outputs.image

    env = [
      {
        name  = "BUCKET"
        value = var.bucket
      },
      {
        name  = "GCLOUD_PROJECT"
        value = var.project.name
      },
      {
        name  = "BQ_DATASET"
        value = var.bqdataset
      },
      {
        name  = "BQ_TABLE"
        value = var.bqtable
      }
    ]

    volumeMounts = [
      {
        mountPath = "/cache"
        name      = "tempfs-0"
        readOnly  = false
      },
    ]
  }

  volumes = [
    {
      name = "tempfs-0"

      emptyDir = {
        medium = "Memory"
      }
    },
  ]

  restart_policy = "no"
}

# single vm
resource "google_compute_instance" "cos-vm" {
  name         = var.service_name
  machine_type = var.instance.type
  project      = var.project.name
  zone         = var.project.zone
  tags         = ["ssh-traffic", "http-traffic"]

  boot_disk {
    initialize_params {
      image = module.gce-container.source_image
      size  = var.instance.image_size
      type  = var.instance.image_type
    }
  }

  network_interface {
    network = "default"
    access_config {}
  }

  metadata = {
    gce-container-declaration = module.gce-container.metadata_value
    google-logging-enabled    = "true"
    google-monitoring-enabled = "true"
  }

  labels = {
    container-vm = module.gce-container.vm_container_label
  }

  service_account {
    email  = data.terraform_remote_state.setup.outputs.serivce_account_email
    scopes = ["cloud-platform"]
  }
}

Terraform Version

terraform version
Terraform v1.0.2
on darwin_amd64
+ provider registry.terraform.io/hashicorp/google v3.90.1
+ provider registry.terraform.io/hashicorp/google-beta v4.1.0
+ provider registry.terraform.io/hashicorp/template v2.2.0

Additional information

My container does run when using the restart policies supported in the documentation, "never", "on-failure", or "always", but I have not been able to verify that their behavior is as expected.

jakevc added the bug label on Nov 17, 2021
@morgante
Contributor

Thanks for the report! @stenalpjolly Can you take a look? I'm not sure where you got the No value for #77 from.

@bharathkkb
Member

@morgante I believe it came from

invalid_restart_policy = var.restart_policy != "OnFailure" && var.restart_policy != "UnlessStopped" && var.restart_policy != "Always" && var.restart_policy != "No" ? 1 : 0

invalid_restart_policy doesn't seem to be used anywhere at that commit, so it could be that the supported values changed.

@morgante
Contributor

Yea, I'm guessing the supported values changed. Or, rather, No should actually be Never.
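
One way to catch an unsupported value at plan time, rather than at container startup, would be a validation block on the module's input variable. This is only a sketch: the variable declaration shown here is hypothetical, and the accepted list ("Always", "OnFailure", "Never") is an assumption inferred from this thread, not a confirmed statement of what konlet supports.

```hcl
# Hypothetical variable declaration for illustration only.
# Assumption: the accepted values are "Always", "OnFailure", and "Never",
# inferred from the discussion above ("No" is rejected, "Never" is suggested).
variable "restart_policy" {
  description = "Restart policy for the container declaration."
  type        = string
  default     = "Always"

  validation {
    # contains() returns true when the given value is an element of the list.
    condition     = contains(["Always", "OnFailure", "Never"], var.restart_policy)
    error_message = "The restart_policy value must be one of: Always, OnFailure, Never."
  }
}
```

With a check like this, passing restart_policy = "No" would fail during terraform plan with the error message above instead of surfacing later in the konlet-startup logs.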

bharathkkb added the good first issue, P2, and triaged labels on Nov 18, 2021
@jakevc
Author

jakevc commented Nov 18, 2021

It's worth noting that even when supplying the values "never" or "on-failure", when I look at the vm metadata it still shows up as "Always".
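
To compare what the module renders against what ends up on the VM, the declaration could be exposed as a Terraform output. This is a minimal sketch that assumes it is added to the same root configuration as the module "gce-container" block above; the output name is hypothetical.

```hcl
# Hypothetical output name, chosen for illustration.
# module.gce-container.metadata_value is the same attribute that the
# instance's gce-container-declaration metadata is populated from.
output "rendered_container_declaration" {
  value = module.gce-container.metadata_value
}
```

After an apply, running terraform output rendered_container_declaration prints the declaration, so the restartPolicy it contains can be checked against the value shown in the VM metadata.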

@jakevc
Author

jakevc commented Nov 19, 2021

Here is some restart policy logic in the konlet service that contains "no"...

https://github.com/GoogleCloudPlatform/konlet/blob/9cb9106daf07123c2641159cb8bcc9d6f4960ec2/gce-containers-startup/runtime/runtime.go#L283

@abhisek00
Contributor

PR #87 for this fix

@apeabody
Contributor

#87 merged - Thanks!
