Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Replacing hack with track_latest task definition attribute #171

Conversation

ivan-sukhomlyn
Copy link
Contributor

Description

Replacing calculation of the latest ECS task definition revision using the track_latest task definition resource attribute.

Motivation and Context

Resolving #164

Breaking Changes

Yes, the minimal AWS provider version is >=5.37

How Has This Been Tested?

  • I have updated at least one of the examples/* to demonstrate and validate my change(s)
  • I have tested and validated these changes using one or more of the provided examples/* projects
  • I have executed pre-commit run -a on my pull request

@ivan-sukhomlyn
Copy link
Contributor Author

ivan-sukhomlyn commented Feb 20, 2024

@bryantbiggs @antonbabenko I'm marking the PR as a draft because this functionality doesn't work as expected.
Terraform doesn't ignore the default Docker image tag at the container definition. In that case, it tries to replace it and release a new version of an ECS service.

details:

  1. terraform apply with track_latest = true
  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] will be updated in-place
  ~ resource "aws_ecs_task_definition" "this" {
        id                       = "test-api"

      ~ track_latest             = false -> true
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }
  1. Manually deploy the service with a new revision of the ECS task definition with an updated Docker image tag to simulate CI/CD.
  2. Start terraform apply once again.
    A new revision triggers the refreshing of the container-definition state, which we want to eliminate in case of having a new version.
Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] has changed
  ~ resource "aws_ecs_task_definition" "this" {
        id                       = "test-api"
      ~ revision                 = 484 -> 485

        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }


Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or
respond to these changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
+/- create replacement and then destroy
 <= read (data resources)

Terraform will perform the following actions:

  # module.ecs_service_test_api.data.aws_ecs_task_definition.this[0] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_ecs_task_definition" "this" {
      + arn                  = (known after apply)
      + arn_without_revision = (known after apply)
      + execution_role_arn   = (known after apply)
      + family               = (known after apply)
      + id                   = (known after apply)
      + network_mode         = (known after apply)
      + revision             = (known after apply)
      + status               = (known after apply)
      + task_definition      = "test-api"
      + task_role_arn        = (known after apply)
    }

  # module.ecs_service_test_api.aws_ecs_service.this[0] will be updated in-place
  ~ resource "aws_ecs_service" "this" {
        id                                 = "arn:aws:ecs:xxxxx:xxxxx:service/core-dev/test-api"
        name                               = "test-api"

      ~ task_definition                    = "test-api:485" -> (known after apply)
        # (14 unchanged attributes hidden)

        # (8 unchanged blocks hidden)
    }

  # module.ecs_service_test_api.aws_ecs_task_definition.this[0] must be replaced
+/- resource "aws_ecs_task_definition" "this" {
      ~ arn                      = "arn:aws:ecs:xxxxx:xxxxx:task-definition/test-api:485" -> (known after apply)
      ~ arn_without_revision     = "arn:aws:ecs:xxxxx:xxxxx:task-definition/test-api" -> (known after apply)
      ~ container_definitions    = jsonencode(
          ~ [
              ~ {
                  ~ image                  = "xxxxx.dkr.ecr.xxxxx.amazonaws.com/xxxxx:test-4c2e72beb" -> "xxxxx.dkr.ecr.xxxxx.amazonaws.com/xxxxx:test-latest"
                    name                   = "test-api"
                    # (19 unchanged attributes hidden)
                },
            ] # forces replacement
        )
      ~ id                       = "test-api" -> (known after apply)
      ~ revision                 = 485 -> (known after apply)

        # (8 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

@bolshakoff
Copy link

test-api? Which example can I use to reproduce this? 🤔
Cuz in general, track_latest works for me: when the image in config is different from Terraform state, but is the same as in reality, Terraform reports that no changes are needed. 🤷‍♀️

Terraform doesn't ignore the default Docker image tag

Wait, what do you mean by "default" tag? 😵‍💫

@ivan-sukhomlyn
Copy link
Contributor Author

ivan-sukhomlyn commented Feb 21, 2024

@bolshakoff

Which example can I use to reproduce this? 🤔

I guess it's obvious that I replaced the real names of my project with those values, including the image tag.
And the "default" one is the Docker image used with Terraform while initially creating the container definition and the service.

Did you try to create a new task definition revision, deploy a service after the initial creation using Terraform, and then run terraform apply once again?

@kaarejoergensen
Copy link

@ivan-sukhomlyn I've been testing this functionality as well. It doesn't seem to work as expected, as Terraform still regards the resource as changed because the revision and thereby the arn has changed. I can force it to work as expected by adding a ignore_changes = [container_definitions] lifecycle block, but that has the consequence that changes to the container_definition will not be applied. Seems it's not possible to only ignore the image, since container_definitions is simply one big string.

@bolshakoff
Copy link

@ivan-sukhomlyn

I guess it's obvious that I replaced the real names of my project with those values, including the image tag.

Yes, but which project? Is there some nice, minimal, but complete example to quickly reproduce the problem?
So far, resorted to the example from README.md. 🤕

Regarding the "default" tag - OK, also from what I see in your scenario, when you run terraform apply in step 3, you are expecting Terraform to completely ignore any differences between your task definition resource and task definition reality.
But it's not supposed to do that.
It's actually smarter: it will indeed display zero diff, but only in case your task definition resource matches task definition of latest revision (even if it was deployed outside Terraform); but in case it is different, Terraform will still nicely detect it. 👌
From my humble understanding. And experiments.

So, actually, maybe there is nothing to reproduce, and it's just a misunderstanding.
Because yes, the input description might suggest that, when set to true, the flag ignores the config task definition, if it is outdated:

Whether should track latest task definition or the one created with the resource. Default is false.

But again, it works a bit differently; to have zero diff, you still have to match your resource config with the reality.

@kaarejoergensen - so, you have the same image in resource as in reality, but Terraform still suggests a redeployment? If so, would love to try to reproduce your scenario as well. 🙏

@ivan-sukhomlyn
Copy link
Contributor Author

@kaarejoergensen Yes, I think the single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition.
hashicorp/terraform-provider-aws#20121 (comment)

BTW the current hack implemented through the data source does its job because, in case of changes made by CI/CD, we just use the latest task definition and don't touch the aws_ecs_task_definition Terraform resource from the state if there are no changes for it in the configuration.

@bolshakoff
Copy link

single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition

There is at least one more:

track_latest = true
image = data.aws_ecs_container_definition.this.image

@ivan-sukhomlyn
Copy link
Contributor Author

ivan-sukhomlyn commented Feb 22, 2024

thanks!
However, it will not work for the first ECS task definition and service creation. Only in cases when an appropriate task definition is already present with a pre-defined image at a container definition.

That's why I'm not sure that it's an acceptable solution for this module; it will cause a lot of questions from the community while trying to create resources from scratch with the module.

@bolshakoff
Copy link

Well, in the worst case, can't we wrap it into something like:

variable "force_new_image" {
  type    = bool
  default = true
  # ...
}

variable "new_image" {
  type = string
  # ...
}

data "aws_ecs_container_definition" "this" {
  count = var.force_new_image ? 0 : 1
  # ...
}

locals {
  current_image = data.aws_ecs_container_definition.this.image
  image         = var.force_new_image ? var.new_image : local.current_image
  # ...
}

@bryantbiggs
Copy link
Member

would something like this work (haven't tried myself)

coalesce(try(data.aws_ecs_container_definition.this.image, ""), "latest")

@bolshakoff
Copy link

And a design note - I think it would be much more user-friendly to hide both ignore_task_definition_changes and task_definition_track_latest variables, and instead, expose the ability to enable the desired behavior via a more high-level variable:

variable "external_image_deployment" {
  type = bool
  # ...
}

Or at least via something like use_current_image or force_new_image & new_image.

Because what is the desired behavior, actually? What was the original motivation behind all these workarounds, hack, issues and the recently merged PR?
It was mainly, or even exclusively about being able to deploy image externally.

@bryantbiggs
Copy link
Member

while that may be the most common use case, it will definitely raise issues around users not being able to make other changes since its the entire task definition thats ignored. so the variable ignore_task_definition_changes says what it does, right on the tin. something more abstract like external_image_deployment takes a bit more to understand whats going on, what do I have access to change, etc.

@remiflament

This comment was marked as resolved.

@remiflament
Copy link

track_latest works for me. I extracted the aws_ecs_task_definition from this module to add the property (see previous comment).

I'm settings my image tag using this syntax :

image  = "${module.ecr.repository_url}:${data.aws_ecr_repository.this.most_recent_image_tags[0]}"

It would be great to add the ability to turn on track_latest and skip the actual hack. Maybe a third scenario in addition to the two actuals.

Did anyone come with news since the last discutions?

@fextr
Copy link

fextr commented Mar 19, 2024

single option to get an expected result of not tracking image tag changes outside of Terraform is to have a separate resource for the ECS container definition

There is at least one more:

track_latest = true
image = data.aws_ecs_container_definition.this.image

@bolshakoff Hey!
Could you provide a complete example of your approach? Looks like this could lead to a cyclic dependency issue.

@ivan-sukhomlyn
Copy link
Contributor Author

ivan-sukhomlyn commented Mar 19, 2024

Hi @remiflament
sorry for the late reply

But is it correct that I can use this module if my use case creates a new task_def at each deployment?

yes, the current hacky approach provides the ability to use external tools during the CI/CD process and deploy new task definition revisions with an image tag without triggering Terraform.

And one more possible method of resolving an issue is to use SSM parameters with an image tag updated by CI/CD while registering a new ECS task definition too.

But I think the best solution, in that case, is to have an additional resource on the AWS provider side, with the possibility of ignoring only image changes by Terraform.
Details - #171 (comment)

Otherwise, all other methods (SSM parameters, ECR image data source, latest ECS task definition from data source) are just workarounds.

Moreover, some people can use this module not only with ECR as a Docker registry but also with Dockerhub, Artifactory, etc., which eliminates the usage of only the ECR data source method as a replacement for the current behavior with tracking the latest task definition.

@ivan-sukhomlyn
Copy link
Contributor Author

@fextr yes, you're right 👍
using the data.aws_ecs_container_definition.this.image leads to a cycle dependency error as this data source requires task definition

For example, considering containers can be more than one at task definition.

module "container_definition" {
# ...

image                    = try(data.aws_ecs_container_definition.this[each.key].image, each.value.image, var.container_definition_defaults.image, null)
# ...
}

locals {
  create_task_definition = var.create && var.create_task_definition

  task_definition = local.create_task_definition ? "${aws_ecs_task_definition.this[0].family}:${aws_ecs_task_definition.this[0].revision}" : var.task_definition_arn
}

data "aws_ecs_container_definition" "this" {
  for_each = { for k, v in var.container_definitions : k => v if local.create_task_definition && try(v.create, true) }

  task_definition = local.task_definition
  container_name  = try(each.value.name, each.key)
}

resource "aws_ecs_task_definition" "this" {
  count = local.create_task_definition ? 1 : 0

  # Convert map of maps to array of maps before JSON encoding
  container_definitions = jsonencode([for k, v in module.container_definition : v.container_definition])
# ...
  track_latest = true
}
$ terraform validate
╷
│ Error: Cycle: module.container_definition (close), data.aws_ecs_container_definition.this, module.container_definition.var.image (expand), module.container_definition.local.definition (expand), module.container_definition.local.container_definition (expand), module.container_definition.output.container_definition (expand), aws_ecs_task_definition.this, local.task_definition (expand)

@ivan-sukhomlyn
Copy link
Contributor Author

ivan-sukhomlyn commented Mar 19, 2024

@bryantbiggs @antonbabenko what do you think about this situation?

Should we invest in using the track_latest ECS task def functionality with the introduction of breaking change?
After that, the module's users will have to track changes in the image field of the ECS task definition by themselves outside of the module and define it via the var to prevent Terraform from re-deploying an ECS service with a previous task definition revision, updated by CI/CD, for example.

Currently, task def revisions are tracked via data source for both module-managed and outside task definitions for the ECS service via the max_task_def_revision local.

Personally, I would rather wait for a possible separate aws_ecs_container_definition resource instead of making such changes.

@bryantbiggs
Copy link
Member

there is no rush to make a breaking change - if we make a breaking change, I'd love for it to be something really useful. I know the CI/CD story in ECS is not very smooth and a major pain point

@bolshakoff
Copy link

bolshakoff commented Mar 21, 2024

@fextr Good question! Indeed, the example you cite assumes that the container already exists.

If I were to generalize it a bit, for a simple workflow with one image, I would probably go with something like this:

variable "force_new_image" {
  type        = bool
  description = "Force new image deployment"
  default     = false
}

data "aws_ecs_container_definition" "this" {
  count = var.force_new_image ? 0 : 1

  task_definition = local.task_definition_name
  container_name  = local.container_name
}

locals {
  image = var.force_new_image ? var.new_image : data.aws_ecs_container_definition.this.0.image
}

With track_latest enabled, this should allow basically everything:

  1. If the container is already there, and external deployment mechanism is used, simply keep force_new_image disabled. But if one needs to sometimes deploy a new image from Terraform, simply enable force_new_image for these cases.
  2. If not, enable force_new_image for initial deployment and then go along with 1.

Copy link

This PR has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this PR will be closed in 10 days

@github-actions github-actions bot added the stale label Apr 21, 2024
Copy link

github-actions bot commented May 2, 2024

This PR was automatically closed because of stale in 10 days

@github-actions github-actions bot closed this May 2, 2024
Copy link

github-actions bot commented Jun 2, 2024

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants