Race condition between state machine role creation and use. Long time to propagate the role. #7893

Closed
vishalGitAcc opened this issue Mar 11, 2019 · 9 comments · Fixed by #12005
Labels
bug Addresses a defect in current functionality. service/sfn Issues and PRs that pertain to the sfn service.

@vishalGitAcc

Terraform Version
Terraform version 2.1.0

Problem: The state machine execution role takes time to propagate after it is created, and an explicit depends_on constraint does not help.

Terraform Configuration Files

data "aws_iam_policy_document" "states_exec_role_document" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["states.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "states_exec_role" {
  name               = "${local.stack_name}-${local.pod}-statesExecRole"
  assume_role_policy = "${data.aws_iam_policy_document.states_exec_role_document.json}"
}

resource "aws_sfn_state_machine" "state_machine" {
  depends_on = ["aws_iam_role.states_exec_role"]
  name       = "${local.stack_name}-${local.pod}-state-machine"
  definition = "${data.template_file.definition.rendered}"
  role_arn   = "${aws_iam_role.states_exec_role.arn}"
}

Crash Output
"* aws_sfn_state_machine.state_machine: Error creating Step Function State Machine: AccessDeniedException: Neither the global service principal states.amazonaws.com, nor the regional one is authorized to assume the provided role.
status code: 400, request id: b26b3cc8-442d-11e9-83bb-03aa166dd894"

Expected Behavior
Apply complete! Resources: X added, 0 changed, 0 destroyed.

Actual Behavior
The apply fails because the IAM role takes time to propagate across regions. A second apply succeeds, since it only takes a few seconds for the IAM role to become available.

Steps to Reproduce
terraform apply

Additional context

The following workaround avoids the problem, but it is still a hack.

resource "null_resource" "delay" {
  provisioner "local-exec" {
    command = "sleep 30"
  }

  triggers = {
    "states_exec_role" = "${aws_iam_role.states_exec_role.arn}"
  }
}

resource "aws_sfn_state_machine" "state_machine" {
  name       = "${local.stack_name}-${local.pod}-state-machine"
  definition = "${data.template_file.definition.rendered}"
  role_arn   = "${aws_iam_role.states_exec_role.arn}"
  depends_on = ["null_resource.delay"]
}

Looking for a better solution to address the race condition.
@sstoeckel

I have the same issue. The problem is that the state machine gets created before the IAM role and the IAM role policy attachment.
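
A minimal sketch of how that ordering could be made explicit, assuming the role's permissions come from a separate policy attachment. The state machine's role_arn only references the role, so Terraform has no reason to wait for an attachment unless it is listed in depends_on. The names aws_iam_role_policy_attachment.states_exec_attachment and aws_iam_policy.states_exec_policy are hypothetical and do not appear in the original report. This only addresses ordering inside Terraform; it does not remove the IAM propagation delay described in the issue.

```hcl
# Hypothetical attachment; aws_iam_policy.states_exec_policy is assumed
# to be defined elsewhere in the configuration.
resource "aws_iam_role_policy_attachment" "states_exec_attachment" {
  role       = "${aws_iam_role.states_exec_role.name}"
  policy_arn = "${aws_iam_policy.states_exec_policy.arn}"
}

resource "aws_sfn_state_machine" "state_machine" {
  name       = "${local.stack_name}-${local.pod}-state-machine"
  definition = "${data.template_file.definition.rendered}"
  role_arn   = "${aws_iam_role.states_exec_role.arn}"

  # role_arn creates a dependency on the role only, so the attachment
  # must be named explicitly for Terraform to create it first.
  depends_on = ["aws_iam_role_policy_attachment.states_exec_attachment"]
}
```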

@aeschright aeschright added needs-triage Waiting for first response or review from a maintainer. service/iam Issues and PRs that pertain to the iam service. labels Jun 20, 2019
@mauza

mauza commented Aug 10, 2019

Experiencing the same issue. I've built a few step functions in the past and never ran into this...

@rbundy

rbundy commented Aug 14, 2019

Experiencing the same issue

@prokvk

prokvk commented Sep 8, 2019

same issue here

@Vilaggio

Vilaggio commented Sep 13, 2019

Same issue here. I tried adding a trust relationship, but the error persists...

I created an IAM role that allows FULL access to every AWS service I'm using.

@Vilaggio

I fixed mine by going back and adding the correct IAM role in the state machine.

@bflad bflad added bug Addresses a defect in current functionality. service/sfn Issues and PRs that pertain to the sfn service. and removed needs-triage Waiting for first response or review from a maintainer. service/iam Issues and PRs that pertain to the iam service. labels Jul 1, 2020
@bflad bflad added this to the v2.69.0 milestone Jul 1, 2020
@bflad
Contributor

bflad commented Jul 1, 2020

The fix to allow the aws_sfn_state_machine resource to retry IAM Role associated failures on creation has been merged and will release with version 2.69.0 of the Terraform AWS Provider, likely tomorrow. Thanks to @DrFaust92 for the implementation. 👍

@ghost

ghost commented Jul 3, 2020

This has been released in version 2.69.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!
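
For readers upgrading to pick up the fix, a provider version constraint along these lines should be enough. This is a sketch in the same pre-0.13 syntax as the configuration in the report; the exact constraint is an assumption and should be adjusted to your own upgrade policy.

```hcl
provider "aws" {
  # Assumed constraint: 2.69.0 or any later 2.x release, which includes
  # the retry on IAM role failures for aws_sfn_state_machine.
  version = "~> 2.69"
}
```

On Terraform 0.13 and later, the same constraint belongs in a required_providers block inside the terraform block instead of in the provider block.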

@ghost

ghost commented Aug 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Aug 2, 2020
8 participants