
Getting an intermittent failed to refresh Bearer token error when trying to delete my AKS cluster #2602

Closed
btai24 opened this issue Jan 4, 2019 · 17 comments · Fixed by #4775

@btai24 commented Jan 4, 2019

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.11.10
AzureRM Provider v1.20.0

Affected Resource(s)

  • azurerm_kubernetes_cluster

Terraform Configuration Files

resource "azurerm_kubernetes_cluster" "aks_cluster" {
  name       = "${var.name}"
  location   = "${var.region}"
  dns_prefix = "${var.name}"

  kubernetes_version  = "${var.kubernetes_version}"
  resource_group_name = "${azurerm_resource_group.aks_resource_group.name}"

  linux_profile {
    admin_username = "xxx"

    ssh_key {
      key_data = "${var.ssh_public_key}"
    }
  }

  agent_pool_profile {
    count = "${var.node_count}"

    name            = "agentpool"
    vm_size         = "${var.vm_size}"
    os_disk_size_gb = "${var.os_disk_size}"
    os_type         = "Linux"
    vnet_subnet_id  = "${azurerm_subnet.private.id}"
    max_pods        = 110
  }

  service_principal {
    client_id     = "${azurerm_azuread_service_principal.service_principal.application_id}"
    client_secret = "${random_string.service_principal_password.result}"
  }

  role_based_access_control {
    enabled = true

    azure_active_directory {
      client_app_id     = "${var.rbac_client_app_id}"
      server_app_id     = "${var.rbac_server_app_id}"
      server_app_secret = "${var.rbac_server_app_secret}"
    }
  }

  network_profile {
    network_plugin = "azure"
  }

  depends_on = [
    "azurerm_azuread_service_principal.service_principal",
    "azurerm_azuread_service_principal_password.password",
  ]

  tags {
    environment = "${var.environment}"
    name        = "${var.name}"
  }
}

Debug Output

Unfortunately this happens intermittently, so I haven't been able to capture debug output. It started happening after I upgraded to AzureRM Provider v1.20.0, but I'm not sure whether there is a connection.

Expected Behavior

Running terraform destroy should successfully delete the Terraform-provisioned AKS cluster on the first attempt.

Actual Behavior

Running terraform destroy does not always delete the Terraform-provisioned AKS cluster on the first attempt, though it always succeeds on a second attempt.

The error produced:

Error: Error applying plan:

1 error(s) occurred:

* module.aks_cluster.azurerm_kubernetes_cluster.aks_cluster (destroy): 1 error(s) occurred:

* azurerm_kubernetes_cluster.aks_cluster: Error waiting for the deletion of Managed Kubernetes Cluster "test-westus2" (Resource Group "aks-rg-test-westus2"): azure.BearerAuthorizer#WithAuthorization: 
Failed to refresh the Token for request to https://management.azure.com/subscriptions/<subscription_id>/providers/Microsoft.ContainerService/locations/westus2/operations/<id>?api-version=2016-03-30: StatusCode=0 -- 
Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token

Steps to Reproduce

This unfortunately happens intermittently, but running terraform destroy on an AKS cluster sometimes results in the error above.

@ToruMakabe commented Jan 11, 2019

I got the same error not only on deletion of a cluster, but also on creation.

Error: Error applying plan:

1 error(s) occurred:

* module.primary.azurerm_kubernetes_cluster.aks: 1 error(s) occurred:

* azurerm_kubernetes_cluster.aks: Error waiting for completion of Managed Kubernetes Cluster "mycluster" (Resource Group "myrg"): azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/mysubscription/providers/Microsoft.ContainerService/locations/japaneast/operations/myid?api-version=2017-08-31: StatusCode=0 -- Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token
@ToruMakabe commented Jan 15, 2019

The same error also happened outside of AKS cluster creation/deletion; it seems to occur during long-running plan/apply operations. The following is an example from a Resource Group deletion at the end of a long-running apply.

Error: Error applying plan:

1 error(s) occurred:

* azurerm_resource_group.shared (destroy): 1 error(s) occurred:

* azurerm_resource_group.shared: Error deleting Resource Group "myrg": azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to
https://management.azure.com/subscriptions/myid/operationresults/myresult?api-version=2018-05-01: StatusCode=0 -- Original Error: Manually created ServicePrincipalToken does not contain secret material to retrieve a new access token

@katbyte Do you have any advice?

@tombuildsstuff (Contributor) commented Jan 15, 2019

hi @btai24 @ToruMakabe

Thanks for opening this issue :)

This appears to be a bug in the authentication logic we use to connect to Azure (specifically, how it handles refreshing tokens), so it will require a bug fix in the library that handles this: http://github.com/hashicorp/go-azure-helpers. So that we can diagnose this further, would it be possible to know which method you're using to authenticate with Azure from Terraform (e.g. the Azure CLI, or a Service Principal with a Client Secret)?

Thanks!

@ToruMakabe commented Jan 15, 2019

@tombuildsstuff Thanks! I use Azure CLI.
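
For anyone hitting this with Azure CLI authentication, one way to rule out the CLI token path is to hand the provider service-principal credentials directly, since a client secret gives the SDK the material it needs to request fresh tokens itself. A minimal sketch in the 0.11 syntax used above; the variable names are placeholders, not from this issue:

provider "azurerm" {
  # Sketch: with a client secret the provider can mint new tokens on
  # its own, instead of reusing a token handed over by the Azure CLI.
  # All four variables below are illustrative placeholders.
  subscription_id = "${var.subscription_id}"
  tenant_id       = "${var.tenant_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
}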

@tombuildsstuff (Contributor) commented Jan 17, 2019

@ToruMakabe thanks for confirming that. Since this appears to be an issue in the upstream library I've created an upstream issue for this: hashicorp/go-azure-helpers#22

@OffColour commented Mar 6, 2019

As a workaround, if you're signing in with az login and your individual account, the error doesn't happen with "az login --use-device-code".

@btai24 (Author) commented Mar 11, 2019

@tombuildsstuff late reply, but I also use the Azure CLI

(revisiting this issue because I'm still running into this)

@PriceChild commented Jun 11, 2019

Hit these errors with azure-cli@2.0.66

Downgrading to 2.0.64 resolved the issue for me.

@markokole commented Jul 9, 2019

Getting the same error with azure-cli 2.0.68 while trying to provision event hubs.

@markokole commented Jul 9, 2019

> Getting the same error with azure-cli 2.0.68 while trying to provision event hubs.

I ran the az login command again and the provisioning then succeeded, so I guess this is a token-expiry issue.

@mariojacobo commented Jul 18, 2019

+1 same issue

@mariojacobo commented Jul 22, 2019

Seeing the same issue here. It's intermittent on apply and destroy operations with no apparent pattern (sometimes it hits after only a few minutes, so the length of the operation doesn't seem to matter much). Does anyone know if we can refresh the token or log in again during terraform apply? I'm using the latest version (azure-cli 2.0.69).
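
One option, assuming a service principal is available: supply its credentials through the provider's ARM_* environment variables, so the provider can refresh tokens itself during a long apply instead of reusing a CLI-issued token. A minimal sketch:

provider "azurerm" {
  # Sketch: no credentials in configuration; the provider reads
  # ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and
  # ARM_SUBSCRIPTION_ID from the environment and can request fresh
  # tokens on its own during long-running operations.
}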

@mikhailshilkov (Contributor) commented Sep 17, 2019

I opened a PR to fix this a while back: hashicorp/go-azure-helpers#39
Hopefully, it will get some attention soon.

Update: the PR was closed by the maintainers without merging.

@theharleyquin commented Sep 20, 2019

Great to see a fix is upcoming - just ran into this today during AKS creation

@amasover commented Oct 24, 2019

It looks like Azure/go-autorest#476 was just recently merged in, so once it gets incorporated downstream this issue should be fixed.

@tombuildsstuff (Contributor) commented Oct 25, 2019

@amasover yeah, we've got a PR ready to go into the base library to fix this; it's just waiting on a release of go-autorest, which looks like it's happening soon-ish :)

@hashibot (bot) commented Nov 26, 2019

This has been released in version 1.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:

provider "azurerm" {
    version = "~> 1.37.0"
}
# ... other configuration ...