Provider produced inconsistent result after apply #160

Closed
kpettijohn opened this issue Nov 14, 2019 · 15 comments

@kpettijohn kpettijohn commented Nov 14, 2019

Terraform Version

Run terraform -v to show the version. If you are not running the latest version of Terraform, please upgrade because your issue may have already been fixed.

terraform --version
Terraform v0.12.9
+ provider.consul v2.6.0

Affected Resource(s)

Please list the resources as a list, for example:

  • consul_acl_policy

If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.

Terraform Configuration Files

provider "consul" {
  address        = "10.10.10.101:8501"
  scheme         = "https"
  datacenter     = "mydc"
  version        = ">= 2.6.0"
  ca_file        = "/Users/myuser/.tls/consul/mydc-consul-agent-ca.pem"
  cert_file      = "/Users/myuser/.tls/consul/mydc-cli-consul-1.pem"
  key_file       = "/Users/myuser/.tls/consul/mydc-cli-consul-1-key.pem"

}

resource "consul_acl_policy" "consul_test" {
  name  = "consul-test"
  rules = <<-RULE
    node "consul-test" {
      policy = "write"
    }
    RULE
}

Debug Output

Please provide a link to a GitHub Gist containing the complete debug output: https://www.terraform.io/docs/internals/debugging.html. Please do NOT paste the debug output in the issue; just paste a link to the Gist.

https://gist.github.com/kpettijohn/81cdd2588f7526b35f74c25d3a127c3d

Panic Output

If Terraform produced a panic, please provide a link to a GitHub Gist containing the output of the crash.log.

Expected Behavior

Successful Terraform apply

Actual Behavior

Terraform throws the following error after creating the new policy in Consul.

Error: Provider produced inconsistent result after apply

After the first error, a second terraform apply fails again, this time with an error saying that a policy with that name already exists.

Error: error creating ACL policy: Unexpected response code: 500 (rpc error making call: Invalid Policy: A Policy with Name "consul-test" already exists)

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Important Factoids

Is there anything atypical about your accounts that we should know? For example: Running in EC2 Classic? Custom version of OpenStack? Tight ACLs?

I have ACLs enabled and am currently using the Bootstrap Token (Global Management) token.

References

@remilapeyre remilapeyre commented Nov 15, 2019

Thanks @kpettijohn, this is indeed an issue. I ran some tests and found that Consul behaves oddly with policies and multiple datacenters.

I'll keep you updated and hopefully post a fix shortly.

@remilapeyre remilapeyre commented Dec 5, 2019

Hi @kpettijohn, I think I found multiple ways to trigger this bug.

Can you give more information about your Consul cluster? Do you have only one datacenter named mydc, or do you have multiple datacenters and are trying to create the policy in mydc?

@kpettijohn kpettijohn commented Dec 5, 2019

In my case I only have one datacenter (mydc) with policies enabled and I am attempting to create a new policy in mydc. Let me know if you need anything else and thanks for digging into this issue!

@remilapeyre remilapeyre commented Dec 5, 2019

Since the provider did not correctly check the error response from Consul, we lack some info. Can you try running:

curl -i \
  --header 'X-Consul-Token: YOUR_TOKEN' \
  --cacert /Users/myuser/.tls/consul/mydc-consul-agent-ca.pem \
  --key /Users/myuser/.tls/consul/mydc-cli-consul-1-key.pem \
  --cert /Users/myuser/.tls/consul/mydc-cli-consul-1.pem \
  https://10.10.10.101:8501/v1/acl/policy?dc=mydc \
  -d '{"Name": "node-read","Description": "","Rules": "node \"consul-test\" { policy = \"write\"}","Datacenters": []}'

?

You will need to replace YOUR_TOKEN with a valid token.

@kpettijohn kpettijohn commented Dec 6, 2019

Here is the output from the command above.

curl -i \
  --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
  --cacert $CONSUL_CACERT \
  --key $CONSUL_CLIENT_KEY \
  --cert $CONSUL_CLIENT_CERT \
  $CONSUL_HTTP_ADDR/v1/acl/policy?dc=$CONSUL_DATACENTER  \
  -d '{"Name": "node-read","Description": "","Rules": "node \"consul-test\" { policy = \"write\"}","Datacenters": []}'

HTTP/2 405
allow: OPTIONS,PUT
vary: Accept-Encoding
content-type: text/plain; charset=utf-8
content-length: 23
date: Fri, 06 Dec 2019 16:42:44 GMT

method POST not allowed

I added the flags -X PUT --http2, which returns the following.

curl -i \
  -X PUT \
  --http2 \
  --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
  --cacert $CONSUL_CACERT \
  --key $CONSUL_CLIENT_KEY \
  --cert $CONSUL_CLIENT_CERT \
  $CONSUL_HTTP_ADDR/v1/acl/policy?dc=$CONSUL_DATACENTER  \
  -d '{"Name": "node-read","Description": "","Rules": "node \"consul-test\" { policy = \"write\"}","Datacenters": []}'

HTTP/2 200
content-type: application/json
vary: Accept-Encoding
content-length: 230
date: Fri, 06 Dec 2019 16:44:26 GMT

{"ID":"fadcd56a-47ba-24e4-2cc6-0130deda4f16","Name":"node-read","Description":"","Rules":"node \"consul-test\" { policy = \"write\"}","Hash":"TAqNtsIj21eVrebawDCRbdszx+xPA/y2OhoMI/sYyWk=","CreateIndex":514133,"ModifyIndex":514133}

Running the curl again after the first 200 success returns the following error.

HTTP/2 500
vary: Accept-Encoding
content-type: text/plain; charset=utf-8
content-length: 84
date: Fri, 06 Dec 2019 16:44:59 GMT

rpc error making call: Invalid Policy: A Policy with Name "node-read" already exists

@remilapeyre remilapeyre commented Dec 6, 2019

Can you remove the policy node-read and try again?

@kpettijohn kpettijohn commented Dec 6, 2019

Removing the policy allows me to create it again.

HTTP/2 200
content-type: application/json
vary: Accept-Encoding
content-length: 230
date: Fri, 06 Dec 2019 17:35:17 GMT

{"ID":"d18f62fc-c531-2149-5c95-a62d6a0e42eb","Name":"node-read","Description":"","Rules":"node \"consul-test\" { policy = \"write\"}","Hash":"TAqNtsIj21eVrebawDCRbdszx+xPA/y2OhoMI/sYyWk=","CreateIndex":514821,"ModifyIndex":514821}

@remilapeyre remilapeyre commented Dec 6, 2019

Does

curl -i \
  --http2 \
  --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
  --cacert $CONSUL_CACERT \
  --key $CONSUL_CLIENT_KEY \
  --cert $CONSUL_CLIENT_CERT \
  $CONSUL_HTTP_ADDR/v1/acl/policy/d18f62fc-c531-2149-5c95-a62d6a0e42eb?dc=$CONSUL_DATACENTER

let you read the policy properly?

@kpettijohn kpettijohn commented Dec 6, 2019

It does seem to let me read the policy.

curl -i \
>   --http2 \
>   --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
>   --cacert $CONSUL_CACERT \
>   --key $CONSUL_CLIENT_KEY \
>   --cert $CONSUL_CLIENT_CERT \
>   $CONSUL_HTTP_ADDR/v1/acl/policy/d18f62fc-c531-2149-5c95-a62d6a0e42eb?dc=$CONSUL_DATACENTER
HTTP/2 200
content-type: application/json
vary: Accept-Encoding
x-consul-index: 514821
x-consul-knownleader: true
x-consul-lastcontact: 0
content-length: 230
date: Fri, 06 Dec 2019 18:10:33 GMT

{"ID":"d18f62fc-c531-2149-5c95-a62d6a0e42eb","Name":"node-read","Description":"","Rules":"node \"consul-test\" { policy = \"write\"}","Hash":"TAqNtsIj21eVrebawDCRbdszx+xPA/y2OhoMI/sYyWk=","CreateIndex":514821,"ModifyIndex":514821}

@remilapeyre remilapeyre commented Dec 6, 2019

This is weird: those two calls, the PUT then the GET, are exactly what Terraform does. It seems that reading the policy from Consul fails there, but doing it manually works, and I did not manage to reproduce the bug on my computer.

Can you send your Consul configuration file with the secrets removed? Maybe my configuration differs from yours.

@remilapeyre remilapeyre commented Dec 6, 2019

Maybe this depends on the Consul version we are running. Which one are you using?

@kpettijohn kpettijohn commented Dec 6, 2019

I am running Consul v1.6.1 on all clients and servers.

Server configuration:

datacenter = "mydc"
data_dir = "/var/lib/consul"
server = true
bootstrap_expect = 3
ui = true
ports {
  grpc = 8502
  http = -1 //disabled
  https = 8501
}
retry_join = ["10.10.10.101", "10.10.10.102", "10.10.10.103"]
encrypt = "encrypt-key"
verify_incoming = false
verify_incoming_rpc = true
verify_outgoing = true
verify_server_hostname = true
auto_encrypt {
  allow_tls = true
}
ca_file = "/etc/pki/tls/certs/consul-agent-ca.pem"
cert_file = "/etc/pki/tls/certs/mydc-server-consul-0.pem"
key_file = "/etc/pki/tls/private/mydc-server-consul-0-key.pem"

acl = {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  tokens = {
    agent = "agent-token"
  }
}
addresses = {
  https = "0.0.0.0"
}

@kpettijohn kpettijohn commented Dec 6, 2019

After looking into things a bit more, I added a log statement to the consul_acl_policy resource, and I now see the following error from Consul.

500 (No path to datacenter)

2019-12-06T13:04:12.153-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [INFO] Consul Client configured with address: '10.10.10.101:8501', scheme: 'https', datacenter: 'mydc', insecure_https: 'false'
2019-12-06T13:04:12.153-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [DEBUG] Creating ACL policy
2019-12-06T13:04:12.272-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [DEBUG] Created ACL policy "6db44e0e-c8f2-43e1-152a-eae1124f7e1b"
2019-12-06T13:04:12.274-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [INFO] Consul Client configured with address: '10.10.10.101:8501', scheme: 'https', datacenter: 'mydc', insecure_https: 'false'
2019-12-06T13:04:12.274-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [DEBUG] Reading ACL policy "6db44e0e-c8f2-43e1-152a-eae1124f7e1b"
2019-12-06T13:04:12.382-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [WARN] ACL policy not found, removing from state
2019-12-06T13:04:12.382-0800 [DEBUG] plugin.terraform-provider-consul_v2.6.0_x4: 2019/12/06 13:04:12 [Error] ACL policy not found... Unexpected response code: 500 (No path to datacenter)

@remilapeyre remilapeyre commented Dec 6, 2019

Did you add that in resourceConsulACLPolicyRead() at line 87? The error check is not precise enough; this error should have been returned. Still, this error should not happen in the first place.

I can reproduce the bug with multiple datacenters, and this is the error message I get too.
This is why I asked whether you had multiple datacenters, but it seems you have only one. Is it possible some of the servers

I'm running tests based on your configuration but still can't reproduce the bug. Are you sure all three masters are in mydc?

@kpettijohn kpettijohn commented Dec 6, 2019

OK, I think I found the issue, and it seems it was just a bad configuration on my end. I had a typo in the datacenter name in the consul provider configuration. The provider could still create the policy, but reading it back failed because that DC didn't exist. Thanks for your help tracking things down @remilapeyre!
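
For anyone landing here with the same symptom, one way to catch a mistyped datacenter before running terraform apply is to compare it with the list Consul reports at /v1/catalog/datacenters. The following is only a minimal sketch: the function name datacenter_known is made up for illustration, the JSON handling copes with nothing more than a flat array of strings, and the commented-out usage assumes the environment variables used in the curl commands earlier in this thread.

```shell
# Hypothetical helper (not part of Consul or Terraform): succeeds if NAME
# appears in a flat JSON array of strings such as ["mydc","dc2"].
datacenter_known() {
  name=$1
  json=$2
  # Strip brackets and whitespace, put one quoted name per line,
  # then look for an exact whole-line match.
  printf '%s' "$json" \
    | tr -d '[] \n' \
    | tr ',' '\n' \
    | grep -Fxq "\"$name\""
}

# Usage (needs a reachable Consul agent, so it is left commented out):
# dcs=$(curl -s \
#   --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
#   --cacert "$CONSUL_CACERT" --key "$CONSUL_CLIENT_KEY" --cert "$CONSUL_CLIENT_CERT" \
#   "$CONSUL_HTTP_ADDR/v1/catalog/datacenters")
# datacenter_known "$CONSUL_DATACENTER" "$dcs" \
#   || echo "datacenter '$CONSUL_DATACENTER' is not known to this cluster"
```

The match is exact on purpose, so a name that differs by a single character is reported as unknown rather than silently accepted.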

remilapeyre added a commit to remilapeyre/terraform-provider-consul that referenced this issue Dec 6, 2019
Failing to read a policy from the server does not necessarily mean
that the policy has been removed, the network can be down, the correct
datacenter may not be reachable etc. We must be conservative when
removing resources from the state and only create a new one if it's
actually needed.

See terraform-providers#160
@remilapeyre remilapeyre closed this Dec 6, 2019
remilapeyre added a commit that referenced this issue Dec 6, 2019
Failing to read a policy from the server does not necessarily mean
that the policy has been removed, the network can be down, the correct
datacenter may not be reachable etc. We must be conservative when
removing resources from the state and only create a new one if it's
actually needed.

See #160
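
The rule in that commit message can be illustrated with the same kind of manual read used earlier in the thread. This is a rough sketch, not the provider's actual code: policy_read_action and POLICY_ID are hypothetical names, and the status Consul uses for a missing policy may differ between versions, so the 404 below is a placeholder assumption. The point is only that a 500 such as "No path to datacenter" lands in the catch-all branch instead of being treated as "policy deleted".

```shell
# Hypothetical helper: map the HTTP status of a policy read to an action.
policy_read_action() {
  case $1 in
    200) echo keep ;;
    404) echo remove-from-state ;;  # assumption: how "not found" is reported
    *)   echo fail ;;               # 500, 000 (no connection), etc.: surface the error
  esac
}

# Usage (needs a live agent, so it is left commented out):
# status=$(curl -s -o /dev/null -w '%{http_code}' \
#   --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
#   --cacert "$CONSUL_CACERT" --key "$CONSUL_CLIENT_KEY" --cert "$CONSUL_CLIENT_CERT" \
#   "$CONSUL_HTTP_ADDR/v1/acl/policy/$POLICY_ID?dc=$CONSUL_DATACENTER")
# policy_read_action "$status"
```

With the mistyped datacenter from this issue, the read returned a 500, so a check shaped like this would report a failure and leave the already-created policy in the state.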