internal errors with increasing number of security group rules #18

Closed
lubars opened this issue Mar 15, 2019 · 17 comments

Comments

@lubars

lubars commented Mar 15, 2019

With 16 or more tencentcloud_security_group_rule defined, all attempts at apply result in the following error:

Error: Error applying plan:
1 error(s) occurred:
* tencentcloud_instance.default: 1 error(s) occurred:
* tencentcloud_instance.default: tencentcloud_instance got error, code:InternalError, message:An internal error has occurred. Retry your request, but if the problem persists, contact us with details by posting a message on the Tencent cloud forums.

This limit includes inactive rules with count = 0.

@zqfan
Contributor

zqfan commented Mar 15, 2019

@lubars Please paste your .tf file (with sensitive data masked) if it is convenient to do so.
The error shows that it comes from the CVM API, but you mention that it fails once you have 16 security group rules. Are you trying to create an instance with a security group that has 16 rules?

@lubars
Author

lubars commented Mar 15, 2019

I simplified and sanitized to a small, reproducible file, and was able to reliably get the internal error by adding/removing security group rules. It basically looked like this:

resource "tencentcloud_security_group" "default" {
  name = "default"
  description = "default security group"
}

resource "tencentcloud_security_group_rule" "rule1" {
}
resource "tencentcloud_security_group_rule" "rule2" {
}
...
resource "tencentcloud_security_group_rule" "rule16" {
}

resource "tencentcloud_instance" "default" {
  . . .
  security_groups   = ["${tencentcloud_security_group.default.id}"]
  . . .
}

I did one last test, and to my surprise it succeeded. I went back to my original files, and those are now succeeding too (after failing for several days). So I am no longer sure what is going on or what the problem was.

@lubars
Author

lubars commented Mar 15, 2019

I'm still getting intermittent failures, and they seem more frequent with more security group rules (just not reliably so). Here is my template; I would be interested to know if you encounter any issues with it.

provider "tencentcloud" {
  secret_id  = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  region     = "na-siliconvalley"
}

resource "tencentcloud_security_group" "default" {
  name = "default"
  description = "default security group"
}

resource "tencentcloud_security_group_rule" "rule0" {
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "egress"
  policy = "accept"
  cidr_ip = "0.0.0.0/0"
}
resource "tencentcloud_security_group_rule" "rule1" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "22"
}
resource "tencentcloud_security_group_rule" "rule2" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "2376"
}
resource "tencentcloud_security_group_rule" "rule3" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "51773,52773"
}
resource "tencentcloud_security_group_rule" "rule4" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "2188"
}
resource "tencentcloud_security_group_rule" "rule5" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "UDP"
  cidr_ip = "0.0.0.0/0"
  port_range = "4002"
}
resource "tencentcloud_security_group_rule" "rule6" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "3022-3023"
}
resource "tencentcloud_security_group_rule" "rule7" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "2377-2378"
}
resource "tencentcloud_security_group_rule" "rule8" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "51774-51775"
}
resource "tencentcloud_security_group_rule" "rule9" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "52774-52775"
}
resource "tencentcloud_security_group_rule" "rule10" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "80"
}
resource "tencentcloud_security_group_rule" "rule11" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "53"
}
resource "tencentcloud_security_group_rule" "rule12" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "UDP"
  cidr_ip = "0.0.0.0/0"
  port_range = "53"
}
resource "tencentcloud_security_group_rule" "rule13" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "4041"
}
resource "tencentcloud_security_group_rule" "rule14" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "6783"
}
resource "tencentcloud_security_group_rule" "rule15" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "UDP"
  cidr_ip = "0.0.0.0/0"
  port_range = "6783-6784"
}
resource "tencentcloud_security_group_rule" "rule16" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "7077,7000,8080,8081,6066,7001,7005"
}
resource "tencentcloud_security_group_rule" "rule17" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "8080"
}
resource "tencentcloud_security_group_rule" "rule18" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "UDP"
  cidr_ip = "0.0.0.0/0"
  port_range = "500,4500"
}
resource "tencentcloud_security_group_rule" "rule19" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "TCP"
  cidr_ip = "0.0.0.0/0"
  port_range = "8300,8301,8500"
}
resource "tencentcloud_security_group_rule" "rule20" {
  count = "1"
  security_group_id = "${tencentcloud_security_group.default.id}"
  type = "ingress"
  policy = "accept"
  ip_protocol = "UDP"
  cidr_ip = "0.0.0.0/0"
  port_range = "8301"
}

resource "tencentcloud_instance" "default" {
  count             = "1"
  instance_name     = "default"
  availability_zone = "na-siliconvalley-1"
  image_id          = "img-pi0ii46r"
  instance_type     = "S2.MEDIUM4"
  security_groups   = ["${tencentcloud_security_group.default.id}"]
  internet_max_bandwidth_out = 20
  allocate_public_ip = "true"
  system_disk_type = "CLOUD_BASIC"
  system_disk_size = "50"
}

@lubars
Author

lubars commented Mar 15, 2019

The file above is now succeeding on most runs, but when I set the instance count to "2" I am seeing the following error on about 50% of the runs:

* tencentcloud_instance.default: tencentcloud_instance got error, code:InternalError, message:The order system is abnormal. Please try again later.

In addition, some security group rules always seem to fail to be destroyed:

Error: Error applying plan:
2 error(s) occurred:
* tencentcloud_security_group_rule.rule4 (destroy): 1 error(s) occurred:
* tencentcloud_security_group_rule.rule4: security group rule index not found
* tencentcloud_security_group_rule.rule3 (destroy): 1 error(s) occurred:
* tencentcloud_security_group_rule.rule3: security group rule index not found

However, a second terraform destroy seems to clean them up reliably.
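
Since a second run reportedly succeeds, one way to cope with these intermittent failures is a bounded retry loop around the Terraform call. The following is only a minimal sketch and was not proposed in this thread: the `retry` function name, the attempt count, and the delay are hypothetical, and it assumes the failure is transient and that re-running the command is safe.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the command exits with status 0.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example usage (left commented out so that sourcing this file runs nothing):
# retry 3 30 terraform destroy
```

Note that terraform destroy asks for confirmation on every attempt unless approval is automated, and that a retry loop hides the underlying API problem rather than fixing it.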

@zqfan
Contributor

zqfan commented Mar 16, 2019

Thanks for providing such detailed information; we will look into it. As I said, this issue is caused by the CVM and VPC APIs, so we need to discuss it with the developers of those products to locate the problem.

@lubars
Author

lubars commented Jun 15, 2019

Any progress on this issue? Still receiving internal errors on 90%-95% of attempts to provision; hundreds of attempts are now required when creating a cluster.

lubars changed the title from "internal error when more than 15 security group rules defined" to "internal errors with increasing number of security group rules" on Jun 17, 2019
@likexian
Contributor

Hello @lubars
We have refactored the security_group resource. Would you please try the new release?

@likexian
Contributor

If there are any problems, please open a new issue.

@lubars
Author

lubars commented Jul 31, 2019

Upgraded Terraform and Tencentcloud plugin:

Terraform v0.12.5

  • provider.null v2.1.2
  • provider.tencentcloud v1.14.0

Still getting internal errors on roughly 25% of calls to terraform apply (this succeeded on the third try):

Error: [TencentCloudSDKError] Code=InternalError, Message=(bd6d22a359eb)An internal error has occurred. Retry your request, but if the problem persists, contact us with details by posting a message on the Tencent cloud forums., RequestId=5498c37c-ff95-4231-ab6c-bd6d22a359eb

on infrastructure.tf line 372, in resource "tencentcloud_cbs_storage" "data":
372: resource "tencentcloud_cbs_storage" "data" {

Still seeing errors on calls to terraform destroy as well:

Error: [TencentCloudSDKError] Code=ResourceBusy, Message=656e6c70-be4a-4657-aed1-95964abc0b7b,ins-ruv3rbzm is busy, please retry later (0afe54a64aa6), RequestId=a7e786f4-5c73-47e6-8d37-0afe54a64aa6
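
Both error codes quoted in this comment, InternalError and ResourceBusy, come with a message asking the caller to retry. A wrapper script could use that to separate retryable failures from ones that need manual action. The sketch below is an illustration only: the function name `is_transient_error` is hypothetical, and treating exactly these two codes as transient is an assumption drawn from the messages in this thread, not from Tencent Cloud documentation.

```shell
# is_transient_error MESSAGE
# Hypothetical helper: succeeds (exit status 0) when MESSAGE contains one of the
# error codes that, in this thread, went away on a later attempt.
is_transient_error() {
  case $1 in
    *Code=InternalError*|*Code=ResourceBusy*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (commented out; the message would normally come from terraform's output):
# if is_transient_error "$last_error"; then echo "worth retrying"; fi
```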

@ausmartway
Contributor

ausmartway commented Jul 31, 2019 via email

@lubars
Author

lubars commented Aug 1, 2019

Hi @ausmartway,

In the few runs I was able to do with -parallelism=1, I did not see any internal errors, so this may be a significant finding (though the apply phase took much longer, so more of a workaround than a solution).
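
For reference, `-parallelism=n` is a standard Terraform CLI option that limits the number of concurrent resource operations (the default is 10). A minimal sketch of applying the workaround consistently follows; the wrapper name `tf_serial` is hypothetical and only exists to avoid retyping the flag.

```shell
# tf_serial SUBCOMMAND [ARGS...]
# Hypothetical helper: runs a terraform subcommand with resource operations
# serialized, passing any extra arguments through unchanged.
tf_serial() {
  subcommand=$1
  shift
  terraform "$subcommand" -parallelism=1 "$@"
}

# Example usage (commented out so that sourcing this file runs nothing):
# tf_serial apply
# tf_serial destroy
```

Serializing operations trades speed for fewer concurrent API calls, which matches the longer apply phase reported above.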

I do continue to see errors in the destroy phase (using the same parameter), such as:

Error: [TencentCloudSDKError] Code=ResourceInUse, Message=指定资源 subnet-4sg0xnlu 已经在使用中。, RequestId=d3d44dd7-844c-40bb-a4c6-30377af2e5f3

(The message translates to: "The specified resource subnet-4sg0xnlu is already in use.")

and:

Error: security group sg-6yhirbge still bind instances

@ausmartway
Contributor

@lubars

I agree that this is a workaround rather than a solution.

I believe @likexian's team will look into the best way to resolve this issue. My guess is that it is related to how the provider queues or buffers calls to the Tencent Cloud API.

As for not being able to destroy the subnet, you can use the Tencent Cloud console to find out which resources are still using the subnet (subnet-4sg0xnlu) and try to remove them manually.

If you find that it is a resource other than CVM, please report it here so that @likexian can look into it.

@likexian
Contributor

likexian commented Aug 1, 2019

Hello @lubars
Would you please post your tf file here? I would like to reproduce the problem first.

@lubars
Author

lubars commented Aug 1, 2019

Were you unable to reproduce using the simple tf file I provided above? The files I am using now would require a lot of time to sanitize and reduce to a single file.

@likexian
Contributor

likexian commented Aug 1, 2019

Thank you @lubars
The tf file above works for me. Do you still get errors when using it?

Would you please turn on Terraform debug mode and send me the log when it fails?

Turn on debug mode:

export TF_LOG="DEBUG"
export TF_LOG_PATH="./terraform.log"

Then run:

terraform destroy

and send terraform.log to me.
Sorry for the trouble.
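
Every TencentCloudSDKError quoted in this thread carries a RequestId, which is probably the most useful detail to pass along with the log. The sketch below pulls the unique IDs out of a log file. It assumes the debug log records them in the same `RequestId=<uuid>` form as the error messages above, which has not been verified here; the function name `extract_request_ids` is hypothetical.

```shell
# extract_request_ids LOGFILE
# Hypothetical helper: prints each distinct "RequestId=<uuid>" found in LOGFILE.
# Relies on grep -o, which GNU and BSD grep both provide.
extract_request_ids() {
  grep -o 'RequestId=[0-9a-f-]*' "$1" | sort -u
}

# Example usage (commented out; terraform.log is the path set via TF_LOG_PATH):
# extract_request_ids ./terraform.log
```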

likexian reopened this on Aug 1, 2019
@likexian
Contributor

likexian commented Aug 1, 2019

Quoting @lubars's comment of Jul 31, 2019:

Upgraded Terraform and Tencentcloud plugin:

Terraform v0.12.5

  • provider.null v2.1.2
  • provider.tencentcloud v1.14.0

Still getting internal errors on roughly 25% of calls to terraform apply (this succeeded on the third try):

Error: [TencentCloudSDKError] Code=InternalError, Message=(bd6d22a359eb)An internal error has occurred. Retry your request, but if the problem persists, contact us with details by posting a message on the Tencent cloud forums., RequestId=5498c37c-ff95-4231-ab6c-bd6d22a359eb
on infrastructure.tf line 372, in resource "tencentcloud_cbs_storage" "data":
372: resource "tencentcloud_cbs_storage" "data" {

Still seeing errors on calls to terraform destroy as well:

Error: [TencentCloudSDKError] Code=ResourceBusy, Message=656e6c70-be4a-4657-aed1-95964abc0b7b,ins-ruv3rbzm is busy, please retry later (0afe54a64aa6), RequestId=a7e786f4-5c73-47e6-8d37-0afe54a64aa6

Hello @lubars
We have found the cause of this issue and will fix it in the next release.
Thank you for your feedback.

@likexian
Contributor

likexian commented Aug 6, 2019

Hello @lubars
Thank you for your feedback. We have fixed the issue with resource creation. Would you please upgrade to v1.14.1 and try again?
If there are still problems, please turn on debug mode and send me the log.

Turn on debug mode:

export TF_LOG="DEBUG"
export TF_LOG_PATH="./terraform.log"

Thank you.

likexian closed this as completed on Aug 7, 2019