Terraform crashes (while creating instances) #541

Closed
till opened this issue Dec 17, 2018 · 24 comments

Comments

@till
Contributor

till commented Dec 17, 2018

Terraform Version

Terraform v0.11.10
+ provider.openstack v1.12.0
+ provider.template v1.0.0

Affected Resource(s)

I am using the OpenStack provider only to:

  • create a private network and a subnet
  • attach router/interface
  • attach the network/subnet to the router
  • add peering to existing public network
  • create ports
  • create instances

I can say that the first 3 steps always work. Creating the ports also always seems to succeed, but then Terraform crashes while creating the instances.

I am creating a total of 4 instances in my test: 3 using my custom module (which wraps port and instance creation, see below) and one with plain Terraform (a jump host, which is connected to the private network created here and to an existing public network).

Terraform Configuration Files

I have a terraform repository with the usual (variables, deploy, ...) and two sub-modules which wrap:

  1. network creation (k8s_network)
  2. instance creation (k8s_node)
  3. More terraform in my deploy.tf to add a jump host once the rest is done

I can share all that (sans variables.tf), if required.

Panic Output

https://gist.github.com/till/a5e401bf883c91f02e703235ca26b7ee

Expected Behavior

Network gets created, ports get created, instances are attached. Clean exit.

Actual Behavior

Most of the above happens, but Terraform crashes (and is unable to capture state).

In addition this is output:

Error: Error applying plan:

4 error(s) occurred:

* module.k8s_worker_node.openstack_compute_instance_v2.node: 1 error(s) occurred:

* openstack_compute_instance_v2.node: unexpected EOF
* module.k8s_controlplane_node.openstack_compute_instance_v2.node: 1 error(s) occurred:

* openstack_compute_instance_v2.node: unexpected EOF
* module.k8s_etcd_node.openstack_compute_instance_v2.node: 1 error(s) occurred:

* openstack_compute_instance_v2.node: unexpected EOF
* openstack_compute_instance_v2.jump-host: 1 error(s) occurred:

* openstack_compute_instance_v2.jump-host: unexpected EOF

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


panic: runtime error: index out of range

Steps to Reproduce

  1. source /etc/kolla/admin-openrc
  2. terraform apply

Questions I had

I noticed lots of PRs closed since last release, but I am unable to figure out how to run the provider from "master". Any hints appreciated, then I will re-test.

@till
Contributor Author

till commented Dec 17, 2018

Maybe to add:

  • I just noticed that for the last three tests that I ran, I got instances, but none had an interface attached to them.
  • It seems closely related to #506 (panic: runtime error: index out of range). Sorry for opening yet another ticket.

What I am unsure about is why this "just" worked an hour ago, and now I get these errors.

@jtopjian
Contributor

@till Yes, this does seem similar to #506. The similarity between this and #506 is that you two are making some kind of Kubernetes infrastructure? Are you basing your configuration off of some published project or article?

From what I can see from the debug logs, you two are using two different cloud providers, so it wouldn't be a case of a single cloud causing this.

Can you provide a Terraform configuration (.tf) file which can reproduce this issue?

Have you been able to reproduce this issue across multiple builds (meaning: has this happened, say, 2 out of 10 times)?

@jtopjian
Contributor

I noticed lots of PRs closed since last release, but I am unable to figure out how to run the provider from "master". Any hints appreciated, then I will re-test.

It's doubtful that any changes since the last release will fix this problem. The better solution would be to apply #539 and see if that helps.

If you are familiar with compiling Go binaries, then the following should work:

$ go get github.com/terraform-providers/terraform-provider-openstack
$ cd $GOPATH/src/github.com/terraform-providers/terraform-provider-openstack
$ make build
$ cp $GOPATH/bin/terraform-provider-openstack /path/to/where/original/is

@jtopjian
Contributor

@till I think I see the problem.

In the debug log you've provided, search for:

2018-12-17T15:53:43.849Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 2018/12/17 15:53:43 [DEBUG] Create Options: openstack.PortCreateOpts

Then later:

2018-12-17T15:53:49.359Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 2018/12/17 15:53:49 [DEBUG] Created Subnet 7a8b7406-6746-4cdf-be45-eca1d1075a3a

I think what's happening is that the ports are being created before the subnet has finished creating. This means that the ports will not have a fixed IP at the time of their creation. This is a common situation to be in because a port doesn't have a natural dependency on a subnet, even though 99% of the time, you want a port associated with a subnet.

Therefore, you have to make an explicit dependency:

resource "openstack_networking_subnet_v2" "subnet_1" {
  name = "subnet_1"
  network_id = "${openstack_networking_network_v2.network_1.id}"
  cidr = "192.168.1.0/24"
  ip_version = 4
  enable_dhcp = true
  no_gateway = true
}

resource "openstack_networking_port_v2" "port_1" {
  depends_on = [
    "openstack_networking_subnet_v2.subnet_1",
  ]

  name = "port_1"
  network_id = "${openstack_networking_network_v2.network_1.id}"
  admin_state_up = "true"
}

I recommend trying that in your configuration and seeing how things go. If that works, then I have two ideas:

  1. If you are using someone else's module, I recommend letting them know about this.
  2. I still recommend trying #539 (Compute v2: Fix Instance NIC indexing) if you can. It is not meant to be a complete fix, but it is supposed to prevent the crash from happening. With #539 applied, your instances probably won't have any network information returned, but at least Terraform won't crash.

@till
Contributor Author

till commented Dec 18, 2018

@jtopjian depends_on — neat. I will try that right now.

I had my network stuff in one module, and the instance creation, including ports, in another. I will read up on depends_on. :) Both modules are mine, I will let myself know. 💃

@till
Contributor Author

till commented Dec 18, 2018

Depends on doesn't seem to work across modules.

When I try to run terraform plan, it's unable to "find" my subnet. I am guessing I will have to refactor this into the same module?

@till
Contributor Author

till commented Dec 18, 2018

@jtopjian part of this seems to have worked.

I did the following:

  • (in k8s_network) added depends_on when I create network, subnet and router (and router interface)
  • return subnet's id as an output subnet_id
  • use subnet_id in k8s_node in a null resource and depends_on that
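
For readers following along, the three steps above might be wired up roughly as follows. This is only a sketch in Terraform 0.11 syntax: the actual modules were not posted, so the resource names (subnet, port), the variable names, and the split across modules are all assumptions.

```hcl
# k8s_network module: expose the subnet ID.
# The resource name "subnet" is an assumption.
output "subnet_id" {
  value = "${openstack_networking_subnet_v2.subnet.id}"
}

# k8s_node module: these variables are filled in by the caller from the
# k8s_network outputs. The variable names are assumptions.
variable "network_id" {}
variable "subnet" {}

# The null resource cannot be evaluated until the subnet ID is known.
resource "null_resource" "subnet_hack" {
  triggers {
    subnet_id = "${var.subnet}"
  }
}

# The port waits for the null resource, and so indirectly for the subnet.
resource "openstack_networking_port_v2" "port" {
  depends_on = ["null_resource.subnet_hack"]

  name           = "port"
  network_id     = "${var.network_id}"
  admin_state_up = "true"
}
```

The ordering only takes effect if the port itself lists the null resource in depends_on; creating the null resource on its own does not delay any other resource.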

However, there is a new issue: when I run destroy now, Terraform no longer crashes, but it tries to clean up the subnet while there are still ports attached to it. It gets stuck in "Still destroying..." until it times out, apparently hitting some kind of race condition.

I listed the ports with openstack port list and deleted them manually, but I am still not entirely sure how to do this "better". Should I create another issue for it? I am going to recreate this a couple of times to find out exactly where this port is coming from.

@till
Contributor Author

till commented Dec 18, 2018

Part solved: The port mess was from a previous crash.

Cleaning up the ports helped, but eventually, I just end up with a crash with: openstack_compute_instance_v2.node: unexpected EOF

The effects are:

  • private network created
  • router created
  • ports are created, but not assigned to instances
  • instances are created, but not assigned to a port

@till
Contributor Author

till commented Dec 18, 2018

panic: runtime error: index out of range
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: goroutine 260 [running]:
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: panic(0xed5ce0, 0x17e4a90)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/goenv/versions/1.10.3/src/runtime/panic.go:551 +0x3c1 fp=0xc420626f60 sp=0xc420626ec0 pc=0x429251
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: runtime.panicindex()
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/goenv/versions/1.10.3/src/runtime/panic.go:28 +0x5e fp=0xc420626f80 sp=0xc420626f60 pc=0x427efe
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/openstack.flattenInstanceNetworks(0xc420270540, 0xfe9f80, 0xc420136160, 0xe70720, 0xc420364470, 0x0, 0x0, 0x0)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/openstack/compute_instance_v2_networking.go:469 +0x14a3 fp=0xc4206272f8 sp=0xc420626f80 pc=0xcc4153
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/openstack.resourceComputeInstanceV2Read(0xc420270540, 0xfe9f80, 0xc420136160, 0x6, 0xc4207c7950)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/openstack/resource_openstack_compute_instance_v2.go:546 +0x36e fp=0xc420627578 sp=0xc4206272f8 pc=0xd0843e
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/openstack.resourceComputeInstanceV2Create(0xc420270540, 0xfe9f80, 0xc420136160, 0xc420270540, 0x0)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/openstack/resource_openstack_compute_instance_v2.go:526 +0x13fa fp=0xc4206279f8 sp=0xc420627578 pc=0xd0799a
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/helper/schema.(*Resource).Apply(0xc4202db340, 0xc4205ba190, 0xc420380160, 0xfe9f80, 0xc420136160, 0xc4201f2e01, 0xc4204e8b88, 0x4bd9cc)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/helper/schema/resource.go:227 +0x35a fp=0xc420627a98 sp=0xc4206279f8 pc=0xb9ed6a
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/helper/schema.(*Provider).Apply(0xc420271f80, 0xc4205ba140, 0xc4205ba190, 0xc420380160, 0xc4204e8b80, 0x484493, 0xc4203f6300)
2018-12-18T14:52:22.010Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/helper/schema/provider.go:283 +0xa4 fp=0xc420627af8 sp=0xc420627a98 pc=0xb9d574
2018-12-18T14:52:22.011Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/plugin.(*ResourceProviderServer).Apply(0xc4203800c0, 0xc4203800e0, 0xc4203642f0, 0x0, 0x0)
2018-12-18T14:52:22.011Z [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 	/opt/teamcity-agent/work/5d79fe75d4460a2f/src/github.com/terraform-providers/terraform-provider-openstack/vendor/github.com/hashicorp/terraform/plugin/resource_provider.go:527 +0x57 fp=0xc420627b48 sp=0xc420627af8 pc=0xb57997
...

(I think that's still what your PR is trying to address, but unsure as of "root" cause.)

@jtopjian
Contributor

Thank you for looking into this.

Both modules are mine, I will let myself know

haha - sounds good ;)

Since both reported issues were using a module to build Kubernetes infrastructure, I thought there might have been a published module somewhere.

Depends on doesn't seem to work across modules.

Correct. This is a popular issue. One workaround is to define an output variable in your module and then have the resource (which resides outside of the module) depend on that variable. The variable can be anything.
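
A minimal sketch of that workaround in Terraform 0.11 syntax, assuming a hypothetical module named network that exposes network_id and subnet_id outputs (the module name, path, and resource names are not from this thread's actual code). Here the port references the output directly in its fixed_ip block, so Terraform infers the dependency on the subnet without an explicit depends_on.

```hcl
# Inside the (hypothetical) network module: expose the IDs as outputs.
output "network_id" {
  value = "${openstack_networking_network_v2.network_1.id}"
}

output "subnet_id" {
  value = "${openstack_networking_subnet_v2.subnet_1.id}"
}

# Outside the module. The source path is an assumption.
module "network" {
  source = "./network"
}

resource "openstack_networking_port_v2" "port_1" {
  name           = "port_1"
  network_id     = "${module.network.network_id}"
  admin_state_up = "true"

  # Interpolating the module output makes the port wait for the subnet.
  fixed_ip {
    subnet_id = "${module.network.subnet_id}"
  }
}
```

Any argument that interpolates the output would create the same ordering; fixed_ip is used here because it also pins the port to the intended subnet.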

See here for similar issues and workarounds:

(I think that's still what your PR is trying to address, but unsure as of "root" cause.)

Indeed - the same area of code is being hit.

I'm quite sure this is still an ordering issue of some sort. I'll see if I can reproduce it locally.

@jtopjian
Contributor

@till well, this is interesting. I reproduced the problem, but the two clouds I tested both returned a proper error:

* openstack_compute_instance_v2.instance_1: Error creating OpenStack server: Bad request with: [POST https://example.com:8774/v2/8ae5f2f63b4d417d85d178a23acdf45b/servers], error message:
{"badRequest": {"message": "Port b7399055-f50e-400b-afe1-4a24c0f34ad6 requires a FixedIP in order to be used.", "code": 400}}

Here's the short config I used:

resource "openstack_networking_network_v2" "network_1" {
  name = "network_1"
}

resource "openstack_networking_subnet_v2" "subnet_1" {
  name        = "subnet_1"
  network_id  = "${openstack_networking_network_v2.network_1.id}"
  cidr        = "192.168.1.0/24"
  ip_version  = 4
  enable_dhcp = true
}

resource "openstack_networking_port_v2" "port_1" {
  /*
  depends_on = [
    "openstack_networking_subnet_v2.subnet_1",
  ]
  */

  name           = "port_1"
  network_id     = "${openstack_networking_network_v2.network_1.id}"
  admin_state_up = "true"
}

resource "openstack_compute_instance_v2" "instance_1" {
  name = "instance_1"

  network {
    port = "${openstack_networking_port_v2.port_1.id}"
  }
}

I'm curious if you see the same error or if you get the panic/crash. If you don't see the panic, would you mind posting your setup so I can try to reproduce what you're seeing?

@till
Contributor Author

till commented Dec 19, 2018

@jtopjian yeah, I can definitely share. I am trying to refactor a few things to get it working.

Btw, I can report that when I keep network setup and instance/port creation separate, it succeeds. As I said above, my workaround was to return not just the network.id, but also the subnet.id from my k8s_network module.

I then used a null_resource:

resource "null_resource" "subnet_hack" {
  triggers {
    subnet_id = "${var.subnet}"
  }
}

But that still doesn't seem to work. I end up with ports without IPs, etc.

When I execute the same code sequentially (k8s_network module with network, subnet, router, router interface first) and invoke my k8s_node module (port + instance creation), it succeeds.

Your PR in #539 fixes it in a way where, some resources fail, but I am not left with a full on crash (which doesn't seem to preserve all the state despite best attempts made).

@jtopjian
Contributor

Your PR in #539 fixes it in a way where, some resources fail, but I am not left with a full on crash (which doesn't seem to preserve all the state despite best attempts made).

OK cool. At least it's stopping the crash. Once I'm able to reproduce this locally, I can look at a better fix.

Thank you :)

@till
Contributor Author

till commented Dec 25, 2018

@jtopjian I will need some additional time to pick the code apart. I am gonna try to do this first week of January. But yeah, it doesn't fix the original issue yet (instance doesn't have IPs) as the dependencies are not resolved, but the crash is gone.

@jtopjian
Contributor

Sounds good :)

@kayrus
Collaborator

kayrus commented Jan 12, 2019

Just faced the same issue.

Panic Output

openstack_compute_instance_v2.app_master: Refreshing state... (ID: 41e30e1f-2b15-49be-8558-6da834896b57)
... ... ...
panic: runtime error: index out of range
2019-01-12T22:01:25.959+0100 [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: 
2019-01-12T22:01:25.959+0100 [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: goroutine 55 [running]:
2019-01-12T22:01:25.959+0100 [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4: github.com/terraform-providers/terraform-provider-openstack/openstack.flattenInstanceNetworks(0xc000231650, 0x107eb00, 0xc00057c180, 0xefb980, 0xc00026e000, 0x0, 0x0, 0x0)
2019-01-12T22:01:25.959+0100 [DEBUG] plugin.terraform-provider-openstack_v1.12.0_x4:    terraform-provider-openstack/openstack/compute_instance_v2_networking.go:469 +0x14d9

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply
  2. terraform apply

Important Factoids

resource "openstack_networking_port_v2" "my_app" {
  name = "my_app"
  network_id = "505e3d28-863d-44be-a4ba-3a2ffb5e9abe"
  no_fixed_ip = true
}

resource "openstack_compute_instance_v2" "my_app" {
  name = "my_app"

  flavor_id = "20"
  key_pair  = "keypair"
  image_id  = "67680990-4cd7-4b7f-9ea1-2249e67d0a9c"

  network {
    port = "${openstack_networking_port_v2.my_app.id}"
  }
}

@jtopjian
Contributor

@kayrus Thanks!

no_fixed_ip = true

Well that would do it :)

Technically this is an invalid configuration, but it still stands that Terraform should not crash. Are you able to try #539 and see if this helps or if further work is required?

@jtopjian
Contributor

@till This should be resolved in #539. If you're still seeing crashes, please let me know.

@till
Contributor Author

till commented Jan 15, 2019

@jtopjian 🎉 Thank you for fixing it. Do you know when a new release is coming?

@jtopjian
Contributor

@till Yes, 30 minutes ago 😉

@SerenaLi279

Hi @jtopjian,
I still encounter the same error. I want to use 'no_fixed_ip = true' (so that no IP is assigned to the port).
Do you have any suggestions to avoid this issue?

Error: Error applying plan:

1 error(s) occurred:

* module.serena1-control-01.openstack_compute_instance_v2.instance: 1 error(s) occurred:

* openstack_compute_instance_v2.instance: Error creating OpenStack server: Bad request with: [POST https://10.75.11.228:13774/v2.1/os-volumes_boot], error message: {"badRequest": {"message": "Port 2e4a4f1f-1bbe-4bc8-b58c-fa9f8224114d requires a FixedIP in order to be used.", "code": 400}}

Terraform Version:

  • terraform-provider-openstack_v1.16.0_x4 (I checked that this is the latest one)
  • Terraform v0.11.11

openstack.tf

resource "openstack_networking_port_v2" "port_net-02_v4" {
    name = "${var.name_prefix}-${var.hostname}"
    network_id = "${var.net-02_net_uuid}"
    admin_state_up = "true"
    no_security_groups = "false"
    no_fixed_ip = true
    count = "${var.count}"
}


resource "openstack_compute_instance_v2" "instance" {
    name = "${var.name_prefix}-${var.hostname}"
    key_pair = "${var.keypair_name}"
    flavor_id = "${var.flavor_uuid}"
    image_id = "${var.image_uuid}"
    config_drive = "true"
    availability_zone = "${var.zone_of_instance}"
    
    network  = {
        port = "${openstack_networking_port_v2.port_net-02_v4.id}"
    }
}

@jtopjian
Contributor

@SerenaLi279 Unfortunately I'm not sure what the problem might be.

This issue was opened because Terraform was actually crashing (panic + stack trace) so the goal was to stop that from happening and have Terraform report a proper error (which is what you're seeing).

Per my comment here (https://github.com/terraform-providers/terraform-provider-openstack/issues/541#issuecomment-448451486), I was unable to actually get an instance created without an IP (I got the same error as you). I'm not sure if others are able to because their clouds have a special configuration or if they are using a different set of Terraform resources.

@SerenaLi279

Hi @jtopjian,
Thanks for your reply. The issue https://github.com/terraform-providers/terraform-provider-openstack/issues/429 was created by my colleague, and it sounds like we could use 'no_fixed_ip' (https://github.com/terraform-providers/terraform-provider-openstack/pull/433) to create a port without an IP for an instance. But in my case it does not work. Could you please tell me how to make it work, or whether there are other ways?
Thanks a lot~

@SerenaLi279

@jtopjian,
I found this fairly recent Red Hat bug report, https://bugzilla.redhat.com/show_bug.cgi?id=1669350,
where they state:

upon review ip less interfaces while supported by neutrons api are not supported by nova in OSP 13
currently no release of openstack nova upstream allows the use of neutron ports with ip_allocation=none.
this will be considerded as a possible Future Feature for OSP 16 based on discussion with upstream.

So it looks like attaching a port without a fixed IP is currently only supported for already running VMs.
I tried it and could attach the port without a fixed IP to an existing VM instance.

So no_fixed_ip works well together with openstack_compute_interface_attach_v2 to attach ports after the VM has been created, like:

resource "openstack_networking_port_v2" "port_net-02_v4" {
    name = "${var.name_prefix}-${var.hostname}"
    network_id = "${var.net-02_net_uuid}"
    admin_state_up = "true"
    security_group_ids = [ 
  ] 
    no_fixed_ip = true
    count = "${var.count}"
}

resource "openstack_networking_port_v2" "port_net-03_v4" {
    name = "${var.name_prefix}-${var.hostname}"
    network_id = "${var.net-02_net_uuid}"
    admin_state_up = "true"
    security_group_ids = [ 
  ] 
    no_fixed_ip = true
    count = "${var.count}"
}
    
resource "openstack_networking_port_v2" "port_net-01_v4" {
    name = "${var.name_prefix}-${var.hostname}"
    network_id = "${var.net-01_net_uuid}"
    admin_state_up = "true"
    security_group_ids = [ 
  ] 
    fixed_ip {
        "subnet_id" = "${var.net-01_v4_sub_uuid}"
        "ip_address" = "${var.net-01_static_ipv4_address}"
    }
    allowed_address_pairs {
        ip_address = "${var.overlay_network}"
    }
    count = "${var.count}"
}
    
resource "openstack_compute_instance_v2" "instance" {
    name = "${var.name_prefix}-${var.hostname}"
    key_pair = "${var.keypair_name}"
    flavor_id = "${var.flavor_uuid}"
    image_id = "${var.image_uuid}"
    config_drive = "true"
    availability_zone = "${var.zone_of_instance}"
    
    network  = {
        port = "${openstack_networking_port_v2.port_net-01_v4.id}"
    }

    count = "${var.count}" 
    user_data = "${var.user_data}"
}

resource "openstack_compute_interface_attach_v2" "test2" {
  instance_id = "${openstack_compute_instance_v2.instance.id}"
  port_id  = "${openstack_networking_port_v2.port_net-02_v4.id}"
}
resource "openstack_compute_interface_attach_v2" "test3" {
  depends_on = ["openstack_compute_interface_attach_v2.test2"]
  instance_id = "${openstack_compute_instance_v2.instance.id}"
  port_id  = "${openstack_networking_port_v2.port_net-03_v4.id}"
}

4 participants