Add a workaround for CloudFormation update issues #162

aermakov-zalando · 2018-05-15T14:09:26Z

A bug in how CloudFormation processes updates for LBs can result in complete misconfiguration. The AWS::ElasticLoadBalancingV2::Listener resource must have exactly one certificate in the CertificateArn property, which is then treated as the default certificate. All the rest are managed in a separate resource of type AWS::ElasticLoadBalancingV2::ListenerCertificate. Unfortunately, the update is not performed atomically. CloudFormation updates the Listener first, and only after it's done it creates a new ListenerCertificate resource and deletes the old one.

However, if the previously default certificate swaps positions with one from the ListenerCertificate list, the resulting race conditions in AWS will result in a completely broken configuration. One of the certificates will be completely missing from the LB and you might or might not get update errors on the CF stack.

Implement the workaround suggested by AWS support, which is to change the name of the ListenerCertificate resource instead of updating an existing one. To make the resource name predictable, certificate ARNs are now sorted and their hash is used in the resource name. A nice side effect is that the default certificate will always be a wildcard one instead of an arbitrary one.

Additionally, the first certificate is now present in both Listener and ListenerCertificate resources. This doesn't seem to break anything, but will stop CloudFormation from deleting the default certificate if it stops being the default one.

A bug in how CloudFormation processes updates for LBs can result in complete misconfiguration. The AWS::ElasticLoadBalancingV2::Listener resource must have exactly one certificate in the CertificateArn property, which is then treated as the default certificate. All the rest are managed in a separate resource of type AWS::ElasticLoadBalancingV2::ListenerCertificate. Unfortunately, the update is not performed atomically. CloudFormation updates the Listener first, and only after it's done it creates a new ListenerCertificate resource and deletes the old one. However, if the previously default certificate swaps positions with one from the ListenerCertificate list, the resulting race conditions in AWS will result in a completely broken configuration. One of the certificates will be completely missing from the LB and you might or might not get update errors on the CF stack. Implement the workaround suggested by AWS support, which is to change the name of the ListenerCertificate resource instead of updating an existing one. To make the resource name predictable, certificate ARNs are now sorted and their hash is used in the resource name. A nice side effect is that the default certificate will always be a wildcard one instead of an arbitrary one. Additionally, the first certificate is now present in both Listener and ListenerCertificate resources. This doesn't seem to break anything, but will stop CloudFormation from deleting the default certificate if it stops being the default one. Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>

coveralls · 2018-05-15T14:15:10Z

Coverage increased (+0.3%) to 64.875% when pulling bf9ff06 on fix-cf-update-bug into 8983ed4 on master.

mikkeloscar · 2018-05-15T14:15:23Z

👍

aermakov-zalando · 2018-05-15T14:16:45Z

👍

szuecs · 2018-05-15T14:19:40Z

What if you have a second wildcard certifcate created that is lexicographically sorted before the first one?
For example you have *.foo.example.org and later you add *.bar.example.org. In this situation it will break again because of the order. I am fine with this but we should document it.

aermakov-zalando · 2018-05-15T14:27:12Z

@szuecs the certificates were already configured in an arbitrary order (depending on hashmap ordering), so nobody should've depended on it. Making a wildcard certificate the default one wasn't really a goal, just a nice side-effect.

mikkeloscar · 2018-05-15T14:29:03Z

@aermakov-zalando @szuecs and this should not really break as I understand it right? Because the "main" certificate will be stored in the ListenerCertificate list so it will be available even if the default is replaced by another wildcard cert.

aermakov-zalando · 2018-05-15T14:32:17Z

@mikkeloscar @szuecs If we want to implement some logic to determine the best default certificate we can do it later, but this change shouldn't break anything.

This fixes a couple of bugs related to exceeding the maximum number of certificates per ALB. It limits the max size to 24 instead of 25. This is done because we need to duplicate the default certificate to work around a CloudFormation bug (#162) and therefore need one extra space for this limiting the maximum unique certificates per ALB to 24 instead of the AWS limit of 25. It also fixes a bug in `AddIngress` which could potentially add some of the certificates for a single ingress to a stack and the rest to another stack resulting in an undesired state. Thirdly it adds rollback complete states to `IsComplete()` to automatically attempt to update stacks that are in a rollback complete state. Lastly it removes some of the getter methods from structs and instead exports the fields that are needed in other packages. This was done to make it easier to use the structs in tests and because we are not writing Java. Fix #176, #175 Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

This fixes a couple of bugs related to exceeding the maximum number of certificates per ALB. It limits the max size to 24 instead of 25. This is done because we need to duplicate the default certificate to work around a CloudFormation bug (#162) and therefore need one extra space for this limiting the maximum unique certificates per ALB to 24 instead of the AWS limit of 25. It also fixes a bug in `AddIngress` which could potentially add some of the certificates for a single ingress to a stack and the rest to another stack resulting in an undesired state. Thirdly it adds rollback complete states to `IsComplete()` to automatically attempt to update stacks that are in a rollback complete state. Fix #176, #175 Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Remove getter and setters for easier testing Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net> * Fix handling limit of certificates per ALB This fixes a couple of bugs related to exceeding the maximum number of certificates per ALB. It limits the max size to 24 instead of 25. This is done because we need to duplicate the default certificate to work around a CloudFormation bug (#162) and therefore need one extra space for this limiting the maximum unique certificates per ALB to 24 instead of the AWS limit of 25. It also fixes a bug in `AddIngress` which could potentially add some of the certificates for a single ingress to a stack and the rest to another stack resulting in an undesired state. Thirdly it adds rollback complete states to `IsComplete()` to automatically attempt to update stacks that are in a rollback complete state. Fix #176, #175 Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net> * Vendor with dep Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net> * Check for nil stack Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net> * Sync load balancer state with stack state Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

aermakov-zalando merged commit 3bf4254 into master May 15, 2018

aermakov-zalando deleted the fix-cf-update-bug branch May 15, 2018 14:16

aermakov-zalando mentioned this pull request May 15, 2018

Remove _ from CF resource name #163

Merged

greenboxal mentioned this pull request Jun 8, 2018

Replacing cert doesn't work #173

Closed

mikkeloscar mentioned this pull request Jun 29, 2018

Fix handling limit of certificates per ALB #184

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a workaround for CloudFormation update issues #162

Add a workaround for CloudFormation update issues #162

aermakov-zalando commented May 15, 2018

coveralls commented May 15, 2018 •

edited

mikkeloscar commented May 15, 2018

aermakov-zalando commented May 15, 2018

szuecs commented May 15, 2018

aermakov-zalando commented May 15, 2018

mikkeloscar commented May 15, 2018

aermakov-zalando commented May 15, 2018

Add a workaround for CloudFormation update issues #162

Add a workaround for CloudFormation update issues #162

Conversation

aermakov-zalando commented May 15, 2018

coveralls commented May 15, 2018 • edited

mikkeloscar commented May 15, 2018

aermakov-zalando commented May 15, 2018

szuecs commented May 15, 2018

aermakov-zalando commented May 15, 2018

mikkeloscar commented May 15, 2018

aermakov-zalando commented May 15, 2018

coveralls commented May 15, 2018 •

edited