Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a workaround for CloudFormation update issues #162

Merged
merged 1 commit into from May 15, 2018

Conversation

aermakov-zalando
Copy link
Contributor

A bug in how CloudFormation processes updates for LBs can result in complete misconfiguration. The AWS::ElasticLoadBalancingV2::Listener resource must have exactly one certificate in the CertificateArn property, which is then treated as the default certificate. All the rest are managed in a separate resource of type AWS::ElasticLoadBalancingV2::ListenerCertificate. Unfortunately, the update is not performed atomically. CloudFormation updates the Listener first, and only after it's done it creates a new ListenerCertificate resource and deletes the old one.

However, if the previously default certificate swaps positions with one from the ListenerCertificate list, the resulting race conditions in AWS will result in a completely broken configuration. One of the certificates will be completely missing from the LB and you might or might not get update errors on the CF stack.

Implement the workaround suggested by AWS support, which is to change the name of the ListenerCertificate resource instead of updating an existing one. To make the resource name predictable, certificate ARNs are now sorted and their hash is used in the resource name. A nice side effect is that the default certificate will always be a wildcard one instead of an arbitrary one.

Additionally, the first certificate is now present in both Listener and ListenerCertificate resources. This doesn't seem to break anything, but will stop CloudFormation from deleting the default certificate if it stops being the default one.

A bug in how CloudFormation processes updates for LBs can result in
complete misconfiguration. The AWS::ElasticLoadBalancingV2::Listener
resource must have exactly one certificate in the CertificateArn
property, which is then treated as the default certificate. All the rest
are managed in a separate resource of type
AWS::ElasticLoadBalancingV2::ListenerCertificate. Unfortunately, the
update is not performed atomically. CloudFormation updates the Listener
first, and only after it's done it creates a new ListenerCertificate
resource and deletes the old one.

However, if the previously default certificate swaps positions with one
from the ListenerCertificate list, the resulting race conditions in AWS
will result in a completely broken configuration. One of the
certificates will be completely missing from the LB and you might or
might not get update errors on the CF stack.

Implement the workaround suggested by AWS support, which is to change
the name of the ListenerCertificate resource instead of updating an
existing one. To make the resource name predictable, certificate ARNs
are now sorted and their hash is used in the resource name. A nice side
effect is that the default certificate will always be a wildcard one
instead of an arbitrary one.

Additionally, the first certificate is now present in both Listener and
ListenerCertificate resources. This doesn't seem to break anything, but
will stop CloudFormation from deleting the default certificate if it
stops being the default one.

Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>
@coveralls
Copy link

coveralls commented May 15, 2018

Coverage Status

Coverage increased (+0.3%) to 64.875% when pulling bf9ff06 on fix-cf-update-bug into 8983ed4 on master.

@mikkeloscar
Copy link
Collaborator

👍

1 similar comment
@aermakov-zalando
Copy link
Contributor Author

👍

@aermakov-zalando aermakov-zalando merged commit 3bf4254 into master May 15, 2018
@aermakov-zalando aermakov-zalando deleted the fix-cf-update-bug branch May 15, 2018 14:16
@szuecs
Copy link
Member

szuecs commented May 15, 2018

What if you have a second wildcard certifcate created that is lexicographically sorted before the first one?
For example you have *.foo.example.org and later you add *.bar.example.org. In this situation it will break again because of the order. I am fine with this but we should document it.

@aermakov-zalando
Copy link
Contributor Author

@szuecs the certificates were already configured in an arbitrary order (depending on hashmap ordering), so nobody should've depended on it. Making a wildcard certificate the default one wasn't really a goal, just a nice side-effect.

@mikkeloscar
Copy link
Collaborator

@aermakov-zalando @szuecs and this should not really break as I understand it right? Because the "main" certificate will be stored in the ListenerCertificate list so it will be available even if the default is replaced by another wildcard cert.

@aermakov-zalando
Copy link
Contributor Author

@mikkeloscar @szuecs If we want to implement some logic to determine the best default certificate we can do it later, but this change shouldn't break anything.

mikkeloscar added a commit that referenced this pull request Jun 29, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Lastly it removes some of the getter methods from structs and instead
exports the fields that are needed in other packages. This was done to
make it easier to use the structs in tests and because we are not
writing Java.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this pull request Jun 29, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Lastly it removes some of the getter methods from structs and instead
exports the fields that are needed in other packages. This was done to
make it easier to use the structs in tests and because we are not
writing Java.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this pull request Jul 8, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this pull request Jul 8, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this pull request Jul 9, 2018
* Remove getter and setters for easier testing

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Fix handling limit of certificates per ALB

This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Vendor with dep

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Check for nil stack

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Sync load balancer state with stack state

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants