Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloudformation fails to apply new stacks due max certificates #176

Closed
ese opened this issue Jun 19, 2018 · 3 comments · Fixed by #184
Closed

Cloudformation fails to apply new stacks due max certificates #176

ese opened this issue Jun 19, 2018 · 3 comments · Fixed by #184
Labels

Comments

@ese
Copy link

ese commented Jun 19, 2018

Even including only 25 or less in the cloudformation stack this one fails to apply changes internally.

Once the certificate limit is reached KIAC is able to create a new one ALB stack but if the old one need to change some of their certificates it fails to update. It's common deal with this situation since every new ingress could fit their certificate in the full ALB pushing someother to the new one with spare space.

example:
Imagine a limit of 3 certs
you have one ALB with A[a,b,c] an a new ingress with [d] certificate is created in the cluster.
In the next iteration of the KIAC, detects 2 stacks and start to distribute the ingresses in this stacks with A[d,a,b] and B[c]. This leads make the upgrade of the A stack fails so I guess cloudformation is trying first to add new certs and then delete others.

In addition even having the update stack failed and rollbacked, KIAC updates ingresses statuses with the stacks. In this case statuses of c will point to B while the cert is still in A making a working service to fail

Also once you have the stack in UPDATE_ROLLBACK_COMPLETE status it is not ever touched anymore by KIAC

if !item.certsEqual() && item.stack.IsComplete() {
return update
}

This make it unusable for more than 25 certificates

@ese ese changed the title Cloudformation fails to apply new stacks due max certifications Cloudformation fails to apply new stacks due max certificates Jun 19, 2018
@mikkeloscar
Copy link
Collaborator

Thanks for reporting this. We did some changes like sorting certificates to work around bugs in Cloudformation. I suspect this might have broken the original logic for handling the cases where the limit is reached for one ALB.

mikkeloscar added a commit that referenced this issue Jun 29, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Lastly it removes some of the getter methods from structs and instead
exports the fields that are needed in other packages. This was done to
make it easier to use the structs in tests and because we are not
writing Java.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this issue Jun 29, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Lastly it removes some of the getter methods from structs and instead
exports the fields that are needed in other packages. This was done to
make it easier to use the structs in tests and because we are not
writing Java.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
@ese
Copy link
Author

ese commented Jul 3, 2018

More detail on this as it was tried to handle with #184 reducing max certificates to 24, this still lets stack a brittle condition, specially in a high dynamic environments where ingresses are being added and deleted dinamically in an automated way.

What happens

Once you completed an ALB with 24 certificates (and having 1 spare space for future changes), adding ingresses (or deleting) with 2 new certificates in the same time frame between two polling loops, the controller could lead to a broken stack.

As in the first comment imagine a max of 3 certificates per ALB, you have configured your max to 2 and you have one ALB with [c,d] certifcates.
Then 2 new ingresses appear in the cluster with [a,b] certificates and controller start a new loop.
The sort algorithm from the controller sort at the beginning the new certificates [a,b] so send a cloudformation stack update to first ALB with [a,b] which lead cloudformation to fail becouse there are more new certicates to add (2) than spare space (1) in the stack.
Also controllar is creating a new cloudformation stack for a new ALB with [c,d]

What to expect

Whatever the number of ingresses and related certificates changes in the cluster, controller manage properly the ALBs to end with usable infrastructure

Conclusion

Relying on cloudformation as it seems to works it is a little bit tricky hamdle this.

    1. do changes one by one in the cloudformation stacks. This could delay applyings too much. Also too much certificate dance between ALB could lead unnecesary downtimes while DNS and other stuff changes are applied.
    1. Once a certificate lay on one ALB dont move from it in future updates. Pretty much stable in terms of certicate dance but you need to implement extra logic to handle current state and desire state. Also you have take in account when several ingresses dissapear which could lead to move several certifacte to full ALB
    1. Be less efficiency and only add certs to an existing stack when enough spare space is in the ALB for new certificates, if there is no enough spare space just create new stacks.

mikkeloscar added a commit that referenced this issue Jul 8, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this issue Jul 8, 2018
This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
mikkeloscar added a commit that referenced this issue Jul 9, 2018
* Remove getter and setters for easier testing

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Fix handling limit of certificates per ALB

This fixes a couple of bugs related to exceeding the maximum number of
certificates per ALB.

It limits the max size to 24 instead of 25. This is done because we need
to duplicate the default certificate to work around a CloudFormation bug
(#162) and therefore need one extra space for this limiting the maximum
unique certificates per ALB to 24 instead of the AWS limit of 25.

It also fixes a bug in `AddIngress` which could potentially add some of
the certificates for a single ingress to a stack and the rest to
another stack resulting in an undesired state.

Thirdly it adds rollback complete states to `IsComplete()` to
automatically attempt to update stacks that are in a rollback complete
state.

Fix #176, #175

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Vendor with dep

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Check for nil stack

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>

* Sync load balancer state with stack state

Signed-off-by: Mikkel Oscar Lyderik Larsen <m@moscar.net>
@szuecs
Copy link
Member

szuecs commented Jul 9, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants