Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async execution needs rate limiter #99

Open
muvaf opened this issue Sep 23, 2022 · 0 comments
Open

Async execution needs rate limiter #99

muvaf opened this issue Sep 23, 2022 · 0 comments
Labels
enhancement New feature or request reconciler v2
Milestone

Comments

@muvaf
Copy link
Member

muvaf commented Sep 23, 2022

What problem are you facing?

Async execution is the default for Upjet since #90 however, the managed reconciler is not optimized for the fact that an operation could take more than a single reconciliation pass. Quoting my message from Slack;

For (2), sync mode is similar to all other controllers and fits better into managed reconciler since it assumes the operations start and end in given function but it’s clear that it doesn’t work for all cases - there are resources whose creation takes more than 15mins. Async mode is made compatible with managed reconciler but it doesn’t perfectly fit because reconciler assumes that operations are contained, for example external-creation-pending is useless in Async mode since creation call never fails because it just triggers the CLI call. However, the main reason that we didn’t use Async for all resources in Terrajet was that it required waiting for the next reconciliation to check whether the async operation completed, hence it’d take at least 1 minute even if the resource is created in a few seconds like VPC. What’s changed since then is that we added a callback mechanism for Async so that operation itself updates the custom resource status when the operation is completed, which could (need to validate) trigger a new reconciliation. If that’s the case, I think we can make it default in Upjet first, have it there for a while and then remove the sync mode.

Another incompatibility comes from the fact that the concurrency limits we put via controller-runtime are ineffective because the reconciliation completes quite quickly whereas the TF call continues in the background, hence you get more reconciliations but the number of TF calls active at any time does not have any limit.

How could Terrajet help solve your problem?

In addition to number of reconciliations, we need to make controller-runtime aware of the number of active TF calls so that it doesn't add new events to the queue if that reached the given limit. If the number of TF calls are not capped, we will see saturation of given resources with the increased number of resources which could lead to undefined behavior.

This may involve upstream contribution to https://github.com/kubernetes-sigs/controller-runtime/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request reconciler v2
Projects
None yet
Development

No branches or pull requests

2 participants