What problem are you facing?
Async execution has been the default for Upjet since #90. However, the managed reconciler is not optimized for operations that take more than a single reconciliation pass. Quoting my message from Slack:
For (2), sync mode is similar to all other controllers and fits better into the managed reconciler, since the reconciler assumes that an operation starts and ends within the given function. But it's clear that this doesn't work for all cases: there are resources whose creation takes more than 15 minutes. Async mode was made compatible with the managed reconciler, but it doesn't fit perfectly because the reconciler assumes that operations are contained. For example, external-creation-pending is useless in Async mode, since the creation call never fails; it just triggers the CLI call. However, the main reason we didn't use Async for all resources in Terrajet was that it required waiting for the next reconciliation to check whether the async operation had completed, so it would take at least 1 minute even if the resource is created in a few seconds, like a VPC. What has changed since then is that we added a callback mechanism for Async, so that the operation itself updates the custom resource status when it completes, which could (need to validate) trigger a new reconciliation. If that's the case, I think we can make it the default in Upjet first, have it there for a while, and then remove the sync mode.
Another incompatibility is that the concurrency limits we set via controller-runtime are ineffective: the reconciliation completes quickly while the TF call continues in the background, so more reconciliations are admitted and the number of TF calls active at any time is unbounded.
How could Terrajet help solve your problem?
In addition to the number of reconciliations, we need to make controller-runtime aware of the number of active TF calls so that it doesn't add new events to the queue once that number reaches the given limit. If the number of TF calls is not capped, a growing number of resources will saturate the allotted resources, which could lead to undefined behavior.
This may involve an upstream contribution to https://github.com/kubernetes-sigs/controller-runtime/
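To make the idea of capping active TF calls concrete, here is a minimal sketch in plain Go. It is not Upjet's or controller-runtime's API: `asyncLimiter`, `TryStart`, and `Active` are hypothetical names invented for illustration. It assumes a reconciler would ask the limiter for a slot before launching a background operation and requeue the request if no slot is free, and that the completion callback would be the place where the custom resource status is updated.

```go
package main

import (
	"fmt"
	"sync"
)

// asyncLimiter (hypothetical) caps how many background operations,
// e.g. TF CLI calls, may be in flight at the same time.
type asyncLimiter struct {
	slots chan struct{} // buffered channel used as a counting semaphore
}

func newAsyncLimiter(max int) *asyncLimiter {
	return &asyncLimiter{slots: make(chan struct{}, max)}
}

// TryStart runs op in a goroutine if a slot is free and reports whether
// it was started. When op finishes, the slot is released first and then
// onDone is invoked with the result.
func (l *asyncLimiter) TryStart(op func() error, onDone func(error)) bool {
	select {
	case l.slots <- struct{}{}: // slot acquired
	default:
		return false // limit reached; the caller should retry later
	}
	go func() {
		err := op()
		<-l.slots // release the slot
		onDone(err)
	}()
	return true
}

// Active returns the number of operations currently holding a slot.
func (l *asyncLimiter) Active() int { return len(l.slots) }

func main() {
	limiter := newAsyncLimiter(2)
	release := make(chan struct{}) // keeps the fake operations running
	var wg sync.WaitGroup
	started, deferred := 0, 0

	for i := 0; i < 5; i++ {
		wg.Add(1)
		ok := limiter.TryStart(
			func() error { <-release; return nil }, // stand-in for a long TF call
			func(err error) { wg.Done() },          // stand-in for the status update
		)
		if ok {
			started++
		} else {
			deferred++ // a real reconciler would requeue here
			wg.Done()
		}
	}
	fmt.Printf("started=%d deferred=%d active=%d\n", started, deferred, limiter.Active())

	close(release) // let the running operations finish
	wg.Wait()
	fmt.Println("active after completion:", limiter.Active())
}
```

A non-blocking `TryStart` is used here so that a reconciliation never sits waiting for a slot. Whether events should instead be held back before they reach the work queue, as proposed above, is a separate design question that this sketch does not answer.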