[FEAT] Support exponential backoff in the wrapped BackoffAwareScaler type #253

brooksmtownsend · 2024-04-04T18:22:41Z

Inside the scaler logic we have the notion of a BackoffAwareScaler, which approaches backoff in a very naïve way. For specific commands that might take a long time (ProviderStarted, ComponentScaled), we prevent that scaler from sending out commands either for 30 seconds or until we receive an event that is specifically in response to that scaler's command (for a provider start command, we'd expect the provider to either start or fail to start, with a corresponding event).
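To make the current behavior concrete, here's a minimal sketch of that "fixed window or matching event" gate. The names `FixedBackoff` and `ExpectedEvent` are hypothetical and are not wadm's actual types. Time is passed in explicitly as `now` so the logic is deterministic. Note that the window is always the same 30 seconds, no matter how many times the same command has already failed.

```rust
use std::time::{Duration, Instant};

/// Hypothetical events a scaler might wait on; wadm's real event types differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExpectedEvent {
    ProviderStarted { provider_id: String },
    ProviderStartFailed { provider_id: String },
}

/// Blocks a scaler for a fixed window after a long-running command,
/// or until one of the expected response events arrives.
pub struct FixedBackoff {
    timeout: Duration,
    blocked_until: Option<Instant>,
    expected: Vec<ExpectedEvent>,
}

impl FixedBackoff {
    pub fn new(timeout: Duration) -> Self {
        Self { timeout, blocked_until: None, expected: Vec::new() }
    }

    /// Record that a long-running command was published at `now`.
    pub fn command_sent(&mut self, now: Instant, expected: Vec<ExpectedEvent>) {
        self.blocked_until = Some(now + self.timeout);
        self.expected = expected;
    }

    /// Any matching response (success or failure) lifts the block immediately.
    pub fn observe(&mut self, event: &ExpectedEvent) {
        if self.expected.contains(event) {
            self.blocked_until = None;
            self.expected.clear();
        }
    }

    /// Whether the wrapped scaler may publish commands at `now`.
    pub fn may_send(&self, now: Instant) -> bool {
        match self.blocked_until {
            Some(deadline) => now >= deadline,
            None => true,
        }
    }
}
```

Because a failure event also lifts the block, a command that fails instantly can be retried instantly, which is exactly the gap described next.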

The real problem the current backoff solves is preventing a scaler from thrashing in response to events that might be relevant. What it doesn't solve is the more general problem of thrashing in response to events that are relevant. Imagine a scaler attempting to start a Wasm component stored in a private registry, where the wasmCloud host has no credentials for that registry. The scaler publishes the command, the host almost immediately fails to authenticate, and a component_scale_failed event is emitted. The scaler sees that the component failed to scale and, being the dumb scaler that it is (it doesn't look at the error type), immediately tries again. Rust is fast, so we'll keep retrying forever, or until someone notices the increased load.

My proposal is to wrap every scaler in the BackoffAware structure so that, external to the scaler logic, we can keep an internal backoff timer for repeated commands. We want to make sure an individual scaler can still reconcile immediately when state has actually been modified, but when it is hopelessly attempting the same command over and over, we can apply an exponential (power of two, Fibonacci, etc.) backoff before sending the next command.
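One way the wrapper-side timer could look, as a minimal sketch assuming power-of-two growth with a cap: the wrapper remembers the last command it let through, sends a *different* command immediately (state changed, so reconcile right away), and makes an *identical* command wait `base * 2^(attempts - 1)`, capped at `max`. `CommandBackoff`, `should_send`, and `reset` are hypothetical names, not part of wadm; swapping in Fibonacci growth would only change `delay_for`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical backoff kept by the wrapper, outside the scaler's own logic.
pub struct CommandBackoff<C> {
    base: Duration,
    max: Duration,
    last_command: Option<C>,
    attempts: u32,
    next_allowed: Option<Instant>,
}

impl<C: PartialEq + Clone> CommandBackoff<C> {
    pub fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, last_command: None, attempts: 0, next_allowed: None }
    }

    /// Wait after the `attempts`-th identical send: base * 2^(attempts - 1), capped at `max`.
    fn delay_for(&self, attempts: u32) -> Duration {
        let exp = attempts.saturating_sub(1).min(31);
        self.base.saturating_mul(1u32 << exp).min(self.max)
    }

    /// Returns true if `command` should be published at `now`.
    pub fn should_send(&mut self, command: &C, now: Instant) -> bool {
        if self.last_command.as_ref() != Some(command) {
            // A different command means the desired state changed: reconcile immediately.
            self.last_command = Some(command.clone());
            self.attempts = 1;
            self.next_allowed = Some(now + self.delay_for(1));
            return true;
        }
        if let Some(t) = self.next_allowed {
            if now < t {
                return false; // same command, still inside the backoff window
            }
        }
        // Same command, window elapsed: send it and double the next wait.
        self.attempts = self.attempts.saturating_add(1);
        self.next_allowed = Some(now + self.delay_for(self.attempts));
        true
    }

    /// Clear all backoff state, e.g. once an event shows the command succeeded.
    pub fn reset(&mut self) {
        self.last_command = None;
        self.attempts = 0;
        self.next_allowed = None;
    }
}
```

With a 1 s base, the private-registry scenario above would retry at roughly t+1 s, t+3 s, t+7 s, and so on, instead of in a tight loop, while a genuinely new command still goes out right away.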


brooksmtownsend added the enhancement (New feature or request) label on Apr 4, 2024.

brooksmtownsend mentioned this issue on Apr 4, 2024 in "Updates to wadm to support wasmCloud 1.0" #247 (Merged).

brooksmtownsend mentioned this issue on Apr 16, 2024 in "feat: named configuration management" #263 (Merged).
