Add autoscaling configuration which considers change in queue length

### Motivation

For example: if `target_replica_concurrency` is 1 and the current concurrency is 2, but it was 3 ten seconds ago and 4 ten seconds before that, then scaling up is not necessary, because concurrency is already trending down toward the target.

#### Initial design thoughts

The metric to consider could be a projection of when the concurrency will reach `target_replica_concurrency`. This could be computed, for example, by taking the raw total in-flight counts over the past `window` period (not the rolling averages) and fitting a best-fit line through them. The configuration would be "don't scale up if it's under-provisioned but projected to reach `target_replica_concurrency` within X amount of time (default: 0s)". The reverse should also be supported: "don't scale down if it's over-provisioned but projected to reach `target_replica_concurrency` within X amount of time (default: 0s)".
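A minimal sketch of the projection idea, to make the proposal concrete. Everything here is an assumption for illustration rather than existing code: the function names (`fit_line`, `seconds_until_target`, `should_scale_up`, `should_scale_down`), the `grace_s` parameter standing in for "X amount of time", and the choice to compare total in-flight against `target_replica_concurrency * replicas`, where `replicas` is taken to be the requested replica count.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, total in-flight requests)


def fit_line(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    return slope, mean_c - slope * mean_t


def seconds_until_target(
    samples: Sequence[Sample], target_total: float, now: float
) -> Optional[float]:
    """Projected seconds until the fitted line reaches target_total.

    Returns None when the trend is flat or moving away from the target.
    """
    slope, intercept = fit_line(samples)
    fitted_now = slope * now + intercept
    if fitted_now == target_total:
        return 0.0
    if slope == 0:
        return None
    eta = (target_total - fitted_now) / slope
    return eta if eta >= 0 else None


def should_scale_up(
    samples: Sequence[Sample],
    target_replica_concurrency: float,
    replicas: int,  # assumed: requested replicas, not live replicas
    now: float,
    grace_s: float = 0.0,  # hypothetical name for "X amount of time"
) -> bool:
    """Scale up only if under-provisioned and not projected to recover in time."""
    target_total = target_replica_concurrency * replicas
    if samples[-1][1] <= target_total:
        return False  # not under-provisioned
    eta = seconds_until_target(samples, target_total, now)
    return not (eta is not None and eta <= grace_s)


def should_scale_down(
    samples: Sequence[Sample],
    target_replica_concurrency: float,
    replicas: int,
    now: float,
    grace_s: float = 0.0,
) -> bool:
    """Mirror case: skip scale-down if load is projected to climb back in time."""
    target_total = target_replica_concurrency * replicas
    if samples[-1][1] >= target_total:
        return False  # not over-provisioned
    eta = seconds_until_target(samples, target_total, now)
    return not (eta is not None and eta <= grace_s)


# The motivating example: concurrency 4 -> 3 -> 2 at 10 s intervals, target 1.
draining = [(0.0, 4.0), (10.0, 3.0), (20.0, 2.0)]
print(seconds_until_target(draining, target_total=1.0, now=20.0))
print(should_scale_up(draining, 1.0, replicas=1, now=20.0, grace_s=15.0))
```

With the default `grace_s` of 0 the projection only suppresses scaling when the fitted line is already at the target, so behavior stays close to a plain threshold check unless the setting is raised.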

Question: How to handle changes in the replica count over that period? Perhaps only consider requested replicas (vs live replicas)?


Add autoscaling configuration which considers change in queue length #859

Motivation

Initial design thoughts

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add autoscaling configuration which considers change in queue length #859

Description

Motivation

Initial design thoughts

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions