
Add autoscaling configuration which considers change in queue length #859

Open
@deliahu

Description


Motivation

For example, suppose target_replica_concurrency is 1 and the current concurrency is 2, but 10 seconds ago it was 3 and 10 seconds before that it was 4. Concurrency is already trending down toward the target, so it's not necessary to scale up.

Initial design thoughts

The metric to consider could be a projection of when the concurrency will reach target_replica_concurrency. This could be computed, for example, by taking the total number of in-flight requests sampled over the past window period (not the rolling averages) and fitting a best-fit line to those samples. The configuration would be: "don't scale up if it's under-provisioned but projected to reach target_replica_concurrency within X amount of time (default: 0s)". Also support the reverse: "don't scale down if it's over-provisioned but projected to reach target_replica_concurrency within X amount of time (default: 0s)".
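A minimal sketch of the proposed check, in Python (the issue does not prescribe a language or an implementation). It assumes in-flight totals are sampled as (timestamp in seconds, total in-flight requests) pairs over the window, and that the replica count is constant over the window, so the target total is target_replica_concurrency × replicas (this sidesteps the replica-count question raised in the issue). The names `fit_line`, `suppress_scaling`, `target_total` and `lookahead_s` are hypothetical; `lookahead_s` plays the role of "X amount of time".

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (timestamp in seconds, total in-flight requests).
Sample = Tuple[float, float]


def fit_line(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of in_flight = slope * t + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def suppress_scaling(
    samples: Sequence[Sample],
    current: float,
    target_total: float,
    now: float,
    lookahead_s: float = 0.0,
) -> bool:
    """Return True if a scale-up/down should be skipped because the fitted
    trend reaches target_total within lookahead_s seconds of `now`.

    lookahead_s <= 0 (mirroring the proposed 0s default) disables the check.
    """
    if lookahead_s <= 0 or current == target_total:
        return False
    slope, intercept = fit_line(samples)
    # Under-provisioned (current above target): only a downward trend helps.
    # Over-provisioned (current below target): only an upward trend helps.
    trending_toward = slope < 0 if current > target_total else slope > 0
    if not trending_toward:
        return False
    # Time at which the fitted line equals the target.
    crossing = (target_total - intercept) / slope
    # A crossing in the past is ignored (conservative: scale as usual).
    return now <= crossing <= now + lookahead_s
```

With the numbers from the motivating example (one replica, samples of 4, 3 and 2 at t = -20s, -10s and 0s), the fitted slope is -0.1 requests/s and the line reaches the target of 1 about 10 seconds from now, so `suppress_scaling([(-20, 4), (-10, 3), (0, 2)], current=2, target_total=1, now=0, lookahead_s=15)` would skip the scale-up, while a 5s lookahead would not.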

Open question: how should changes in the replica count over that period be handled? Perhaps only consider requested replicas (rather than live replicas)?

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
