# RFC 1858 - 2020-04-06 - Automatically Adjust Limits

This RFC proposes a new scheme for rate limiting requests to external
services in order to maximize the sustained transmission rate over
varying conditions.

## Motivation

Vector users commonly run into the problem of internal rate limiting:
not an external service limiting how fast it will accept our data, but
Vector's own rate limiting as configured by the `request`
parameters. Alternatively, users can run into the opposite problem of
overwhelming a downstream service to the point where it becomes
unresponsive and starts queueing requests. Instead, Vector should
automatically rate limit its requests to maximally fill the service's
capacity without overwhelming it and causing additional problems.

Most sinks in Vector have their request structure managed by the
[tower crate](https://github.com/tower-rs/tower). This service builder
allows for setting how requests are sent to remote sinks. In
particular, Vector fixes both the number of requests that may be
simultaneously in flight (AKA the concurrency limit) and the maximum
rate at which requests may be sent, expressed as a number of requests
over some time interval (AKA the rate limit number and duration).

Many of these parameters _must_ be adjusted by the Vector
administrator to maximize the throughput for each service. For high
volume sites, this can require considerable trial and error before a
satisfactory set of parameters is found. More importantly, changes in
the service's characteristics, whether in available processing power,
bandwidth, or latency (such as that caused by link congestion), or in
the number of other agents delivering to the same sink, can cause
settings that once improved utilization to impede it instead.

Since many of the factors that affect delivery rate are not fixed
values but instead vary considerably throughout the life of a service,
it is impossible to choose one "best" set of parameters to fit all
conditions. Instead, Vector should adopt an approach that allows for
dynamic adjustment of the underlying parameters based on current
conditions.

When service limits are reached, Vector will experience a number of
undesirable phenomena, notably increased request latency, timeouts,
and deferrals. These all decrease the overall flow rate while
increasing actual bandwidth usage.

## Guide-level Proposal

There are two levels of controls in play: static and dynamic. The
existing controls are static, and best describe service limits, such
as maximum allowed request rates. What is needed is a set of dynamic
controls that adapt to the underlying conditions and scale service
utilization on the fly.

Since the controls under consideration are dependent on some form of
queueing, they will be inserted at the same level as
`TowerRequestSettings`. The existing rate limit controls will remain
in order to provide a hard upper bound on service utilization (for
example, to prevent over-use violations), but utilization below that
bound will be dynamically managed by adjusting the concurrency.

The underlying control will replace the `tower::limit::ConcurrencyLimit`
layer with a new custom layer that dynamically adjusts the concurrency
limit based on current conditions. It will track each request's result
status (i.e. success or deferral) and its round trip time (RTT). This
will require a modified `tower::limit::Limit` structure that adds and
removes permits as needed. A new `ResponseFuture` will forward the
result of the request back to the invoking `ConcurrencyLimit` after a
completed `poll`, in addition to the usual action of releasing the
permit on `drop`.
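
To make the shape of this concrete, here is a minimal sketch of the
state such a layer might keep. All names here (`Controller`, `Permit`,
`Outcome`) are hypothetical, not existing tower or Vector types; the
`observe` method applying the adjustment is sketched after the
algorithm description below.

```rust
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Result of a completed request, as seen by the response future.
enum Outcome {
    Success,
    Deferral, // e.g. the service asked us to back off
    Failure,
}

/// State behind the dynamic concurrency layer. In the real change this
/// would back a replacement for `tower::limit::ConcurrencyLimit`.
struct Controller {
    current_limit: usize,  // dynamically adjusted concurrency limit
    max_limit: usize,      // hard upper bound from the static settings
    ewma_rtt: Option<f64>, // moving average RTT in seconds
}

/// Held by each in-flight request. Unlike the stock permit, which is
/// only released on `drop`, this also reports the request's outcome
/// and round trip time back to the controller on completion.
struct Permit {
    controller: Arc<Mutex<Controller>>,
    start: Instant,
}

impl Permit {
    fn complete(self, outcome: Outcome) {
        let rtt = self.start.elapsed();
        // Apply the AIMD adjustment described below.
        self.controller.lock().unwrap().observe(rtt, outcome);
    }
}
```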

The algorithm used to control the limit will follow the AIMD framework
(a sketch of the update step follows the list below):

* The controller will maintain a moving average RTT of past requests
  using an exponentially weighted moving average (EWMA). The weighting
  (α) is to be experimentally determined.

* The current response's RTT is compared to this moving average:

  * If less than or equal to the average, the concurrency will be
    incremented by one (additive increase), up to a maximum of the
    in-flight limit.

  * If greater than the average, or if the result was a failure of any
    kind, the concurrency will be reduced by a factor of one half
    (multiplicative decrease), down to a minimum of one.
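
Putting those rules together, and continuing the hypothetical sketch
above, the update step might look like the following. `ALPHA` and
`EQUAL_ZONE` are placeholder values: the weighting is explicitly left
to experimentation, and the "equal" zone around the average is one of
the outstanding questions below.

```rust
use std::time::Duration;

const ALPHA: f64 = 0.5; // EWMA weighting; to be determined experimentally
const EQUAL_ZONE: f64 = 1.05; // RTTs within 5% of average count as "equal"

impl Controller {
    fn observe(&mut self, rtt: Duration, outcome: Outcome) {
        let rtt = rtt.as_secs_f64();
        // Seed the moving average with the first observation.
        let avg = self.ewma_rtt.unwrap_or(rtt);

        if matches!(outcome, Outcome::Success) && rtt <= avg * EQUAL_ZONE {
            // Additive increase, capped at the static in-flight limit.
            self.current_limit = (self.current_limit + 1).min(self.max_limit);
        } else {
            // Multiplicative decrease on a slow response or any failure,
            // with a floor of one.
            self.current_limit = (self.current_limit / 2).max(1);
        }

        // Fold this observation into the moving average:
        // avg' = α·rtt + (1 − α)·avg
        self.ewma_rtt = Some(ALPHA * rtt + (1.0 - ALPHA) * avg);
    }
}
```

Note that this sketch compares the new RTT against the average from
before the current observation is folded in; whether to update the
average before or after the comparison is a detail left to the spike
implementation.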

## Prior Art

* [TCP congestion control algorithms](https://en.wikipedia.org/wiki/TCP_congestion_control)
* [Additive Increase/Multiplicative Decrease](https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease)
* [Netflix Technology Blog: Performance Under Load](https://medium.com/@NetflixTechBlog/performance-under-load-3e6fa9a60581)
* [JINSPIRED - Adaptive Safety Control (archive.org)](https://web.archive.org/web/20130105023839/http://www.jinspired.com/site/jxinsight-opencore-6-4-ea-11-released-adaptive-safety-control)

## Sales Pitch

This proposal:

* provides a simple and understandable mechanism for varying resource
  utilization of sinks;

* adapts an existing design to avoid reinventing known good solutions;

* is minimally invasive to the existing code base while applying to
  most sinks;

* minimizes the amount of configuration required to produce the ideal
  (most efficient and performant) configuration; and

* does not impose hard limits on flow rates while respecting
  configured limits.

## Drawbacks

Since the underlying parameters that control when requests are
throttled will be abstracted behind an additional layer, it will
become harder to reason about the causes of bandwidth limits.

## Rationale

* As referenced earlier, the proposed mechanism borrows from _proven_
  mechanisms designed to manage flow control under varying conditions,
  making it a good choice for the first pass implementation.

* A moving average is used to smooth out small variations in latency
  without completely ignoring them.

* EWMA is chosen as an averaging mechanism as it avoids having to
  maintain memory of past observations beyond a single value.
  Mathematically, it is the simplest possible moving average.
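
For reference, the EWMA update is the single recurrence

```latex
s_t = \alpha x_t + (1 - \alpha) s_{t-1}
```

where `x_t` is the latest RTT observation and `s_{t-1}` is the stored
average, so only one value carries forward between requests. Setting
α = 1 discards the history entirely, which is the "previous
observation" alternative below.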

## Alternatives

* Instead of managing the concurrency, we could alter the maximum
  request rate or maximum bandwidth usage. This runs into the
  difficulty of how to set the minimum bound, both before any data has
  been collected and after hard backoffs, while concurrency has a
  trivially obvious minimum bound and is better able to flex with
  load.

* Instead of comparing the RTT against a moving average, we could
  simply use the previous observation (mathematically equivalent to an
  EWMA weighting of α=1).

## Outstanding Questions

* The ideal value for the weighting α is unknown. Too large a value
  will amplify the effect of short term changes to RTT. Too small a
  value may delay responding to real changes excessively.

  > **Review comment:** We should probably start with a low alpha and
  > let the user customize this. I think that is what makes sense;
  > with a low enough alpha we should be able to eventually step up to
  > the "real" value.

* Some experimentation may be required to determine a small zone
  around the average that is still considered "equal", to avoid
  excessive flapping of the concurrency level without allowing the RTT
  to grow unbounded and overload the sink.

## Plan Of Attack

* [ ] Submit a PR with spike-level code _roughly_ demonstrating the change.
* [ ] Benchmark the approach under various conditions to determine a
      good value for α.
* [ ] ………

> **Review comment:** This, I think, needs much more detail. How
> exactly do you plan to test/benchmark this change to ensure it is:
>
> For example, starting simple, it would help to define a set of
> scenarios and their ideal outcome. We could then replay these
> scenarios to verify behavior as we hone in on the optimal
> algorithm. In other words, tests 😄. But I think we should agree on
> those test criteria within this RFC. Ex:
>
> It would be nice to have these static expectations to keep us
> honest. Beyond that, I think we would benefit from a randomized
> simulation of some sort. For example, we could define criteria that
> would cause a mock service to become unresponsive (shut down), and
> the test fails if the service reaches that state. The service would
> implement a rate limit that randomly adjusts, and if the algorithm
> works, Vector should correctly hone in on that rate limit as it
> adjusts. This should all be done within defined bounds that, if
> exceeded, would shut down the service. Easier said than done! I'm
> happy to explain better. And I have to assume there is prior art for
> this kind of testing. If not, it would make for an excellent blog
> post!

> **Reply:** I agree this needs a more concrete testing and
> benchmarking plan, and these are all good ideas. Does all of this
> detail really belong in this implementation plan, or am I free to
> spread it elsewhere and just reference it here?

> **Reply:** Not at all. Feel free to add a section or add it under
> the "Guide Level Proposal" section.

> **Reply:** And to clarify the "Plan of Attack": it should be
> progressive and outline an optimal plan. It does not mean we will do
> all of the items. For example, if a simulator is part of the plan, I
> would expect that to be at the end unless it's absolutely
> necessary. It's possible we won't do it depending on how everything
> comes together.

> **Reply:** BTW, condition 3 above doesn't actually compute: the rate
> limit and concurrency are different units. A rate limit of 5
> requests per second might be achieved with any number of different
> concurrency levels. It depends on what the RTT is between Vector and
> the service. More precisely, Vector's concurrency should approximate
> the rate limit multiplied by the RTT.

> **Review comment:** I think we should make it clear that this RFC
> focuses on HTTP based sinks only.

> **Reply:** This is true, and it's worth clarifying that this doesn't
> apply universally, but the HTTP part seems accidental. It is
> documented to apply at the level of the `tower` crate service
> builder, so any service that uses that framework would have this
> capability. Now, it happens that all of our HTTP services use the
> tower framework, and vice versa, but it isn't necessarily so.