tower-balance: Observations from testing #286

olix0r opened this issue May 28, 2019 · 14 comments

olix0r commented May 28, 2019

We've recently been running tests on the Linkerd proxy that exercise the load
balancer in larger clusters (of 30+ endpoints). At the same time, I've been
exploring the existing endpoint-weighting scheme.

In doing so, I've realized that the balancer is currently O(n), though it is
intended to be effectively O(1). Furthermore, the existing weighting scheme
is complex to instrument in practice, and is of questionable value in its
current form.

All of this leads me to believe that we should drastically simplify the
balancer:

  1. Do not attempt to unify (Weighted) P2C and Round-Robin under one
    implementation. Each strategy benefits from being able to use its own data
    structure. For now, I propose that we simply drop the Round-Robin logic. It
    can easily be added later if it's desirable.
  2. The balancer cannot be responsible for driving the readiness of all of its
    constituent services. The P2C balancer is intended to sample only two
    endpoints. In the current implementation, we always poll all unready
    inner services, which leads to poor behavior as the balancer scales.
    Something like #283 (Spawn ready: drives a service's readiness on an
    executor) is necessary to relax the balancer's readiness-polling
    guarantees. A rough sketch of the resulting selection path follows this
    list.
  3. We initially decided that all endpoint-service errors should be treated as fatal
    to the balancer. It now seems more appropriate to let the balancer handle these
    failures by dropping the failed service from the balancer.
  4. The Load and Instrument traits are not balancer-specific and should
    become more generally useful abstractions in a dedicated crate.
  5. The balancer should expose a Layer that can be applied over inner layers
    that produce Discover-typed results.
  6. The Pool implementation is factored inconveniently, especially in light of how
    tower-layer has evolved: it requires access to a concrete balancer
    implementation and reaches into its discover field directly. I think that it
    should probably be implemented as a Discover proxy-type that is constructed
    with a Watch<Load>. The pool doesn't rely on any specific balancer behavior,
    so it shouldn't dictate use of a specific balancer implementation.
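
A rough sketch of the selection path implied by points 1 and 2, once endpoint
readiness is handled off the balancer's poll path (illustrative only --
`Endpoint` and `p2c_pick` are not tower types): sample two distinct ready
endpoints at random and keep the less loaded one, so selection cost does not
grow with the number of endpoints.

```rust
use rand::Rng;

// Illustrative stand-in for an endpoint's load estimate; in tower this would
// come from the `Load` trait.
struct Endpoint {
    load: usize, // e.g. number of pending requests
}

/// Power-of-two-choices: sample two distinct ready endpoints uniformly at
/// random and pick the less loaded one. Constant time regardless of how many
/// endpoints the balancer holds.
fn p2c_pick(ready: &[Endpoint], rng: &mut impl Rng) -> Option<usize> {
    match ready.len() {
        0 => None,
        1 => Some(0),
        n => {
            let a = rng.gen_range(0..n);
            // Sample a second, distinct index.
            let b = {
                let r = rng.gen_range(0..n - 1);
                if r >= a { r + 1 } else { r }
            };
            Some(if ready[a].load <= ready[b].load { a } else { b })
        }
    }
}
```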

I have a series of changes that I would like to pursue to this end:

  • #283 (Spawn ready: drives a service's readiness on an executor) enables endpoint stacks to be driven to readiness without being actively polled by the balancer.
  • #285 (Extract tower-load from tower-balance) extracts the Load & Instrument traits into a dedicated crate and removes the current Weight implementation (which is not really what we want).
  • I have another (followup) branch that removes the choose trait/module, leaving only a P2CBalance implementation.
  • I could use some help figuring out the path forward for Pool.
olix0r self-assigned this May 28, 2019
@carllerche

This plan sounds good to me 👍

hawkw commented May 28, 2019

In the current implementation, we always poll all unready
inner services, which leads to poor behavior as the balancer scales.
Something like #283 is necessary to relax the balancer's readiness-polling
guarantees.

I'm a bit curious about the tradeoffs between driving all the inner services in the background versus having the balancer select two services at poll_ready, store the selected indices internally, poll only those, and then balance over the selected services when it's next called.

olix0r commented May 28, 2019

@hawkw hm? Is there something I can explain better? Or are you offering to write benchmarks? ;)

hawkw commented May 28, 2019

@olix0r nothing in particular, I was just wondering about your thought process. I am interested in doing some benchmarking, though I'm not sure if it would be practical...

olix0r commented May 28, 2019

@hawkw imagine a load balancer with 1000 endpoints. Initially, and generally in worst-case (big-O) situations, the balancer will have 1000 unready services. Every time you call poll_ready on the balancer, it has to issue 1000 inner poll_ready calls. Every time we get a service discovery update, we have to poll all of these services again, until all of them are ready. That's a lot of polling.

Instead, we can rely on the executor to be much more intelligent about how these inner services are polled, and we can make the balancer's poll_ready much cheaper/simpler in the process:

https://github.com/olix0r/tower/blob/bfb0fd64ea0b514f32a374371ad64dd07980de88/tower-balance/src/lib.rs#L222-L256
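
To make that concrete, here is a minimal sketch of the idea behind #283, written against today's async tower API rather than the futures 0.1 code this issue refers to; the channel plumbing is illustrative, not the actual SpawnReady implementation. Each endpoint is driven to readiness on the executor and only handed back once it is usable, so the balancer never polls unready services itself.

```rust
use tokio::sync::oneshot;
use tower::{Service, ServiceExt};

// Illustrative only: drive one endpoint to readiness in a background task and
// send it back over a channel once `poll_ready` succeeds. The real SpawnReady
// middleware (#283) wraps this idea in a Service.
async fn drive_to_ready<S, Req>(mut svc: S, tx: oneshot::Sender<S>)
where
    S: Service<Req>,
{
    // `ServiceExt::ready` repeatedly polls `poll_ready` until the service is
    // ready (or fails).
    if svc.ready().await.is_ok() {
        let _ = tx.send(svc);
    }
    // On error the service is simply dropped, mirroring point 3 in the issue
    // body.
}
```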

hawkw commented May 28, 2019

@olix0r In the approach I described, you only poll two inner services on any given poll_ready call on the balancer. I definitely agree that making the balancer actually constant-time is important; my question was just about the approach. What I was describing was essentially moving the selection of the two services to balance over out of call and into poll_ready, and polling only those services. Since the poll_ready contract means we can rely on state set in poll_ready to be there in call, AFAICT this should work, but I suppose it also means you're more likely to exert backpressure. I'm not sure what the consequences of that are in contrast to spawning a future for every service in the balancer.

olix0r commented May 28, 2019

@hawkw I see. So your question is: "why do we have to drive inner services to readiness at all?" If so, that's a good question: a given TLS connection may need to be polled 4 or 5 times to drive itself through all of the state transitions needed to complete a connection. If you just let the balancer poll services at random without driving them to readiness, you may never complete any of these connections (as they would idle out, etc).

hawkw commented May 28, 2019

Ah, yeah, that makes sense now that you mention it. Thanks.

olix0r added a commit that referenced this issue May 29, 2019
The tower-balance crate includes the `Load` and `Instrument` traits,
which are likely useful outside of balancers; and certainly have no
tight coupling with any specific balancer implementation. This change
extracts these protocol-agnostic traits into a dedicated crate.

The `Load` trait includes a latency-aware _PeakEWMA_ load strategy as
well as a simple _PendingRequests_ strategy for latency-agnostic
applications.

The `Instrument` trait is used by both of these strategies to track
in-flight requests without knowing protocol details. It is expected that
protocol-specific crates will provide, for instance, HTTP
time-to-first-byte latency strategies.

A default `NoInstrument` implementation tracks a request until its
response future is satisfied.

This crate should only be published once tower-balance is published.

Part of #286
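
Paraphrasing the traits described in the commit message above (the exact tower-load signatures may differ slightly): `Load` exposes a comparable metric, and a PendingRequests-style strategy can implement it with nothing more than a shared in-flight counter that the instrumentation hook bumps on dispatch and decrements on completion.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Sketch of the `Load` abstraction: a comparable estimate of how busy an
// endpoint is, which the balancer can compare across endpoints.
pub trait Load {
    type Metric: PartialOrd;
    fn load(&self) -> Self::Metric;
}

// A PendingRequests-style strategy: the metric is simply the number of
// in-flight requests. An `Instrument`-like hook would increment the counter
// when a request is dispatched and decrement it when the response future
// resolves.
#[derive(Clone, Default)]
pub struct PendingRequests(Arc<AtomicUsize>);

impl Load for PendingRequests {
    type Metric = usize;

    fn load(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }
}
```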
olix0r added a commit to olix0r/tower that referenced this issue May 30, 2019
As described in tower-rs#286, `Balance` had a few problems:
- it is responsible for driving all inner services to readiness, making
  its `poll_ready` O(n) and not O(1);
- the `choose` abstraction was a hindrance. If a round-robin balancer
  is needed it can be implemented separately without much duplicate
  code; and
- endpoint errors were considered fatal to the balancer.

This change replaces `Balance` with `P2CBalance` and removes the
`choose` module.

Endpoint service failures now cause the service to be removed from the
balancer gracefully.

Endpoint selection is now effectively constant time, though it biases
for availability in the case when random selection does not yield an
available endpoint.

`tower-test` had to be updated so that a mocked service could fail after
advertising readiness.
olix0r added a commit that referenced this issue Jun 4, 2019
As described in #286, `Balance` had a few problems:
- it is responsible for driving all inner services to readiness, making
  its `poll_ready` O(n) and not O(1);
- the `choose` abstraction was a hindrance. If a round-robin balancer
  is needed it can be implemented separately without much duplicate
  code; and
- endpoint errors were considered fatal to the balancer.

This change replaces `Balance` with `p2c::Balance` and removes the
`choose` module.

Endpoint service failures now cause the service to be removed from the
balancer gracefully.

Endpoint selection is now effectively constant time, though it biases
for availability in the case when random selection does not yield an
available endpoint.

`tower-test` had to be updated so that a mocked service could fail after
advertising readiness.
jonhoo commented Jul 10, 2019

This issue is also a good roadmap for what would be needed for a 0.1 tower-balance and tower-load release.

I'd be happy to help with Pool (I think I'm also currently the only user of it?) if you can give me a more concrete idea of what changes you'd like to see to it.

olix0r commented Jul 11, 2019

@jonhoo thanks for pinging this.

To my mind, there are a few issues I want to address before releasing this crate:

  1. Endpoint weighting: I'd like to reintroduce endpoint-weighting. This is something I'm actively looking at.
  2. I think we should figure out how to decouple pool from p2c::Balance. I need to understand pool a bit better to really know what I'm talking about here, but my hunch is that we'd ideally want an aperture balancer, which fits a similar pattern but has much better behavior when balancing over large replica sets. As a more reasonable next step, though, I think we could at least make pool generic over the actual balancer -- i.e., probably a Layer that takes a Discover, wraps it, and builds a generic inner service with it (a rough sketch follows this list). I think it could likely be split into a separate crate (and we may perhaps want to specialize the current crate as balance-p2c or something).
  3. I still have some reservations about the Discover API -- to use it in practice we have a lot of extra wrapper types that seem like they should be pushed down into the balancer implementation. This may end up as a 0.2 change, though...
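
Purely as an illustration of what "a Layer that takes a Discover and wraps it" could look like -- `BalanceLayer` and `Balance` below are hypothetical names, not the tower API -- the pool would then compose with any balancer expressible in this shape instead of reaching into a concrete implementation's fields.

```rust
use tower::Layer;

// Hypothetical: a layer that turns anything producing a `Discover` into a
// balancer, without the pool ever touching the balancer's internals.
pub struct BalanceLayer;

pub struct Balance<D> {
    // The discovery stream the balancer consumes; the pool never needs to
    // access this field directly.
    discover: D,
}

impl<D> Layer<D> for BalanceLayer {
    type Service = Balance<D>;

    fn layer(&self, discover: D) -> Self::Service {
        // The layer only wires the discovery stream in; it says nothing about
        // how the inner balancer distributes load.
        Balance { discover }
    }
}
```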

jonhoo commented Jul 11, 2019

  1. I agree! Pool doesn't actually care what balancer is used behind the scenes, and so abstracting over it seems reasonable. More thoughts on the relation to aperture balancing below.
  2. What wrapper types in particular are you thinking about? Those for Stream and Iterator?

I think it's useful to clear up what Pool is to me (or, at least, how it's used in Noria). It might help us determine the proper future (and perhaps even name) for it. Pool addresses a slightly different issue than something like an aperture balancer does, although the two are somewhat related. Specifically, it attempts to address the fact that a single Service can only be processed by a single thread. For example, if you have one Service that represents one "connection" to your backend server (something like tokio_tower::Client), then that service can only accept as many requests as a single thread can serialize per second, no matter how fast your network is or how many cores you have. In this case, you actually want to spin up another (potentially identical) Service that allows you to process more requests concurrently. Of course the "service service" that generates new Service instances for the pool may choose to establish a connection to a different backend server, but that's not really something Pool cares about.
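
To make the "service service" idea concrete, here is a minimal, self-contained sketch (illustrative only, not Pool's API; a trivial echo service built with tower's service_fn stands in for a real backend connection):

```rust
use std::convert::Infallible;
use tower::{service_fn, Service, ServiceExt};

// Illustrative only: a "service service" in the sense described above. Every
// call yields another, identical client Service, so a pool can add instances
// when a single client can no longer keep up.
#[tokio::main]
async fn main() {
    let mut make_client = service_fn(|_target: ()| async {
        // A real implementation would establish another connection to the
        // backend here; an echo service stands in for that client.
        Ok::<_, Infallible>(service_fn(|req: String| async move {
            Ok::<_, Infallible>(req)
        }))
    });

    // The pool would do this whenever it decides it needs more capacity.
    let mut client = make_client.ready().await.unwrap().call(()).await.unwrap();
    let rsp = client.ready().await.unwrap().call("ping".to_string()).await.unwrap();
    assert_eq!(rsp, "ping");
}
```

The pool's job is then only deciding when to call the maker again and how to spread requests over the resulting clients.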

olix0r commented Jul 12, 2019

@jonhoo I've started a branch that splits the core service-caching logic out of the balancer. I expect that this could satisfy your pool use cases (which I have some more ideas about thanks to the helpful context you provided), and also the weighted distribution use case we have (which isn't strictly about load balancing, but should take readiness into account). I'm going to pursue this a little further to see if I can work through my pool ideas on this branch so that I can explain them a bit better.

jonhoo commented Mar 31, 2020

@olix0r now that tower-ready-cache has landed, where are we at with this?

jonhoo added the A-balance, A-ready-cache, C-musing, S-waiting-on-author, and T-middleware labels Mar 31, 2020
jonhoo commented May 27, 2020

Re Pool, see #456
