tower-balance: Observations from testing #286

olix0r opened this issue May 28, 2019 · 14 comments

olix0r commented May 28, 2019

We've recently been running tests on the Linkerd proxy that exercise the load
balancer in larger clusters (of 30+ endpoints). At the same time, I've been
exploring the existing endpoint-weighting scheme.

In doing so, I've realized that the balancer is currently O(n), though it is
intended to be effectively O(1). Furthermore, the existing weighting scheme
is complex to instrument in practice, and is of questionable value in its
current form.

All of this leads me to believe that we should drastically simplify the
balancer:

  1. Do not attempt to unify (Weighted) P2C and Round-Robin under one
    implementation. Each strategy benefits from being able to use its own data
    structure. For now, I propose that we simply drop the Round-Robin logic. It
    can easily be added later if it's desirable.
  2. The balancer cannot be responsible for driving the readiness of all of its
    constituent services. The P2C balancer is intended to sample only two
    endpoints. In the current implementation, we always poll all unready
    inner services, which leads to poor behavior as the balancer scales.
    Something like #283 (Spawn ready: drives a service's readiness on an
    executor) is necessary to relax the balancer's readiness-polling
    guarantees. A rough sketch of the resulting selection path follows this
    list.
  3. We initially decided that all endpoint-service errors should be treated as fatal
    to the balancer. It now seems more appropriate to let the balancer handle these
    failures by dropping the failed service from the balancer.
  4. The Load and Instrument traits are not balancer-specific and should
    become more generally useful abstractions in a dedicated crate.
  5. The balancer should expose a Layer that can be applied over inner layers
    that produce Discover-typed results.
  6. The Pool implementation is factored inconveniently, especially in light of how
    tower-layer has evolved: it requires access to a concrete balancer
    implementation and reaches into its discover field directly. I think that it
    should probably be implemented as a Discover proxy-type that is constructed
    with a Watch<Load>. The pool doesn't rely on any specific balancer behavior,
    so it shouldn't dictate use of a specific balancer implementation.
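
A rough sketch of the selection path implied by points 1 and 2, once endpoint
readiness is handled off the balancer's poll path (illustrative only --
`Endpoint` and `p2c_pick` are not tower types): sample two distinct ready
endpoints at random and keep the less loaded one, so selection cost does not
grow with the number of endpoints.

```rust
use rand::Rng;

// Illustrative stand-in for an endpoint's load estimate; in tower this would
// come from the `Load` trait.
struct Endpoint {
    load: usize, // e.g. number of pending requests
}

/// Power-of-two-choices: sample two distinct ready endpoints uniformly at
/// random and pick the less loaded one. Constant time regardless of how many
/// endpoints the balancer holds.
fn p2c_pick(ready: &[Endpoint], rng: &mut impl Rng) -> Option<usize> {
    match ready.len() {
        0 => None,
        1 => Some(0),
        n => {
            let a = rng.gen_range(0..n);
            // Sample a second, distinct index.
            let b = {
                let r = rng.gen_range(0..n - 1);
                if r >= a { r + 1 } else { r }
            };
            Some(if ready[a].load <= ready[b].load { a } else { b })
        }
    }
}
```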

I have a series of changes that I would like to pursue to this end:

  • #283 (Spawn ready: drives a service's readiness on an executor) enables endpoint stacks to be driven to readiness without being actively polled by the balancer.
  • #285 (Extract tower-load from tower-balance) extracts the Load & Instrument traits into a dedicated crate and removes the current Weight implementation (which is not really what we want).
  • I have another (followup) branch that removes the choose trait/module, leaving only a P2CBalance implementation.
  • I could use some help figuring out the path forward for Pool.
olix0r self-assigned this May 28, 2019
@carllerche

This plan sounds good to me 👍

hawkw commented May 28, 2019

In the current implementation, we always poll all unready
inner services, which leads to poor behavior as the balancer scales.
Something like #283 is necessary to relax the balancer's readiness-polling
guarantees.

I'm a bit curious about the tradeoffs between driving all the inner services in the background versus having the balancer select two services at poll_ready, store the selected indices internally, poll only those, and then balance over the selected services when it's next called.

olix0r commented May 28, 2019

@hawkw hm? Is there something I can explain better? Or are you offering to write benchmarks? ;)

hawkw commented May 28, 2019

@olix0r nothing in particular, I was just wondering about your thought process. I am interested in doing some benchmarking, though I'm not sure if it would be practical...

olix0r commented May 28, 2019

@hawkw imagine a load balancer with 1000 endpoints. Initially, and generally in worst-case (big-O) situations, the balancer will have 1000 unready services. Every time you call poll_ready on the balancer, it has to issue 1000 inner poll_ready calls. Every time we get a service discovery update, we have to poll all of these services again, until all of them are ready. That's a lot of polling.

Instead, we can rely on the executor to be much more intelligent about how these inner services are polled, and we can make the balancer's poll_ready much cheaper/simpler in the process:

https://github.com/olix0r/tower/blob/bfb0fd64ea0b514f32a374371ad64dd07980de88/tower-balance/src/lib.rs#L222-L256
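
To make that concrete, here is a minimal sketch of the idea behind #283, written against today's async tower API rather than the futures 0.1 code this issue refers to; the channel plumbing is illustrative, not the actual SpawnReady implementation. Each endpoint is driven to readiness on the executor and only handed back once it is usable, so the balancer never polls unready services itself.

```rust
use tokio::sync::oneshot;
use tower::{Service, ServiceExt};

// Illustrative only: drive one endpoint to readiness in a background task and
// send it back over a channel once `poll_ready` succeeds. The real SpawnReady
// middleware (#283) wraps this idea in a Service.
async fn drive_to_ready<S, Req>(mut svc: S, tx: oneshot::Sender<S>)
where
    S: Service<Req>,
{
    // `ServiceExt::ready` repeatedly polls `poll_ready` until the service is
    // ready (or fails).
    if svc.ready().await.is_ok() {
        let _ = tx.send(svc);
    }
    // On error the service is simply dropped, mirroring point 3 in the issue
    // body.
}
```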

hawkw commented May 28, 2019

@olix0r In the approach I described, you only poll two inner services on any given poll_ready call on the balancer. I definitely agree that making the balancer actually constant-time is important; my question was just about the approach. What I was describing was essentially moving the selection of the two services to balance over out of call and into poll_ready, and polling only those services. Since the poll_ready contract means we can rely on state set in poll_ready to be there in call, AFAICT this should work, but I suppose it also means you're more likely to exert backpressure. I'm not sure what the consequences of that are in contrast to spawning a future for every service in the balancer.

olix0r commented May 28, 2019

@hawkw I see. So your question is: "why do we have to drive inner services to readiness at all?" If so, that's a good question: a given TLS connection may need to be polled 4 or 5 times to drive itself through all of the state transitions needed to complete a connection. If you just let the balancer poll services at random without driving them to readiness, you may never complete any of these connections (as they would idle out, etc).

hawkw commented May 28, 2019

Ah, yeah, that makes sense now that you mention it. Thanks.

olix0r added a commit that referenced this issue May 29, 2019
The tower-balance crate includes the `Load` and `Instrument` traits,
which are likely useful outside of balancers; and certainly have no
tight coupling with any specific balancer implementation. This change
extracts these protocol-agnostic traits into a dedicated crate.

The `Load` trait includes a latency-aware _PeakEWMA_ load strategy as
well as a simple _PendingRequests_ strategy for latency-agnostic
applications.

The `Instrument` trait is used by both of these strategies to track
in-flight requests without knowing protocol details. It is expected that
protocol-specific crates will provide, for instance, HTTP
time-to-first-byte latency strategies.

A default `NoInstrument` implementation tracks a request until its
response future is satisfied.

This crate should only be published once tower-balance is published.

Part of #286
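
Paraphrasing the traits described in the commit message above (the exact tower-load signatures may differ slightly): `Load` exposes a comparable metric, and a PendingRequests-style strategy can implement it with nothing more than a shared in-flight counter that the instrumentation hook bumps on dispatch and decrements on completion.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Sketch of the `Load` abstraction: a comparable estimate of how busy an
// endpoint is, which the balancer can compare across endpoints.
pub trait Load {
    type Metric: PartialOrd;
    fn load(&self) -> Self::Metric;
}

// A PendingRequests-style strategy: the metric is simply the number of
// in-flight requests. An `Instrument`-like hook would increment the counter
// when a request is dispatched and decrement it when the response future
// resolves.
#[derive(Clone, Default)]
pub struct PendingRequests(Arc<AtomicUsize>);

impl Load for PendingRequests {
    type Metric = usize;

    fn load(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }
}
```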
olix0r added a commit to olix0r/tower that referenced this issue May 30, 2019
As described in tower-rs#286, `Balance` had a few problems:
- it is responsible for driving all inner services to readiness, making
  its `poll_ready` O(n) and not O(1);
- the `choose` abstraction was a hindrance. If a round-robin balancer
  is needed it can be implemented separately without much duplicate
  code; and
- endpoint errors were considered fatal to the balancer.

This change replaces `Balance` with `P2CBalance` and removes the
`choose` module.

Endpoint service failures now cause the service to be removed from the
balancer gracefully.

Endpoint selection is now effectively constant time, though it biases
for availability in the case when random selection does not yield an
available endpoint.

`tower-test` had to be updated so that a mocked service could fail after
advertising readiness.
olix0r added a commit that referenced this issue Jun 4, 2019
As described in #286, `Balance` had a few problems:
- it is responsible for driving all inner services to readiness, making
  its `poll_ready` O(n) and not O(1);
- the `choose` abstraction was a hindrance. If a round-robin balancer
  is needed it can be implemented separately without much duplicate
  code; and
- endpoint errors were considered fatal to the balancer.

This change replaces `Balance` with `p2c::Balance` and removes the
`choose` module.

Endpoint service failures now cause the service to be removed from the
balancer gracefully.

Endpoint selection is now effectively constant time, though it biases
for availability in the case when random selection does not yield an
available endpoint.

`tower-test` had to be updated so that a mocked service could fail after
advertising readiness.
jonhoo commented Jul 10, 2019

This issue is also a good roadmap for what would be needed for a 0.1 tower-balance and tower-load release.

I'd be happy to help with Pool (I think I'm also currently the only user of it?) if you can give me a more concrete idea of what changes you'd like to see to it.

olix0r commented Jul 11, 2019

@jonhoo thanks for pinging this.

To my mind, there are a few issues I want to address before releasing this crate:

  1. Endpoint weighting: I'd like to reintroduce endpoint-weighting. This is something I'm actively looking at.
  2. I think we should figure out how to decouple pool from p2c::Balance. I need to understand pool a bit better to really know what I'm talking about here, but my hunch is that we'd ideally want an aperture balancer, which fits a similar pattern but has much better behavior when balancing over large replica sets. As a more reasonable next step, though, I think we could at least make pool generic over the actual balancer -- i.e., probably a Layer that takes a Discover, wraps it, and builds a generic inner service with it (a rough sketch follows this list). I think it could likely be split into a separate crate (and we may perhaps want to specialize the current crate as balance-p2c or something).
  3. I still have some reservations about the Discover API -- to use it in practice we have a lot of extra wrapper types that seem like they should be pushed down into the balancer implementation. This may end up as a 0.2 change, though...
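
Purely as an illustration of what "a Layer that takes a Discover and wraps it" could look like -- `BalanceLayer` and `Balance` below are hypothetical names, not the tower API -- the pool would then compose with any balancer expressible in this shape instead of reaching into a concrete implementation's fields.

```rust
use tower::Layer;

// Hypothetical: a layer that turns anything producing a `Discover` into a
// balancer, without the pool ever touching the balancer's internals.
pub struct BalanceLayer;

pub struct Balance<D> {
    // The discovery stream the balancer consumes; the pool never needs to
    // access this field directly.
    discover: D,
}

impl<D> Layer<D> for BalanceLayer {
    type Service = Balance<D>;

    fn layer(&self, discover: D) -> Self::Service {
        // The layer only wires the discovery stream in; it says nothing about
        // how the inner balancer distributes load.
        Balance { discover }
    }
}
```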

jonhoo commented Jul 11, 2019

  1. I agree! Pool doesn't actually care what balancer is used behind the scenes, and so abstracting over it seems reasonable. More thoughts on the relation to aperture balancing below.
  2. What wrapper types in particular are you thinking about? Those for Stream and Iterator?

I think it's useful to clear up what Pool is to me (or, at least, how it's used in Noria). It might help us determine the proper future (and perhaps even name) for it. Pool addresses a slightly different issue than something like an aperture balancer does, although the two are somewhat related. Specifically, it attempts to address the fact that a single Service can only be processed by a single thread. For example, if you have one Service that represents one "connection" to your backend server (something like tokio_tower::Client), then that service can only accept as many requests as a single thread can serialize per second, no matter how fast your network is or how many cores you have. In this case, you actually want to spin up another (potentially identical) Service that allows you to process more requests concurrently. Of course the "service service" that generates new Service instances for the pool may choose to establish a connection to a different backend server, but that's not really something Pool cares about.
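
To make the "service service" idea concrete, here is a minimal, self-contained sketch (illustrative only, not Pool's API; a trivial echo service built with tower's service_fn stands in for a real backend connection):

```rust
use std::convert::Infallible;
use tower::{service_fn, Service, ServiceExt};

// Illustrative only: a "service service" in the sense described above. Every
// call yields another, identical client Service, so a pool can add instances
// when a single client can no longer keep up.
#[tokio::main]
async fn main() {
    let mut make_client = service_fn(|_target: ()| async {
        // A real implementation would establish another connection to the
        // backend here; an echo service stands in for that client.
        Ok::<_, Infallible>(service_fn(|req: String| async move {
            Ok::<_, Infallible>(req)
        }))
    });

    // The pool would do this whenever it decides it needs more capacity.
    let mut client = make_client.ready().await.unwrap().call(()).await.unwrap();
    let rsp = client.ready().await.unwrap().call("ping".to_string()).await.unwrap();
    assert_eq!(rsp, "ping");
}
```

The pool's job is then only deciding when to call the maker again and how to spread requests over the resulting clients.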

olix0r commented Jul 12, 2019

@jonhoo I've started a branch that splits the core service-caching logic out of the balancer. I expect that this could satisfy your pool use cases (which I have some more ideas about thanks to the helpful context you provided), and also the weighted distribution use case we have (which isn't strictly about load balancing, but should take readiness into account). I'm going to pursue this a little further to see if I can work through my pool ideas on this branch so that I can explain them a bit better.

jonhoo commented Mar 31, 2020

@olix0r now that tower-ready-cache has landed, where are we at with this?

jonhoo added the A-balance, A-ready-cache, C-musing, S-waiting-on-author, and T-middleware labels Mar 31, 2020
jonhoo commented May 27, 2020

Re Pool, see #456
