Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

balance: Only balance over ready endpoints #293

Merged
merged 10 commits into from
Jul 6, 2019

Conversation

olix0r
Copy link
Collaborator

@olix0r olix0r commented Jun 12, 2019

In 03ec4aa, the balancer was changed to make a quick endpoint decision.
This, however, means that the balancer can return NotReady when it does
in fact have a ready endpoint. Furthermore, testing shows that while
this approach does have about 10% beter CPU utilization, it has about
25% worse throughput when communicating with a busy service.

This changes the balancer to segregate unready endpoints, only
performing p2c over ready endpoints. Unready endpoints are tracked with
a FuturesUnordered that supports premature eviction via oneshots.

The main downside of this change is that the Balancer must become
generic over the Request type.

Fixes #292

In 03ec4aa, the balancer was changed to make a quick endpoint decision.
This, however, means that the balancer can return NotReady when it does
in fact have a ready endpoint. Furthermore, testing shows that while
this approach does have about 10% beter CPU utilization, it has about
25% worse throughput when communicating with a busy service.

This changes the balancer to segregate unready endpoints, only
performing p2c over ready endpoints. Unready endpoints are tracked with
a FuturesUnordered that supports premature eviction via oneshots.

The main downside of this change is that the Balancer must become
generic over the Request type.

Fixes tower-rs#292
@olix0r olix0r self-assigned this Jun 12, 2019
Copy link
Member

@carllerche carllerche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strategy seems good to me. I left one inline comment re: naming that caused confusion while I was reading, but afaik, the strategy is sound. We can improve the details of the impl once we update to std future as how tasks are tracked will change.

@carllerche carllerche requested a review from jonhoo June 14, 2019 22:05
@carllerche
Copy link
Member

Also, ccing @jonhoo

tower-balance/Cargo.toml Outdated Show resolved Hide resolved
@jonhoo
Copy link
Collaborator

jonhoo commented Jul 1, 2019

Sorry it took me so long to get to this @olix0r! This looks like an excellent change, though I do have some questions I'd like to resolve before merging. Most of them are around adding comments for some of the more complex interactions (and to check my own understanding).

jonhoo added a commit to jonhoo/tracing-timing that referenced this pull request Jul 1, 2019
@jonhoo
Copy link
Collaborator

jonhoo commented Jul 2, 2019

@olix0r depending on how busy you are, I can also take on pushing this over the finish line as I'd like to have it for some work-related stuff :)

@olix0r
Copy link
Collaborator Author

olix0r commented Jul 2, 2019

@jonhoo I should have a chance today to push another iteration. Thanks, though!

@olix0r
Copy link
Collaborator Author

olix0r commented Jul 5, 2019

@jonhoo I do take your general point about this polling perhaps excessively. I do think it's important for us to eagerly poll a selected endpoint immediately before a request is dispatched to it, but i do think we can accomplish this with a slightly more relaxed strategy. Let me try something...

@jonhoo
Copy link
Collaborator

jonhoo commented Jul 5, 2019

I'd also be okay with merging this as-is now and then do optimizations in a follow-up PR. Up to you :)

@olix0r
Copy link
Collaborator Author

olix0r commented Jul 5, 2019

@jonhoo PTAL; that change allowed us to remove some of the uglier code in there (repairing indexes).

Copy link
Collaborator

@jonhoo jonhoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that looks great!

@olix0r olix0r merged commit 18b30eb into tower-rs:master Jul 6, 2019
@olix0r olix0r deleted the ver/balance-ready branch July 6, 2019 03:46
olix0r added a commit to linkerd/linkerd2-proxy that referenced this pull request Jul 11, 2019
tower-rs/tower#288 and tower-rs/tower#293 changed the Balance
implementation substantially. Now, a new layer, `spawn_ready` is
inserted around endpoint stack to ensure that readiness is driven
on a background task.

In support of this change, the `pending` layer was removed from the
endpoint stack and, instead, the discovery system is now responsible for
driving pending services to be materialized.
olix0r added a commit to olix0r/tower that referenced this pull request Jul 18, 2019
In tower-rs#293, `balance` was refactored to manage dispatching requests over a
set of equivalent inner services that may or may not be ready.

This change extracts the core logic of managing a cache of ready
services into a dedicated crate, leaving the balance crate to deal with
node selection.

`ReadyCache` provides a default `Service` implementation, though this is
just provided as convenience. The `Balance` service, for instance, uses
`ReadyCache`'s APIs directly so that endpoint failure and endpoint
exhaustion can be handled by the balancer.
olix0r added a commit to olix0r/tower that referenced this pull request Jul 18, 2019
In tower-rs#293, `balance` was refactored to manage dispatching requests over a
set of equivalent inner services that may or may not be ready.

This change extracts the core logic of managing a cache of ready
services into a dedicated crate, leaving the balance crate to deal with
node selection.

`ReadyCache` provides a default `Service` implementation, though this is
just provided as convenience. The `Balance` service, for instance, uses
`ReadyCache`'s APIs directly so that endpoint failure and endpoint
exhaustion can be handled by the balancer.
olix0r added a commit to olix0r/tower that referenced this pull request Jul 18, 2019
In tower-rs#293, `balance` was refactored to manage dispatching requests over a
set of equivalent inner services that may or may not be ready.

This change extracts the core logic of managing a cache of ready
services into a dedicated crate, leaving the balance crate to deal with
node selection.

`ReadyCache` provides a default `Service` implementation, though this is
just provided as convenience. The `Balance` service, for instance, uses
`ReadyCache`'s APIs directly so that endpoint failure and endpoint
exhaustion can be handled by the balancer.
olix0r added a commit that referenced this pull request Nov 12, 2019
In #293, `balance` was refactored to manage dispatching requests over a
set of equivalent inner services that may or may not be ready.

This change extracts the core logic of managing a cache of ready
services into a dedicated crate, leaving the balance crate to deal with
node selection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

balance: Investigate FuturesUnordered-style polling for inner services
3 participants