Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an examples that demonstrates p2c rr behavior #39

Merged
merged 3 commits into from
Jan 25, 2018
Merged

Conversation

olix0r
Copy link
Collaborator

@olix0r olix0r commented Jan 25, 2018

The new demo example sends a million simulated requests through each load balancer configuration and records the observed latency distributions.

Furthermore, this fixes a critical bug in Balancer, where we did not properly iterate through not-ready nodes.

:; cargo run --example p2c
   Compiling tower-balance v0.1.0 (file:///Users/ver/b/rs/tower/tower-balance)
    Finished dev [unoptimized + debuginfo] target(s) in 2.55 secs
     Running `/Users/ver/b/rs/tower/target/debug/examples/p2c`
p2c samples: 1000000
p2c p50:  454
p2c p90:  498
p2c p95:  522
p2c p99:  1984
p2c p999: 2034
rr samples: 1000000
rr p50:  457
rr p90:  1944
rr p95:  1971
rr p99:  2006
rr p999: 2038

The new _demo_ example sends a million simulated requests through each
load balancer configuration and records theobserved latency
distributions.

Furthermore, this fixes a critcal bug in Balancer, where we did not
properly iterate through not-ready nodes.
@hawkw
Copy link
Member

hawkw commented Jan 25, 2018

FWIW, I'm pretty sure that the bug you mentioned above also causes some test failures in conduit-proxy's discovery module, after updating to the latest tower master branch:

     Running target/debug/deps/discovery-0a4fa203212e3238

running 3 tests
test outbound_times_out ... ignored
thread 'support proxy' panicked at 'attempt to subtract with overflow', /Users/eliza/.cargo/git/checkouts/tower-b098c32cf5a1bcca/4bedc52/tower-balance/src/lib.rs:149:20
note: Run with `RUST_BACKTRACE=1` for a backtrace.
thread 'support proxy' panicked at 'attempt to subtract with overflow', /Users/eliza/.cargo/git/checkouts/tower-b098c32cf5a1bcca/4bedc52/tower-balance/src/lib.rs:149:20
test outbound_asks_controller_api ... FAILED
test outbound_reconnects_if_controller_stream_ends ... FAILED

failures:

---- outbound_asks_controller_api stdout ----
        thread 'outbound_asks_controller_api' panicked at 'called `Result::unwrap()` on an `Err`value: Error { kind: Inner(Error { kind: Io(Error { repr: Kind(BrokenPipe) }) }) }', src/libcore/result.rs:906:4

---- outbound_reconnects_if_controller_stream_ends stdout ----
        thread 'outbound_reconnects_if_controller_stream_ends' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Inner(Error { kind: Io(Error { repr: Kind(BrokenPipe) }) }) }', src/libcore/result.rs:906:4


failures:
    outbound_asks_controller_api
    outbound_reconnects_if_controller_stream_ends

test result: FAILED. 0 passed; 2 failed; 1 ignored; 0 measured; 0 filtered out

@hawkw
Copy link
Member

hawkw commented Jan 25, 2018

Note that the panics I mentioned in #39 (comment) both occurred at /tower-balance/src/lib.rs:149:20; that's this line in the diff; so I'm pretty sure the bug fixed here is implicated. :)

I vote we try and fast-track getting this merged --- I'd like to test Conduit against this branch, but changing a tower dependency sends us into dependency hell really fast.

@hawkw
Copy link
Member

hawkw commented Jan 25, 2018

Update: this appears to fix the panic I mentioned, but the same tests now appear to hang:

     Running target/debug/deps/discovery-5af105dcfe5db014

running 3 tests
test outbound_times_out ... ignored
test outbound_asks_controller_api has been running for over 60 seconds
test outbound_reconnects_if_controller_stream_ends has been running for over 60 seconds

It's unclear whether or not that's a bug in tower-balance, or in Conduit, though. I'll continue looking into it.

EDIT: doing some sleuthing and it looks like a Conduit bug, but this still remains to be determined.

// Iterate through the not-ready endpoints from right to left to prevent removals
// from reordering services in a way that could prevent a service from being polled.
for idx in self.not_ready.len()-1..0 {
for offset in 1..n {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be....

for idx in (1..n).iter().rev

@olix0r olix0r merged commit 777888d into master Jan 25, 2018
@olix0r olix0r deleted the ver/balance-eg branch January 25, 2018 20:51
for offset in 1..n {
let idx = n - offset;

for idx in (0..n-1).rev() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olix0r I don't think this change is correct --- I stopped seeing the integer overflow in Conduit tests when building with this branch as of commit fb08434, but after pointing back at master as of 777888d, I'm getting the overflow again:

running 3 tests
test outbound_times_out ... ignored
thread 'support proxy' panicked at 'thread 'attempt to subtract with overflowsupport proxy', ' panicked at '/Users/eliza/.cargo/git/checkouts/tower-b098c32cf5a1bcca/777888d/tower-balance/src/lib.rsattempt to subtract with overflow:', 154/Users/eliza/.cargo/git/checkouts/tower-b098c32cf5a1bcca/777888d/tower-balance/src/lib.rs::24154
:note: Run with `RUST_BACKTRACE=1` for a backtrace.
24
test outbound_reconnects_if_controller_stream_ends ... FAILED
test outbound_asks_controller_api ... FAILED

failures:

---- outbound_reconnects_if_controller_stream_ends stdout ----
	thread 'outbound_reconnects_if_controller_stream_ends' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Inner(Error { kind: Io(Error { repr: Kind(BrokenPipe) }) }) }', src/libcore/result.rs:906:4

---- outbound_asks_controller_api stdout ----
	thread 'outbound_asks_controller_api' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: Inner(Error { kind: Io(Error { repr: Kind(BrokenPipe) }) }) }', src/libcore/result.rs:906:4


failures:
    outbound_asks_controller_api
    outbound_reconnects_if_controller_stream_ends

test result: FAILED. 0 passed; 2 failed; 1 ignored; 0 measured; 0 filtered out

hawkw added a commit that referenced this pull request Jan 25, 2018
An usize overflow can occur in `Balance::promote_to_ready` when `self.not_ready` has length 0.

See my comment here: #39 (comment)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit that referenced this pull request Jan 25, 2018
An usize overflow can occur in `Balance::promote_to_ready` when `self.not_ready` has length 0.

See my comment here: #39 (comment)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
hawkw added a commit that referenced this pull request Jan 26, 2018
The `..` syntax creates a _half-open_ range (see https://doc.rust-lang.org/std/ops/struct.Range.html), so all that messing about with `n-1` in #39 and #40 was never actually necessary. This actually fixes the Conduit test I mentioned in #39 (comment); it no longer hangs.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants