Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tokio-buf and a BufStream trait #611

Merged
merged 16 commits into from Oct 29, 2018
Merged

Add tokio-buf and a BufStream trait #611

merged 16 commits into from Oct 29, 2018

Conversation

carllerche
Copy link
Member

@carllerche carllerche commented Sep 1, 2018

This is a work in progress.

The description will be updated over time.

TODO

  • SizeHint guarantees
  • Should consume_hint take a u64 or usize?
  • BufStream::limit() tests

@carllerche
Copy link
Member Author

@seanmonstar I have updated size_hint documentation to say that the hint values must be accurate. This should work for Hyper to use the hint for the content-length value?

@carllerche
Copy link
Member Author

The problem, however, is now size_hint is "useless" for gzip stream transformations. When compressing a stream, there is no guarantee as to the resulting size. The stream can even get larger, which means that an upper bound cannot be set.

@carllerche
Copy link
Member Author

Perhaps, size_hint should have an approximate field that is between upper / lower that is used to reserve memory when collecting the stream?

///
/// Once a stream is finished, i.e. `Ready(None)` has been returned, further
/// calls to `poll` may result in a panic or other "bad behavior".
fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be poll_next like the upcoming change to Stream? Or perhaps a different name, since a type could conceivably implement both Stream and BufStream, and if both traits are in scope, so there is no conflict?

/// calls to `poll` may result in a panic or other "bad behavior".
fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error>;

/// Returns the bounds on the remaining length of the iterator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calls this an "iterator"

pub struct SizeHint {
available: usize,
lower: usize,
upper: Option<usize>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For use in hyper to set the content-length, this being a usize does mean that 32bit platforms can't set the content-length automatically if sending over 4GBs. Also, things like implementing BufStream for tokio::fs::File would cause problems there as well, since Metadata::len() returns a u64.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing size_hint to u64 has the interesting property of making collect() fallible.

@carllerche
Copy link
Member Author

@seanmonstar I switched SizeHint to have u64 fields. Doing this exposed a flaw in collect() where large streams could result in overflow situations.

I resolved this by adding an error associated type to FromBufStream. See here.

This ended up complicating the FromBufStream implementation for Vec<u8> as I added handling for overflow. See here.

About half of those changes are handling potential overflow and the other half is factoring SizeHint::upper().

@carllerche
Copy link
Member Author

Then the next question... should consume_hint be u64 or usize?

@carllerche
Copy link
Member Author

I added a limit combinator here.

Now, to collect a body stream, you can do:

let body = body_stream
    .limit(mb_100)
    .collect();

Copy link
Member

@jplatte jplatte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/sneding/sending

@carllerche
Copy link
Member Author

I think I'm going to get rid of SizeHint::available for now. It can be added later and it is not obviously needed.

/// if it could produce data at that point. If it chooses to return
/// `NotReady`, when `consume_hint` is called with a non-zero argument, the
/// task must be notified in order to respect the `poll_buf` contract.
fn consume_hint(&mut self, amount: usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible alternative to this function is passing some hint as an argument to poll_buf. Like, stream.poll_buf(1024 * 16) or something...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here (with h2) is this function would be called separately from reading. When a window update frame is received, it would call consume_hint w/ the total window size.

An argument to poll_buf would also work... but then it would be required to pass something. With a separate function consume_hint, calling it isn't required.

There are pros / cons to both, what are you thinking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another con is that middleware must remember to implement consume_hint as there is a default implementation.

I'm actually wondering if the default implementation should be omitted. Same with size_hint... I would expect it to be rarer to implement vs. use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an example of this providing value somewhere? I get the intention, but I'd personally prefer this left off until there's an implementation showing how it can be helpful.

@seanmonstar
Copy link
Member

Notably, removing hyper's optimization for Payload::__hyper_full_data results in (on my machine) ~9% reduction in this pipeline benchmark. This is a private-ish method that allows skipping checking the size-hint and then polling and then checking is_end_stream, opting for just a single method call, when the Body already has the full data available.

@carllerche
Copy link
Member Author

@seanmonstar what is the additional cost from?

@carllerche
Copy link
Member Author

@seanmonstar it seems highly suspect that using BufStream would be slower given that calling size_hint on Once should be extremely fast and you can use the size_hint to only poll once and skip is_end_stream.

@seanmonstar
Copy link
Member

So, some of the slow down I measured was that internally a optimized fast path was not being taken. Having refactored to basically this:

match body.poll_data()? {
    Async::Ready(Some(data)) => {
        if body.is_end_stream() {
            self.write_full_msg(head, data);
        }
    }
}

This has improved it somewhat, but there is still ~5% slowdown compared to when using body.__hyper__full_body. I suspect it's because when using __hyper_full_body, there's no branch to deal with an Err or NotReady, and no second check for is_end_stream.

@carllerche
Copy link
Member Author

@seanmonstar that is pretty surprising and would be worth digging into.

@carllerche
Copy link
Member Author

@seanmonstar I dug in a bit, and it looks like the comparison isn't quite fair. The "slow path" could probably be optimized quite a bit more by avoiding much work in poll_data when the body type is Once.

I did a bunch of experimentation myself, and I saw a 5ns difference between using poll_buf and size_hint as described compared with a similar optimization that you use. This should translate to, at most, an 80ns delta in your pipelined benchmark.

Copy link
Member

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving with the caveat that consume_hint should be looked at before publishing 0.1 to crates.io.

@carllerche carllerche merged commit 51e36e4 into master Oct 29, 2018
@carllerche carllerche deleted the tokio-buf branch November 20, 2018 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants