
[WIP] Fix #2147: mostly zero-copy writes and reads in IOStream #2166

Closed
pitrou wants to merge 7 commits from the large_buffers2 branch

Conversation

@pitrou (Contributor) commented Oct 17, 2017

Improves performance by 45% on the following benchmark: #2147 (comment)

Caveats:

  • small reads and writes can be 10% slower
  • read_until and read_until_regex can be 10 to 20% slower

If we implement the core of StreamBuffer in C, we can probably make performance even better.
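For reference, a minimal sketch of the idea behind such a buffer (names and details are illustrative, not the actual code in this PR): a deque of memoryview chunks that can be appended to, peeked at, and advanced without copying the payload bytes.

    import collections


    class StreamBuffer(object):
        """Illustrative zero-copy byte buffer built on a deque of memoryviews."""

        def __init__(self):
            self._chunks = collections.deque()
            self._size = 0

        def __len__(self):
            return self._size

        def append(self, data):
            # Wrap incoming bytes-like data in a memoryview instead of copying it.
            self._chunks.append(memoryview(data))
            self._size += len(data)

        def peek(self, size):
            # Expose up to `size` bytes from the first chunk without consuming them.
            if not self._chunks:
                return memoryview(b"")
            return self._chunks[0][:size]

        def advance(self, size):
            # Drop `size` bytes from the front by slicing memoryviews, never by
            # rebuilding one large contiguous buffer.
            assert size <= self._size
            self._size -= size
            while size > 0:
                chunk = self._chunks[0]
                if len(chunk) <= size:
                    size -= len(chunk)
                    self._chunks.popleft()
                else:
                    self._chunks[0] = chunk[size:]
                    size = 0

Porting an inner loop like advance() to C (or Cython) is the kind of optimization alluded to above.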

@pitrou mentioned this pull request on Oct 17, 2017
@pitrou force-pushed the large_buffers2 branch 2 times, most recently from 1aa6093 to 76e5e5d on October 17, 2017 16:06
@pitrou changed the title from "Fix #2147: mostly zero-copy writes and reads in IOStream" to "[WIP] Fix #2147: mostly zero-copy writes and reads in IOStream" on Oct 17, 2017
@mrocklin (Contributor) commented:
What are you using to benchmark the small-message case? Is the 10-20% performance hit after optimization? I'm curious if we can reduce this to make a change like this more palatable to Tornado devs.

@pitrou (Contributor, Author) commented Oct 18, 2017

I'm using these two scripts, running them with time: https://gist.github.com/pitrou/15ffce51f709301652bbff212e93ef4f

@pitrou (Contributor, Author) commented Oct 18, 2017

> I'm curious if we can reduce this to make a change like this more palatable to Tornado devs.

Optimizing this in pure Python looks difficult, as there are so many different ways of reading from an IOStream (read_bytes, read_until, read_until_close, callbacks and streaming callbacks...).

Perhaps writing the StreamBuffer class in C can gain enough speed to make up for the performance hit.
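To illustrate the asymmetry between those read paths: a read_bytes-style consumer only ever peels whole prefixes off the buffer, which a chunked layout handles cheaply, whereas read_until has to search for a delimiter that may straddle chunk boundaries. A hedged sketch of the easy case, reusing the hypothetical StreamBuffer sketched earlier:

    def consume(buf, num_bytes):
        """Pop num_bytes off a chunked buffer as a list of zero-copy memoryviews.

        `buf` is the hypothetical StreamBuffer sketched above.  A read_until
        implementation additionally has to scan for a delimiter that may span
        several of these chunks, which is where the extra complexity comes from.
        """
        pieces = []
        remaining = num_bytes
        while remaining > 0 and len(buf) > 0:
            piece = buf.peek(remaining)
            buf.advance(len(piece))
            pieces.append(piece)
            remaining -= len(piece)
        return pieces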

@mrocklin (Contributor) commented:
also cc @ajdavis

@mrocklin (Contributor) commented:
@bdarnell what is your tolerance for slowdown in the small-message case, and what is your tolerance for C?

Also, do you have benchmarks that we can work against that you perceive to be representative of Tornado's common case workload?

@pitrou (Contributor, Author) commented Oct 18, 2017

It occurs to me that I could post a smaller PR with only the write() changes, which would probably minimize the slowdown on small messages.
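To give a sense of the write-side change (a sketch under the same assumptions as above, not the PR's actual code): outgoing data is appended to a chunked buffer as memoryviews, and the buffer is advanced by however many bytes the socket actually accepted, instead of repeatedly re-slicing one large contiguous buffer.

    def flush_write_buffer(sock, buf, max_chunk=128 * 1024):
        """Send as much of a chunked write buffer as the socket will take.

        `buf` is assumed to expose the peek()/advance() interface sketched
        above; the 128 KiB cap per send() call is an illustrative value.
        Returns the number of bytes written during this call.
        """
        total = 0
        while len(buf) > 0:
            chunk = buf.peek(max_chunk)
            try:
                written = sock.send(chunk)
            except BlockingIOError:
                break  # kernel buffer is full; wait for the next write event
            if written == 0:
                break
            buf.advance(written)
            total += written
        return total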

@bdarnell (Member) commented:
My current policy for Tornado is that the use of C must be optional, because Mac and Windows users often don't have C tools installed (we do publish wheels now for python 3.5+ on Windows; maybe it's time to build out a more comprehensive wheel-building pipeline). As a consequence of that, any use of C must be kept small so there's not much duplication (and room for bugs) between C and the pure-python fallback.

Cython is an interesting possibility for generating a fast implementation from (annotated) python. It should also help avoid the wild pointer bugs that can plague C code.

I care more about HTTP performance than raw TCP. The only benchmark I really have (which isn't great) is demos/benchmark/benchmark.py. That's an extreme hello-world case, but it does reflect that I'm more interested in fairly small messages than large ones. Paying a 10-20% price doesn't sound great to me (although I think that number would come down a bit when you're looking at HTTP instead of TCP).

If we're going to have to keep the existing implementation for python 2, then maybe we could also punt on the problem and expose an option to optimize for either large or small messages.

@pitrou (Contributor, Author) commented Oct 19, 2017

> I care more about HTTP performance than raw TCP. The only benchmark I really have (which isn't great) is demos/benchmark/benchmark.py.

Thanks. I get 1890 req/s on master, and 1820 req/s on this PR. That's looking like a 4% slowdown, which is better than I thought.

> If we're going to have to keep the existing implementation for python 2, then maybe we could also punt on the problem and expose an option to optimize for either large or small messages.

I'm not sure what you mean by that. Do you only care about performance on Python 2, or are you implying that Python 3-only code could be made significantly more efficient?

@bdarnell (Member) commented:
I was referring to the fact that USE_STREAM_BUFFER_FOR_READS is (currently) set only for py3. If we're going to have two implementations and the ability to switch between them, we could expose that setting to the application instead of deciding based on the python version. (I think I'd rather find a workaround to make StreamBuffer work on py2 so we can only have one, but if we're going to keep both and they have different performance characteristics, we might as well take advantage of it).
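As a rough sketch of what exposing that switch could look like (the helper and keyword argument below are hypothetical, not an API this PR defines), the choice could default to the current per-version behaviour while letting the application override it:

    import sys


    def make_read_buffer(use_stream_buffer=None):
        """Hypothetical helper: pick the read-buffer implementation explicitly.

        If the application expresses no preference, fall back to the current
        behaviour of enabling the chunked buffer only on Python 3.
        """
        if use_stream_buffer is None:
            use_stream_buffer = sys.version_info[0] >= 3
        if use_stream_buffer:
            return StreamBuffer()  # the chunked, zero-copy buffer sketched earlier
        return bytearray()         # legacy contiguous buffer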

I have historically focused on py2 for performance measurements, but it's time now to make py3 the priority.

@pitrou (Contributor, Author) commented Oct 19, 2017

I've created #2169 with only the write-specific changes. This will make the code easier to review and minimize the potential slowdown.

@pitrou (Contributor, Author) commented Oct 19, 2017

> I was referring to the fact that USE_STREAM_BUFFER_FOR_READS is (currently) set only for py3

Well, I think the read-specific changes are the most contentious part of this PR, as they make read_until and read_until_regex non-trivially more complicated. Hence my splitting the write parts off into a separate, smaller PR :-)

I'm also starting to wonder whether, instead of changing the buffering logic for all reads, we should expose a readinto-like API on IOStream. Though I should experiment with that first, as it's not obvious that adding dedicated logic for it to IOStream would be easy :-)
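For context, here is roughly what such an API could look like from the caller's side (read_into is hypothetical at this point in the discussion; the coroutine below is only a sketch): the caller supplies a preallocated buffer and the stream fills it directly, avoiding the intermediate bytes object that read_bytes allocates.

    from tornado import gen
    from tornado.tcpclient import TCPClient


    @gen.coroutine
    def read_exactly(host, port, nbytes):
        """Fill a caller-provided buffer straight from the stream.

        stream.read_into() is the hypothetical readinto-like method being
        discussed; it does not exist on IOStream at the time of this comment.
        """
        buf = bytearray(nbytes)
        stream = yield TCPClient().connect(host, port)
        try:
            # Reading into a preallocated buffer skips the intermediate bytes
            # object (and the copy) that read_bytes() would otherwise create.
            yield stream.read_into(memoryview(buf))
        finally:
            stream.close()
        raise gen.Return(buf)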

@pitrou closed this on Nov 8, 2017
@pitrou deleted the large_buffers2 branch on November 8, 2017 13:35