New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster headers implementation #544

Merged
merged 30 commits into from Feb 19, 2015

Conversation

Projects
None yet
5 participants
@ml31415
Contributor

ml31415 commented Feb 5, 2015

I had been trying to speed up urllib3 by monkeypatching with geventhttpclient. Unfortunately there is not too much gain, as lots of copying of response and header objects is done, which could largely be skipped, when the header and response objects would be sufficiently compatible and directly recognized and accepted as compatible, instead of being converted by lots of copying around.

So for the sake of better compatibility and speedups, I would have liked to directly create a header class - the standard urllib3.HTTPHeaderDict - by geventhttpclients c-parser when monkeypatching, which won't need any further processing from urllib3s response object. When I compared both header container implementations, turned out that the current urllib3 version isn't that fast as it could be. Below some timings comparing the old vs the new version, HTTPHeaderDict_ being the old version:

print "Empty initializations"
%timeit h = HTTPHeaderDict_()
%timeit x = HTTPHeaderDict()

print "Initializations with data"
%timeit h = HTTPHeaderDict_(asdf='ddd', abc='100', ddd='200', efg='300')
%timeit x = HTTPHeaderDict(asdf='ddd', abc='100', ddd='200', efg='300')

from random import choice
from string import ascii_lowercase as asciis
hdrs = dict((''.join(choice(asciis) for _ in range(9)),
             ''.join(choice(asciis) for _ in range(9))) for _ in range(200))

print "Initializations with more data"
%timeit h = HTTPHeaderDict_(hdrs)
%timeit x = HTTPHeaderDict(hdrs)

h = HTTPHeaderDict_(asdf='ddd', abc='100', ddd='200', efg='300')
x = HTTPHeaderDict(asdf='ddd', abc='100', ddd='200', efg='300')

print "Fetching items"
%timeit h.__getitem__('abc')
%timeit x.__getitem__('abc')

print "Copying"
%timeit h.copy()
%timeit x.copy()

print "Adding headers"
%timeit h = HTTPHeaderDict_(); [h.add(k,v) for k, v in hdrs.iteritems()];
%timeit x = HTTPHeaderDict(); [x.add(k,v) for k, v in hdrs.iteritems()];

h = HTTPHeaderDict_(hdrs)
x = HTTPHeaderDict(hdrs)

print "Iteration of keys"
%timeit [k for k in h]
%timeit [k for k in x]

print "Iteration of values"
%timeit [h[k] for k in h]
%timeit [x[k] for k in x]

print "Getting keys"
%timeit h.keys()
%timeit x.keys()

print "Getting values"
%timeit h.values()
%timeit x.values()

For my machine, I get the following timings:

Empty initializations
100000 loops, best of 3: 4.38 µs per loop
1000000 loops, best of 3: 858 ns per loop
Initializations with data
100000 loops, best of 3: 11.3 µs per loop
100000 loops, best of 3: 12 µs per loop
Initializations with more data
1000 loops, best of 3: 252 µs per loop
1000 loops, best of 3: 235 µs per loop
Fetching items
100000 loops, best of 3: 2.34 µs per loop
1000000 loops, best of 3: 1 µs per loop
Copying
100000 loops, best of 3: 11.7 µs per loop
100000 loops, best of 3: 5.36 µs per loop
Adding headers
1000 loops, best of 3: 267 µs per loop
1000 loops, best of 3: 242 µs per loop
Iteration of keys
10000 loops, best of 3: 58.3 µs per loop
100000 loops, best of 3: 17.9 µs per loop
Iteration of values
1000 loops, best of 3: 706 µs per loop
1000 loops, best of 3: 271 µs per loop
Getting keys
10000 loops, best of 3: 44.6 µs per loop
100000 loops, best of 3: 2.25 µs per loop
Getting values
1000 loops, best of 3: 671 µs per loop
1000 loops, best of 3: 267 µs per loop

The new version passes all current tests, though it comes with one downside, which is worth discussing: It's not storing the original case of the header items. While the current version stores that, and therefore can restore the case if desired e.g. when iterating over the keys, this implementation is consequently using lower case, except when pretty printing the object. The standards require case insensitivity, so not preserving the case seems to me like an acceptable, maybe even desireable behaviour. Furthermore, also the current implementation has to drop the case information, when joining same header items together.

In terms of compatibility, I hope this should work with all python versions, though I mainly tested it on 2.7.

python test/test_collections.py
................
----------------------------------------------------------------------
Ran 16 tests in 0.004s

OK
@r4ndsen

This comment has been minimized.

Show comment
Hide comment
@r4ndsen

r4ndsen commented Feb 5, 2015

👍

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Feb 5, 2015

Member

The new version passes all current tests, though it comes with one downside, which is worth discussing: It's not storing the original case of the header items.

I for one think this is unacceptable. We're all aware of the specification, but preserving the data as it is received is important and something I'm certain I've observed people relying upon. That said, if there's a way to preserve it, and improve peformance, I'm 100% for it. These improvements are good. What's most surprising to me is how drastic it is for an empty initialization (although I'm uncertain exactly how common that really is).

Member

sigmavirus24 commented Feb 5, 2015

The new version passes all current tests, though it comes with one downside, which is worth discussing: It's not storing the original case of the header items.

I for one think this is unacceptable. We're all aware of the specification, but preserving the data as it is received is important and something I'm certain I've observed people relying upon. That said, if there's a way to preserve it, and improve peformance, I'm 100% for it. These improvements are good. What's most surprising to me is how drastic it is for an empty initialization (although I'm uncertain exactly how common that really is).

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Feb 5, 2015

Member

Further, in its current state, this PR isn't compatible with Python 3 (according to Travis CI) and doesn't achieve 100% test coverage. Those are 2 things that are going to block this more significantly in the near term than anything else.

Member

sigmavirus24 commented Feb 5, 2015

Further, in its current state, this PR isn't compatible with Python 3 (according to Travis CI) and doesn't achieve 100% test coverage. Those are 2 things that are going to block this more significantly in the near term than anything else.

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 5, 2015

Contributor

Just extended the tests for full coverage. Have some issues with the full test run locally, so I haven't uploaded it right now. In case the suggested version would otherwise be fine, I'll fix it for Python3, too.

Contributor

ml31415 commented Feb 5, 2015

Just extended the tests for full coverage. Have some issues with the full test run locally, so I haven't uploaded it right now. In case the suggested version would otherwise be fine, I'll fix it for Python3, too.

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24

sigmavirus24 Feb 5, 2015

Member

@ml31415 I'd rather you fix it and push for Python 3 so I can pull it and verify your benchmarks.

Member

sigmavirus24 commented Feb 5, 2015

@ml31415 I'd rather you fix it and push for Python 3 so I can pull it and verify your benchmarks.

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 5, 2015

Contributor

Fixing the case preservation? I suppose that would kill most of the speed gains.

Contributor

ml31415 commented Feb 5, 2015

Fixing the case preservation? I suppose that would kill most of the speed gains.

@sigmavirus24

This comment has been minimized.

Show comment
Hide comment
@sigmavirus24
Member

sigmavirus24 commented Feb 5, 2015

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 5, 2015

Contributor

The last failing test, test_httplib_headers_case_insensitive, seems to be the exactly the point in question, whether or not case preservation is required or not.

Contributor

ml31415 commented Feb 5, 2015

The last failing test, test_httplib_headers_case_insensitive, seems to be the exactly the point in question, whether or not case preservation is required or not.

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 6, 2015

Contributor

I modified the changes, so that the original case information is preserved. After some tuning, the speed improvements are similar as the previous version.

Empty initializations
1000000 loops, best of 3: 1.01 µs per loop
Initializations with data
100000 loops, best of 3: 12.8 µs per loop
Initializations with more data
1000 loops, best of 3: 258 µs per loop
Fetching items
1000000 loops, best of 3: 940 ns per loop
Copying
100000 loops, best of 3: 6.29 µs per loop
Adding headers
1000 loops, best of 3: 237 µs per loop
Iteration of keys
100000 loops, best of 3: 17 µs per loop
Iteration of values
1000 loops, best of 3: 239 µs per loop
Getting keys
100000 loops, best of 3: 2.07 µs per loop
Getting values
1000 loops, best of 3: 251 µs per loop
Contributor

ml31415 commented Feb 6, 2015

I modified the changes, so that the original case information is preserved. After some tuning, the speed improvements are similar as the previous version.

Empty initializations
1000000 loops, best of 3: 1.01 µs per loop
Initializations with data
100000 loops, best of 3: 12.8 µs per loop
Initializations with more data
1000 loops, best of 3: 258 µs per loop
Fetching items
1000000 loops, best of 3: 940 ns per loop
Copying
100000 loops, best of 3: 6.29 µs per loop
Adding headers
1000 loops, best of 3: 237 µs per loop
Iteration of keys
100000 loops, best of 3: 17 µs per loop
Iteration of values
1000 loops, best of 3: 239 µs per loop
Getting keys
100000 loops, best of 3: 2.07 µs per loop
Getting values
1000 loops, best of 3: 251 µs per loop
Show outdated Hide outdated urllib3/_collections.py
@shazow

This comment has been minimized.

Show comment
Hide comment
@shazow

shazow Feb 6, 2015

Collaborator

Looks good overall. update_add is the messiest part, but I'm not sure how to clean it up. Renaming would be appreciated if you feel it makes sense.

Otherwise if everyone is onboard, I'm happy to merge. :)

Collaborator

shazow commented Feb 6, 2015

Looks good overall. update_add is the messiest part, but I'm not sure how to clean it up. Renaming would be appreciated if you feel it makes sense.

Otherwise if everyone is onboard, I'm happy to merge. :)

@shazow shazow added this to the v1.10.1 milestone Feb 7, 2015

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 7, 2015

Contributor

I named it like that on purpose, in order not to mess with the previous update implementation. Previous update replaces items when hitting double occurances, update_add extends entries when required. It's meant for replacing for-loops containing headerobj.add(key, value), like seen in the request module, or as a general import function of arbitrary other header objects. I also added the from_httplib constructor for that reason.

I also don't care about renaming it, though. In case you're talking about the .add implementation itself, well, it's just what benchmarks say it's the fastest. I played around with it for quite a while to achieve similar timings as before, when not preserving case information. This is the only solution which came close. The idea is, assume the general case i.e. no such item present, and optimize that as good as possible. I also had tried out another implementation using lists all over instead of tuples for single headers, but thatturned out to be too slow, about a factor of 1.5 slower than now.

Contributor

ml31415 commented Feb 7, 2015

I named it like that on purpose, in order not to mess with the previous update implementation. Previous update replaces items when hitting double occurances, update_add extends entries when required. It's meant for replacing for-loops containing headerobj.add(key, value), like seen in the request module, or as a general import function of arbitrary other header objects. I also added the from_httplib constructor for that reason.

I also don't care about renaming it, though. In case you're talking about the .add implementation itself, well, it's just what benchmarks say it's the fastest. I played around with it for quite a while to achieve similar timings as before, when not preserving case information. This is the only solution which came close. The idea is, assume the general case i.e. no such item present, and optimize that as good as possible. I also had tried out another implementation using lists all over instead of tuples for single headers, but thatturned out to be too slow, about a factor of 1.5 slower than now.

@shazow

This comment has been minimized.

Show comment
Hide comment
@shazow

shazow Feb 7, 2015

Collaborator

Yup, I got all that. :) I just mean that the name "update_add" is confusing. I understand why it exists, and I agree it's necessary. Another possible name would be "extend"? (Its behaviour is similar to list.extend in a way...)

Collaborator

shazow commented Feb 7, 2015

Yup, I got all that. :) I just mean that the name "update_add" is confusing. I understand why it exists, and I agree it's necessary. Another possible name would be "extend"? (Its behaviour is similar to list.extend in a way...)

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 7, 2015

Contributor

I suppose I'm quite uncreative with namings :) Better suggestions cordially welcome!

Contributor

ml31415 commented Feb 7, 2015

I suppose I'm quite uncreative with namings :) Better suggestions cordially welcome!

@shazow

This comment has been minimized.

Show comment
Hide comment
@shazow

shazow Feb 7, 2015

Collaborator
Collaborator

shazow commented Feb 7, 2015

@Lukasa

This comment has been minimized.

Show comment
Hide comment
@Lukasa

Lukasa Feb 7, 2015

Contributor

I think extend works for me.

Contributor

Lukasa commented Feb 7, 2015

I think extend works for me.

@ml31415

This comment has been minimized.

Show comment
Hide comment
@ml31415

ml31415 Feb 7, 2015

Contributor

What remained a bit undefined is, when to return case-preserved headers and when not. E.g. keys() now returns the actual lower case headers, items() restores it. To me it would have been preferable, to have separate case-preserving functions, but the others are already expected to preserve case in some cases, e.g. test_connectionpool, line 621.

Contributor

ml31415 commented Feb 7, 2015

What remained a bit undefined is, when to return case-preserved headers and when not. E.g. keys() now returns the actual lower case headers, items() restores it. To me it would have been preferable, to have separate case-preserving functions, but the others are already expected to preserve case in some cases, e.g. test_connectionpool, line 621.

@pyup-bot pyup-bot referenced this pull request Nov 1, 2017

Merged

Initial Update #1

@pyup-bot pyup-bot referenced this pull request Nov 22, 2017

Open

Initial Update #1

@pyup-bot pyup-bot referenced this pull request Dec 10, 2017

Open

Initial Update #1

@pyup-bot pyup-bot referenced this pull request Feb 13, 2018

Closed

Initial Update #42

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment