Add `enforce_content_length` for responses #949
Nate: while you're working on a fix for this I'll hold off from reviewing. No reason beyond the fact that I'm trying to manage my workload!
Totally! I think I've got the fix worked out, but doing some final testing. No rush on this at all.
Keep hitting socket overlap, can someone kick the tests when they get a chance? Thanks!
Restarted.
""" | ||

     CONTENT_DECODERS = ['gzip', 'deflate']
     REDIRECT_STATUSES = [301, 302, 303, 307, 308]

     def __init__(self, body='', headers=None, status=0, version=0, reason=None,
                  strict=0, preload_content=True, decode_content=True,
-                 original_response=None, pool=None, connection=None, retries=None):
+                 original_response=None, pool=None, connection=None,
+                 strict_content_length=False, retries=None, request_method=None):
Should request_method be replaced with **response_kw so we're not defining minimal contact params?
I don't think so. However, please put `strict_content_length` after `retries`.
Alright, I think this is ready for a glance whenever you have a spare moment @Lukasa. This still has some rough edges, so I left a few comments inline, as well as my initial comment in the opening post. Thanks!
    def test_length_when_chunked(self):
        headers = {'content-length': '5',
                   'transfer-encoding': 'chunked'}
This combination of headers is forbidden by RFC 7230 Section 3.3.2:
A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.
I added this test because the initial `httplib` length logic checks to make sure things aren't chunked. I view this test in the same vein as receiving a Content-Length of "foo". It shouldn't happen, but do we want to actually return the content length in the event urllib3 happens to receive both? I'd say no, the length should be None because it can't be determined.
Alternatively, we could raise an exception here, but I'm not sure that would buy us anything useful other than aborting the operation.
Yeah, this boils down to "how do we want to deal with this"? The options are as follows:
- Raise an exception explaining what went wrong. This fails fast and clearly.
- Fall back to no content-length. That means we'll treat the body as chunked. If it's not, we'll fail fast (IncompleteRead, usually). If it is, everything works.
I think that's probably ok in this case, but I'd like a comment explaining the rationale.
Possibly also a log at warning level to explain what we're doing.
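For illustration, the fallback option discussed in this thread could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the helper name `init_length` and the exact warning text are assumptions made up for the example.

```python
import logging

log = logging.getLogger(__name__)


def init_length(headers):
    """Hypothetical helper: return the usable Content-Length, or None.

    None means "length cannot be determined", so the caller falls back to
    reading until the framing (chunked or EOF) says the body is done.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    length = normalized.get("content-length")
    if length is None:
        return None

    if "chunked" in normalized.get("transfer-encoding", "").lower():
        # RFC 7230 section 3.3.2 forbids sending both headers. Rather than
        # raising, ignore Content-Length and treat the body as chunked. If
        # the body is not actually chunked, the read fails fast later.
        log.warning("Received both Content-Length and Transfer-Encoding: "
                    "chunked; ignoring Content-Length.")
        return None

    try:
        length = int(length)
    except ValueError:
        # A value such as "foo" cannot be interpreted as a length.
        return None

    # A negative length is meaningless, so treat it as unknown as well.
    return length if length >= 0 else None
```

With this shape, the conflicting-header case from `test_length_when_chunked` and a malformed value such as "foo" both end up on the same "length unknown" path.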
Cool, this is a really good patch so far! I've added some notes here for strictness and other things which I think are fairly important: let me know if you have thoughts!
Sorry for the confusion Nate. I didn't expand the diff enough and it looked as if you were adding to the DeflateDecoder class. Feel free to ignore that comment.
Thanks for the feedback Ian! Things should be updated.
@nateprewitt it looks like someone updated urllib3 to require your branch to be consistently rebased on top of master (easily one of GitHub's most annoying mistakes/misfeatures as it ties into other things people generally want). Can you rebase this as well please?
Ok cool, things are rebased onto current master. All of @sigmavirus24's proposed changes should be in place. Once @haikuginger gives the thumbs up, I'll squash things down to a more manageable commit list and update CHANGES.
"chunked.") | ||
            return None

        if length is not None:
This conditional is entirely supplemental to the one above. Possibly do `elif length is not None`?
Couple minor nits left.
Alright, @Lukasa, @sigmavirus24, @haikuginger, I think everything has been addressed and I squashed the commits down into two separate feature commits. One for the
Cool, I still like this. @haikuginger @sigmavirus24, are you two happy?
I've skimmed through (haven't downloaded it and played with it) but it 👀-only looks good.
I'm on the same page as @sigmavirus24; haven't played with it, but it looks good.
Ok, cool, I'm happy with this then. Thanks for the great work @nateprewitt!
And thanks so much for the reviews @sigmavirus24 and @haikuginger, fantastic team job all around.
Thanks for the excellent work, @nateprewitt! Way to drive this PR through. ✨🍻✨
…ngth Add `enforce_content_length` for responses
So here's a pass at #723. This is kind of a weird edge case, but it is particularly prominent in the default configuration of Requests. Most calls performed by `urllib3` will raise an `IncompleteRead` error from `httplib` when the number of bytes in the body doesn't match the `Content-Length`.

The Skinny
`httplib` raises `IncompleteRead`s appropriately everywhere except on incrementally read data. Incremental reading is the primary way Requests consumes responses: it calls `urlopen` with `preload_content=False` and then reads with `iter_content()`. Retrieving data this way hits the flaw in `httplib`. I've added a flag to enable the length check, so as not to break existing `stream(amt)` and `read(amt)` calls. In the next major release, I would advise removing the flag to make all `read` operations uniform by default.

Notes:
- `test_strict_content_length` but the test does prove the changes are working correctly. Tornado won't allow you to send uneven data, so this was the only other solution I could come up with. Any suggestions on alternative methods of simulating this problem would be appreciated.
- I implemented `length` as a property to match the attribute nature of `httplib.HTTPResponse.length`. I realize an int that we modify may be preferred to a property, but I felt it would be more likely to break if we implemented int updates everywhere IO might happen in the code.
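To make the behaviour described above concrete, here is a rough, self-contained sketch of an opt-in length check on incremental reads. It is not urllib3's implementation: `LengthCheckedReader`, `IncompleteBody`, and `length_remaining` are hypothetical names invented for this example, and an in-memory `io.BytesIO` stands in for the socket.

```python
import io


class IncompleteBody(Exception):
    """Hypothetical error: the body ended before Content-Length was met."""


class LengthCheckedReader:
    """Hypothetical wrapper that tracks bytes left on incremental reads."""

    def __init__(self, fp, content_length, enforce_content_length=False):
        self._fp = fp
        # None means the length is unknown, so nothing can be enforced.
        self.length_remaining = content_length
        self.enforce_content_length = enforce_content_length

    def read(self, amt):
        data = self._fp.read(amt)
        if self.length_remaining is not None:
            self.length_remaining -= len(data)
        if amt and not data:
            # The stream hit EOF. Only complain when the caller opted in
            # and the declared length was not satisfied.
            if self.enforce_content_length and self.length_remaining not in (None, 0):
                raise IncompleteBody(
                    "%d more bytes expected" % self.length_remaining)
        return data

    def stream(self, amt):
        # Yield chunks until the underlying stream is exhausted.
        while True:
            chunk = self.read(amt)
            if not chunk:
                return
            yield chunk


# A body of 3 bytes that was declared as 5 bytes long.
lenient = LengthCheckedReader(io.BytesIO(b"abc"), content_length=5)
body = b"".join(lenient.stream(2))  # truncated body is returned silently

strict = LengthCheckedReader(io.BytesIO(b"abc"), content_length=5,
                             enforce_content_length=True)
try:
    list(strict.stream(2))
    strict_raised = False
except IncompleteBody:
    strict_raised = True
```

Keeping the check behind a flag that defaults to off mirrors the compatibility argument in the description: existing incremental readers keep their current behaviour unless the caller opts in.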