Add enforce_content_length for responses #949
Nate: while you're working on a fix for this I'll hold off from reviewing. No reason beyond the fact that I'm trying to manage my workload! |
|
Totally! I think I've got the fix worked out, but doing some final testing. No rush on this at all. |
|
Keep hitting socket overlap, can someone kick the tests when they get a chance? Thanks! |
|
Restarted. |
  strict=0, preload_content=True, decode_content=True,
- original_response=None, pool=None, connection=None, retries=None):
+ original_response=None, pool=None, connection=None,
+ strict_content_length=False, retries=None, request_method=None):
Should request_method be replaced with **response_kw so we're not defining minimal contact params?
I don't think so. However, please put strict_content_length after retries.
|
Alright, I think this is ready for a glance whenever you have a spare moment @Lukasa. This still has some rough edges, so I left a few comments inline, as well as my initial comment in the opening post. Thanks! |
def test_length_when_chunked(self):
    headers = {'content-length': '5',
               'transfer-encoding': 'chunked'}
This combination of headers is forbidden by RFC 7230 Section 3.3.2:
A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.
I added this test because the initial httplib length logic checks to make sure things aren't chunked. I view this test in the same vein as receiving a Content-Length of "foo". It shouldn't happen but do we want to actually return the content length in the event urllib3 happens to receive both? I'd say no, the length should be None because it can't be determined.
Alternatively, we could raise an exception here but I'm not sure if that would buy us anything useful other than aborting the operation.
Yeah, this boils down to "how do we want to deal with this"? The options are as follows:
- Raise an exception explaining what went wrong. This fails fast and clearly.
- Fall back to no content-length. That means we'll treat the body as chunked. If it's not, we'll fail fast (IncompleteRead, usually). If it is, everything works.
I think that's probably ok in this case, but I'd like a comment explaining the rationale.
Possibly also a log at warning level to explain what we're doing.
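The fallback discussed in this thread can be sketched roughly as follows. This is only an illustration of the idea (return no length when the headers conflict or cannot be parsed, and log a warning); the `init_length` helper and its exact behavior are assumptions made for this example, not the code in the patch.

```python
import logging

log = logging.getLogger(__name__)


def init_length(headers):
    """Hypothetical helper: return the expected body length, or None.

    Returns None when the length cannot be trusted, so the caller
    falls back to reading without a known Content-Length.
    """
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    length = normalized.get("content-length")
    if length is None:
        return None

    # RFC 7230 section 3.3.2 forbids sending both headers together.
    # Rather than raise, ignore Content-Length and treat the body as chunked.
    if "chunked" in normalized.get("transfer-encoding", "").lower():
        log.warning(
            "Received response with both Content-Length and "
            "Transfer-Encoding set; ignoring Content-Length and "
            "treating the response as chunked."
        )
        return None

    try:
        length = int(length)
    except ValueError:
        # A value such as "foo" tells us nothing about the body size.
        return None

    # A negative length is just as meaningless as a non-numeric one.
    return length if length >= 0 else None
```

If the body turns out not to be chunked after all, the read should still fail quickly, as noted above, so the fallback does not hide the malformed response.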
|
Cool, this is a really good patch so far! I've added some notes here for strictness and other things which I think are fairly important: let me know if you have thoughts! |
|
Sorry for the confusion, Nate. I didn't expand the diff enough and it looked as if you were adding to the DeflateDecoder class. Feel free to ignore that comment. |
|
Thanks for the feedback Ian! Things should be updated. |
|
@nateprewitt it looks like someone updated urllib3 to require your branch to be consistently rebased on top of master (easily one of GitHub's most annoying mistakes/misfeatures as it ties into other things people generally want). Can you rebase this as well please? |
|
Ok cool, things are rebased onto current master. All of @sigmavirus24's proposed changes should be in place. Once @haikuginger gives the thumbs up, I'll squash things down to a more manageable commit list and update CHANGES. |
enforce_content_length for responses
"chunked.")
return None

if length is not None:
This conditional is entirely supplemental to the one above. Possibly do elif length is not None?
|
Couple minor nits left. |
|
Alright, @Lukasa, @sigmavirus24, @haikuginger, I think everything has been addressed and I squashed the commits down into two separate feature commits. One for the |
|
Cool, I still like this. @haikuginger @sigmavirus24, are you two happy? |
|
I've skimmed through (haven't downloaded it and played with it) but it 👀-only looks good. |
|
I'm on the same page as @sigmavirus24; haven't played with it, but it looks good. |
|
Ok, cool, I'm happy with this then. Thanks for the great work @nateprewitt! |
|
And thanks so much for the reviews @sigmavirus24 and @haikuginger, fantastic team job all around. |
|
Thanks for the excellent work, @nateprewitt! Way to drive this PR through. ✨🍻✨ |
…ngth Add `enforce_content_length` for responses
So here's a pass at #723. This is kind of a weird edge case, but it is particularly prominent in the default configuration of Requests. Most calls performed by `urllib3` will raise an `IncompleteRead` error from `httplib` when the number of bytes in the body doesn't match the `Content-Length`.

The Skinny

`httplib` raises `IncompleteRead`s appropriately everywhere except on incrementally read data. This is the primary way Requests uses `urlopen`: with `preload_content=False` and then reading with `iter_content()`. Retrieving data this way hits the flaw in `httplib`. I've added a flag to enable this functionality, so as not to break `stream(amt)` and `read(amt)` calls presently. In the next major release, I would advise the flag being removed to make all `read` operations uniform by default.

Notes:

- `test_strict_content_length` but the test does prove the changes are working correctly. Tornado won't allow you to send uneven data, so this was the only other solution I could come up with. Any suggestions on alternative methods of simulating this problem would be appreciated.
- I implemented `length` as a property to match the attribute nature of `httplib.HTTPResponse.length`. I realize an int that we modify may be preferred to a property, but felt it would be more likely to break if we implement int updates everywhere IO might happen in the code.
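To make the behavior described above concrete, here is a minimal standard-library sketch of an incremental reader that checks the declared length only when a flag is set. `LengthCheckedReader` and its attribute names are hypothetical and exist only for this illustration; they are not urllib3's response class, and the remaining length is computed by a property here purely to mirror the note about `length`.

```python
import io
from http.client import IncompleteRead


class LengthCheckedReader:
    """Hypothetical wrapper around a file-like body (not urllib3's class)."""

    def __init__(self, fp, content_length, enforce_content_length=False):
        self._fp = fp
        self._content_length = content_length
        self._bytes_read = 0
        self.enforce_content_length = enforce_content_length

    @property
    def length_remaining(self):
        # Derived from the bytes read so far, so no other code path
        # needs to remember to update a counter.
        if self._content_length is None:
            return None
        return self._content_length - self._bytes_read

    def read(self, amt):
        data = self._fp.read(amt)
        self._bytes_read += len(data)
        # An empty read means the stream ended. With the flag set, a
        # stream that ended early is reported instead of silently accepted.
        if amt and not data and self.enforce_content_length:
            if self.length_remaining not in (0, None):
                # Earlier chunks were already handed to the caller, so no
                # partial data is attached; only the shortfall is reported.
                raise IncompleteRead(b"", self.length_remaining)
        return data

    def stream(self, amt):
        # Yield chunks of at most `amt` bytes until the body is exhausted.
        while True:
            chunk = self.read(amt)
            if not chunk:
                return
            yield chunk
```

With the flag left off, a three-byte body declared as five bytes streams to completion without complaint, which is the silent truncation the description refers to. With the flag on, the final empty read raises `IncompleteRead` instead.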