Add enforce_content_length for responses #949

Merged 2 commits on Aug 29, 2016

Member

@nateprewitt nateprewitt commented Aug 17, 2016

So here's a pass at #723. This is kind of a weird edge case, but it is particularly prominent in the default configuration of Requests. Most calls performed by urllib3 will raise an IncompleteRead error from httplib when the number of bytes in the body doesn't match the Content-Length.

The Skinny

httplib raises IncompleteRead appropriately everywhere except on incrementally read data. That is the primary way Requests reads responses: it calls urlopen with preload_content=False and then reads with iter_content(), so retrieving data this way hits the flaw in httplib. I've added a flag to enable enforcement, so as not to break existing stream(amt) and read(amt) calls. In the next major release, I would advise removing the flag so that all read operations behave uniformly by default.
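To make the incremental-read gap concrete, here is a minimal sketch of opt-in length enforcement. It is not the code from this PR: `LengthCheckedReader`, `IncompleteBody`, and `drain` are hypothetical names invented for illustration, and only the `enforce_content_length` and `length_remaining` names come from this thread. The sketch assumes the body is a file-like object and that the Content-Length value is already a known integer.

```python
import io


class IncompleteBody(Exception):
    """Hypothetical error: the body ended before Content-Length was satisfied."""


class LengthCheckedReader:
    """Hypothetical wrapper that tracks how many body bytes are still owed."""

    def __init__(self, fp, content_length, enforce_content_length=False):
        self._fp = fp
        self.length_remaining = content_length
        self.enforce_content_length = enforce_content_length

    def read(self, amt):
        data = self._fp.read(amt)
        self.length_remaining -= len(data)
        # An empty result for a non-zero request means EOF. If bytes are
        # still owed at that point, the peer stopped sending early.
        if not data and amt and self.length_remaining > 0:
            if self.enforce_content_length:
                raise IncompleteBody(
                    "body ended with %d bytes still expected"
                    % self.length_remaining)
        return data

    def stream(self, amt):
        """Yield chunks of up to `amt` bytes until the body is exhausted."""
        while True:
            chunk = self.read(amt)
            if not chunk:
                return
            yield chunk


def drain(body, content_length, enforce):
    """Read a whole body in 2-byte chunks; return the bytes or the error."""
    reader = LengthCheckedReader(io.BytesIO(body), content_length, enforce)
    try:
        return b"".join(reader.stream(2))
    except IncompleteBody as exc:
        return exc
```

With the flag off, a short body is returned silently, which is the behaviour described above for incremental reads. With the flag on, the same read sequence fails once EOF arrives while bytes are still expected.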

Notes:

  • My unfamiliarity with the testing harness is definitely showing in test_strict_content_length, but the test does prove the changes are working correctly. Tornado won't allow you to send a body that doesn't match its Content-Length, so this was the only other solution I could come up with. Any suggestions on alternative methods of simulating this problem would be appreciated.
  • I implemented length as a property to match the attribute nature of httplib.HTTPResponse.length. I realize an int that we modify may be preferred to a property, but I felt that approach would be more likely to break, since we would have to update the int everywhere IO might happen in the code.
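The trade-off in the second note can be sketched as follows. This is an illustration only, under the assumption that the property simply delegates to the wrapped object; the PR's actual implementation is not shown in this thread. `FakeRawResponse` and `ResponseWrapper` are hypothetical stand-ins, with `FakeRawResponse` imitating the way `httplib.HTTPResponse.length` shrinks as data is read.

```python
class FakeRawResponse:
    """Hypothetical stand-in for httplib.HTTPResponse with a `length` attribute."""

    def __init__(self, body):
        self._body = body
        self.length = len(body)

    def read(self, amt):
        data, self._body = self._body[:amt], self._body[amt:]
        self.length -= len(data)
        return data


class ResponseWrapper:
    """Hypothetical wrapper exposing `length` as a read-only property."""

    def __init__(self, raw):
        self._raw = raw

    @property
    def length(self):
        # Delegating on every access means no read path can leave the
        # value stale, which is the risk with a manually updated int.
        return self._raw.length

    def read(self, amt):
        return self._raw.read(amt)
```

Because the value is computed on access, it stays correct even when something reads from the underlying object directly, without going through the wrapper.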

@Lukasa
Contributor

Lukasa commented Aug 17, 2016

Nate: while you're working on a fix for this I'll hold off from reviewing. No reason beyond the fact that I'm trying to manage my workload!

@nateprewitt
Member Author

Totally! I think I've got the fix worked out, but doing some final testing. No rush on this at all.

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 5 times, most recently from 6fc439a to a92dfda Compare August 17, 2016 21:55
@nateprewitt
Member Author

Keep hitting socket overlap, can someone kick the tests when they get a chance? Thanks!

@Lukasa
Contributor

Lukasa commented Aug 18, 2016

Restarted.

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 4 times, most recently from 09addb7 to 1632b96 Compare August 23, 2016 03:31
    """

    CONTENT_DECODERS = ['gzip', 'deflate']
    REDIRECT_STATUSES = [301, 302, 303, 307, 308]

    def __init__(self, body='', headers=None, status=0, version=0, reason=None,
                 strict=0, preload_content=True, decode_content=True,
-                original_response=None, pool=None, connection=None, retries=None):
+                original_response=None, pool=None, connection=None,
+                strict_content_length=False, retries=None, request_method=None):

Member Author

Should request_method be replaced with **response_kw so we're not defining minimal contact params?

Contributor

I don't think so. However, please put strict_content_length after retries.

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 2 times, most recently from 7bf3882 to 5c8becc Compare August 23, 2016 16:34
@nateprewitt
Member Author

Alright, I think this is ready for a glance whenever you have a spare moment @Lukasa.

This still has some rough edges, so I left a few comments inline, as well as my initial comment in the opening post.

Thanks!

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 3 times, most recently from 5d087f6 to 2015f77 Compare August 23, 2016 21:38

    def test_length_when_chunked(self):
        headers = {'content-length': '5',
                   'transfer-encoding': 'chunked'}

Contributor

This combination of headers is forbidden by RFC 7230 Section 3.3.2:

A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.

Member Author

I added this test because the initial httplib length logic checks to make sure things aren't chunked. I view this test in the same vein as receiving a Content-Length of "foo". It shouldn't happen but do we want to actually return the content length in the event urllib3 happens to receive both? I'd say no, the length should be None because it can't be determined.

Alternatively, we could raise an exception here but I'm not sure if that would buy us anything useful other than aborting the operation.

Contributor

Yeah, this boils down to "how do we want to deal with this"? The options are as follows:

  1. Raise an exception explaining what went wrong. This fails fast and clearly.
  2. Fall back to no content-length. That means we'll treat the body as chunked. If it's not, we'll fail fast (IncompleteRead, usually). If it is, everything works.

I think option 2 is probably OK in this case, but I'd like a comment explaining the rationale.

Contributor

Possibly also a log at warning level to explain what we're doing.
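The outcome of this thread (fall back to no length, explain why in a comment, and log a warning) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that was merged: `resolve_length` is a hypothetical name, the headers are assumed to be a plain dict with lowercase keys, and the negative-value check is an extra assumption of this sketch.

```python
import logging

log = logging.getLogger(__name__)


def resolve_length(headers):
    """Return the expected body length in bytes, or None if undeterminable.

    `headers` is assumed to be a plain dict with lowercase keys.
    """
    raw = headers.get('content-length')
    if raw is None:
        return None

    if 'chunked' in headers.get('transfer-encoding', '').lower():
        # RFC 7230 section 3.3.2 forbids sending both headers. Rather than
        # guess which one is right, treat the length as unknown and let the
        # chunked framing decide where the body ends.
        log.warning("Received both Content-Length and Transfer-Encoding: "
                    "chunked; ignoring Content-Length.")
        return None

    try:
        length = int(raw)
    except ValueError:
        # A value such as "foo" cannot be interpreted as a length.
        return None

    # Assumption of this sketch: a negative length is as meaningless as a
    # non-numeric one, so it is also reported as unknown.
    if length < 0:
        return None
    return length
```

Returning None in every doubtful case keeps the caller's logic simple: a known integer can be enforced, and anything else is read until the framing or the connection says the body is done.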

@Lukasa
Contributor

Lukasa commented Aug 24, 2016

Cool, this is a really good patch so far! I've added some notes here for strictness and other things which I think are fairly important: let me know if you have thoughts!

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 2 times, most recently from 1b05327 to d442647 Compare August 25, 2016 18:03
@sigmavirus24
Contributor

Sorry for the confusion, Nate. I didn't expand the diff enough and it looked as if you were adding to the DeflateDecoder class. Feel free to ignore that comment.

@nateprewitt
Member Author

Thanks for the feedback Ian! Things should be updated.

@sigmavirus24
Contributor

@nateprewitt it looks like someone updated urllib3 to require your branch to be consistently rebased on top of master (easily one of GitHub's most annoying mistakes/misfeatures as it ties into other things people generally want). Can you rebase this as well please?

@nateprewitt
Member Author

Ok cool, things are rebased onto current master. All of @sigmavirus24's proposed changes should be in place. Once @haikuginger gives the thumbs up, I'll squash things down to a more manageable commit list and update CHANGES.

@sigmavirus24 sigmavirus24 changed the title from "strict_content_length enforcement" to "Add enforce_content_length for responses" Aug 28, 2016
"chunked.")
return None

if length is not None:
Contributor

This conditional is entirely supplemental to the one above. Possibly do elif length is not None?

@haikuginger
Contributor

Couple minor nits left.

@nateprewitt nateprewitt force-pushed the 723_strict_content_length branch 2 times, most recently from 84b6911 to 7c6c226 Compare August 28, 2016 17:06
@nateprewitt
Member Author

Alright, @Lukasa, @sigmavirus24, @haikuginger, I think everything has been addressed and I squashed the commits down into two separate feature commits. One for the length_remaining attribute and then building on that to implement enforce_content_length.

@Lukasa
Contributor

Lukasa commented Aug 28, 2016

Cool, I still like this. @haikuginger @sigmavirus24, are you two happy?

@sigmavirus24
Contributor

I've skimmed through (haven't downloaded it and played with it) but it 👀-only looks good.

@haikuginger
Contributor

I'm on the same page as @sigmavirus24; haven't played with it, but it looks good.

@Lukasa
Contributor

Lukasa commented Aug 29, 2016

Ok, cool, I'm happy with this then. Thanks for the great work @nateprewitt!

@Lukasa
Contributor

Lukasa commented Aug 29, 2016

And thanks so much for the reviews @sigmavirus24 and @haikuginger, fantastic team job all around.

@Lukasa Lukasa merged commit 65b8c52 into urllib3:master Aug 29, 2016
@haikuginger
Contributor

Thanks for the excellent work, @nateprewitt! Way to drive this PR through. ✨🍻✨
