Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[9678] Fix HTTP multipart parsing of chunked requests on Py3.7+ #1207

Merged
merged 12 commits into from
Jan 9, 2020

Conversation

twm
Copy link
Contributor

@twm twm commented Dec 16, 2019

See a562242 for a detailed explanation. This is the simplest possible fix. I do think that we should probably ditch the cgi module for multipart parsing eventually because it is very str-oriented, but we want bytes.

This includes the tests from #1180. Thank you @msdemlei!

Contributor Checklist:

msdemlei and others added 8 commits August 14, 2019 14:14
bpo-29979's PR python/cpython#991 changed
cgi.parse_multipart() to require a 'CONTENT-LENGTH' member in its
`pdict` parameter (which is documented as "dictionary containing other
parameters of the content type header"). The PR seems to have some
issues:

* The change seems to conflate `pdict` with an environment dict,
  probably because the test passes a dict called `env` as `pdict`.
* 'CONTENT-LENGTH' isn't a customary parameter of the Content-Type
  header. `pdict` is intended to pass the multipart boundary [1].
* In a CGI environment the length of the request body is in the
  CONTENT_LENGTH variable, anyway [2]. Typo?
* FieldStorage doesn't require a Content-Length header anyway! It's
  potionally used to enforce cgi.maxlen, which limits the requests size
  (because this is the layer for that, right?).

#1012 dealt with bpo-29979's
backward-incompatible change by passing the HTTP `Content-Length` header
as the 'CONTENT-LENGTH' member of `pdict` when the header exists. When
the header isn't given, Request skips parsing the request content.

The problem is that in the Twisted context, unlike the CGI context
(where the web server is responsible for removing any transfer encodings),
that header won't necessarily exist. For example, when a chunked transfer
encoding is used by the client.

This commit does the simplest possible thing: synthesize the content
length based on what was actually received.

[1]: https://tools.ietf.org/html/rfc7578#section-4.1
[2]: https://tools.ietf.org/html/rfc3875#section-4.1.2
Also add a reference to python/cpython#8530
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this; the fix looks correct to me.

@glyph glyph merged commit 8d101b4 into trunk Jan 9, 2020
@glyph glyph deleted the 9678-http-multipart-py37 branch January 9, 2020 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants