-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate use of cgi
#377
Eliminate use of cgi
#377
Conversation
Does vendoring a stdlib function have license implications? I'd rather deal with the possibility of tinkering in the email module than copy the PSF license. |
@glyph I thought we've vendored stuff from the stdlib in Twisted before... is that not the case? |
Okay, I switched to using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update.
I just left a very generic review.
I have never used treq so far.
It looks good. Only minor comments.
I prefer stdlib here... but multipart is also ok.
Regards
def _encoding_from_headers(headers: Headers) -> Optional[str]: | ||
content_types = headers.getRawHeaders("content-type") | ||
if content_types is None: | ||
return None | ||
|
||
# This seems to be the choice browsers make when encountering multiple | ||
# content-type headers. | ||
content_type, params = cgi.parse_header(content_types[-1]) | ||
media_type, params = multipart.parse_options_header(content_types[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a FYI.
In Twisted we/I went for stdlib usage https://github.com/twisted/twisted/pull/12016/files#diff-3093a8f64dec64c2305f6322f276a5cee1437f0037efc660ce6524ffa58017a1R229-R234
My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning for using stdlib, is that this is quite common usage and if there is a bug, it's best to have it fixed in stdlib rather than rely on 3rd party packages.
I'm curious what experience this is based on. In my experience, known bugs often remain in the stdlib for years (perhaps because there are relatively many barriers to landing a change in the stdlib, as well as the fact that the stdlib can only change with a release which happens relatively infrequently - plus such a release can not fix the issue for already-released versions of python).
If there is a third-party module of reasonable quality that is practical to use then I would generally prefer this over something from the stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the source of the 3rd-party module. More dependencies = more attack surface, both in the literal sense (more people whose PyPI upload credentials can ruin your day), and in the metaphoric sense of unanticipated changes (what happens if a dependency goes AGPL3).
In this sense "the stdlib" or any existing dependency are roughly equivalent, and I'd probably prefer to depend more on our existing deps. Taking on a new dependency ought to give us a moment of pause, but really only a moment, especially if we know the maintainers or it's sufficiently widely used that we are not entirely on the hook for due diligence on its continued maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case this is a blocker to 3.13 support and the dependency is only for tests, so we should probably not block on this minor concern :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header()
provided by multipart
to avoid vendoring code under the PSF license as that seemed to give @glyph pause.
I picked multipart
because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.
I also considered pulling the relevant bits of cgi
into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.
I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes
bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi
module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email
package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.
from typing import cast, AnyStr | ||
|
||
from io import BytesIO | ||
|
||
from multipart import MultipartParser # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's ok to use multipart in the testing code, to double check the implementation... even if the code under tests would have use stdlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, the cgi
docs suggest the multipart
package:
Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.
While it suggests the email
package too, I do not think that the email
package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)
A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.
self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone) | ||
|
||
|
||
class EncodingFromHeadersTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a docstring to describe the criteria for grouping tests into this class.
This should help future devs know if they should add a new tests here or add it somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @adiroiban! I'll give a little time for folks to discuss the third-party dependency situation.
def _encoding_from_headers(headers: Headers) -> Optional[str]: | ||
content_types = headers.getRawHeaders("content-type") | ||
if content_types is None: | ||
return None | ||
|
||
# This seems to be the choice browsers make when encountering multiple | ||
# content-type headers. | ||
content_type, params = cgi.parse_header(content_types[-1]) | ||
media_type, params = multipart.parse_options_header(content_types[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the dependency here is not just for tests. In 0a3d17b I switched to using the equivalent of cgi.parse_header()
provided by multipart
to avoid vendoring code under the PSF license as that seemed to give @glyph pause.
I picked multipart
because it has a few familiar faces attached, e.g. @cjwatson, who has PyPI upload permissions. It is hosted under an individual's GitHub account rather than an org, so that's a downside.
I also considered pulling the relevant bits of cgi
into a separate package and putting it on PyPI. I'd be happy to do that under the Twisted org if that's preferable to an external dependency.
I can only echo @exarkun's skepticism of using the stdlib for this. The stdlib has an anti-bytes
bias that is often incorrect and has led to a lot of makework for us in Twisted. The cgi
module has seen some baffling refactors, then deprecation and removal. Now we're told to use the email
package, which isn't explicitly implementing the HTTP RFCs. Seems risky. Also, it was recently rewritten! It's possible for code to be too maintained.
self.assertIsInstance(resource.request_finishes[0].value, ConnectionDone) | ||
|
||
|
||
class EncodingFromHeadersTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! In the future it's fine to remind me of the Twisted coding standard, no need to write so much. :)
from typing import cast, AnyStr | ||
|
||
from io import BytesIO | ||
|
||
from multipart import MultipartParser # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, the cgi
docs suggest the multipart
package:
Deprecated since version 3.11, will be removed in version 3.13: This function, like the rest of the cgi module, is deprecated. It can be replaced with the functionality in the email package (e.g. email.message.EmailMessage/email.message.Message) which implements the same MIME RFCs, or with the multipart PyPI project.
While it suggests the email
package too, I do not think that the email
package is representative of real-world multipart/form-data parsers, in particular because it targets email-dialect MIME that involves nesting never seen in HTTP. (As just one example, Django's multipart parser doesn't support nesting. I guarantee no browser ever generates it.)
A more ideal test suite would test against a variety of real-world multipart parsers, but I don't think that'd have great ROI.
Thanks for the feedback. Much appreciated! I have approved the PR and I am ok with using My comment about Also, I remember some discussions for getting treq code include in Twisted and have it as a high-level API . From what I can see in this PR, I think that HTTP handling should be a core feature of any library, and this is why I would prefer to have http code implemented in stdlib. For example, in Twisted, HTTP protocol handling is part of main Twisted package, while for LDAP we have a separate package. Something similar for stdlib. I would expect Python to have a good enough HTTP handing, and use a separate package for LDAP handling. |
Any chance we could get a new release with this fix? Fedora rawhide just updated to Python 3.13b2 and treq is failing to build. This PR doesn't apply cleanly to 23.11.0. Thanks. |
I think that first we should fix the 3.13 support in twisted/twisted |
Unfortunately on Fedora we don't package multipart, but python-multipart which is a different project. And while not entirely sure of the stability of each project, the multipart used here looks abandoned upstream (no commits since 2021). |
|
That is awesome, thanks for the response here :). I'll work on getting it packaged for Fedora then. |
@opoplawski Thanks for the ping! I'm currently a bit loopy on cough medicine, but I should be able to manage a release this weekend or next week. |
Stop using
cgi
, which was deprecated in Python 3.11 and dropped in Python 3.13. Replace it with themultipart
package, which becomes a dependency.Fixes #355.