The values of the MIME type parameters aren't parsed as ASCII strings #141
Comments
Wow, nice find! And also somewhat surprising this was overlooked for so long. I guess this is a use case for "isomorphic string" as we shouldn't subset HTTP.
The "parse a MIME type" algorithm now works on strings, not on byte sequences as it used to, so no isomorphic decoding is needed here. This means that a caller could pass code points greater than U+00FF, but trying to parse them as part of a parameter value would already fail because they aren't HTTP quoted-string token code points. Should the definition of parameter value then refer to HTTP quoted-string token code points?
With "isomorphic string" I meant a string whose code points are in the range U+0000 to U+00FF, inclusive (aka latin1, but we avoid latin1 as a term on the web as it also means windows-1252 there). But yeah, we could also define the parameter value as a string whose code points are HTTP quoted-string token code points.
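To make the "isomorphic string" definition above concrete, here is a minimal Python sketch. The helper names `is_isomorphic_string` and `isomorphic_encode` are illustrative assumptions, not names taken from any specification text quoted in this thread. It relies on Python's `latin-1` codec, which maps U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF (unlike windows-1252).

```python
def is_isomorphic_string(value: str) -> bool:
    """True if every code point is in the range U+0000 to U+00FF, inclusive."""
    return all(ord(ch) <= 0xFF for ch in value)


def isomorphic_encode(value: str) -> bytes:
    """Map each code point to the byte with the same numeric value.

    Hypothetical helper: raises ValueError if the input is not an
    isomorphic string in the sense defined in the comment above.
    """
    if not is_isomorphic_string(value):
        raise ValueError("code point above U+00FF")
    # Python's "latin-1" codec is the identity mapping for U+0000..U+00FF.
    return value.encode("latin-1")


# The boundary from this issue, written with escapes to avoid any
# ambiguity about normalization: U+00E1 U+00E8 U+00EE U+00F8 U+00FC.
boundary = "\u00e1\u00e8\u00ee\u00f8\u00fc"
```

With these definitions, `is_isomorphic_string(boundary)` holds, while a string containing U+20AC (the euro sign) is rejected.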
If the "parse a MIME type" algorithm is called on a string like "multipart/form-data; boundary=áèîøü", the parsing succeeds, and the resulting MIME type record has a "boundary" parameter of "áèîøü", even though the MIME type definition specifies that parameter values are ASCII strings.

This is because in the parsing algorithm, the essence and parameter names must only contain HTTP token code points, which are a subset of ASCII; but parameter values must only contain HTTP quoted-string token code points, which aren't.
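The asymmetry described above can be sketched with two predicates. This is a simplified illustration, not the full "parse a MIME type" algorithm: the function names are hypothetical, and the code point sets are written out as I understand the MIME Sniffing definitions (HTTP token code points are ASCII alphanumerics plus a fixed punctuation set; HTTP quoted-string token code points are U+0009, U+0020 to U+007E, and U+0080 to U+00FF).

```python
# Punctuation allowed in an HTTP token, in addition to ASCII alphanumerics.
HTTP_TOKEN_PUNCTUATION = "!#$%&'*+-.^_`|~"


def is_http_token_code_point(ch: str) -> bool:
    """Check applied to the essence and to parameter names."""
    return ch in HTTP_TOKEN_PUNCTUATION or (ch.isascii() and ch.isalnum())


def is_http_quoted_string_token_code_point(ch: str) -> bool:
    """Check applied to parameter values; note the U+0080..U+00FF range."""
    cp = ord(ch)
    return cp == 0x09 or 0x20 <= cp <= 0x7E or 0x80 <= cp <= 0xFF


def all_match(value: str, predicate) -> bool:
    """True if every code point of value satisfies predicate."""
    return all(predicate(ch) for ch in value)


# The boundary value from the example, written with escapes:
# U+00E1 U+00E8 U+00EE U+00F8 U+00FC.
boundary = "\u00e1\u00e8\u00ee\u00f8\u00fc"

# Would be rejected as a parameter name, but is accepted as a parameter value.
valid_as_name = all_match(boundary, is_http_token_code_point)
valid_as_value = all_match(boundary, is_http_quoted_string_token_code_point)
```

Here `valid_as_name` is false and `valid_as_value` is true, even though `boundary.isascii()` is false, which is the mismatch with the "ASCII string" definition that this issue reports.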
I found this while working on a multipart/form-data parser in https://github.com/andreubotella/multipart-form-data: I noticed that some browsers accept a boundary string with code points between U+0080 and U+00FF while others don't, and after going down the rabbit hole of fetch algorithms, this seems to be the cause of that incompatibility.