New MIME type parsing tests #10279
Tests suggested at #1851 (comment).
Co-Authored-By: Matt Menke firstname.lastname@example.org
I didn't add these for now:
I didn't see why this would end up as GBK. We'd have
Semicolons are allowed, so this would not parse as GBK.
Thanks for this!
For text/html;charset= ";charset=GBK: according to the algorithm, a quote only starts a quoted string when it comes immediately after the "=", and here there is a space in between. So this would be interpreted as:
Quotes aren't allowed in charset values (see the "parameterName solely contains HTTP token code points" rule, which looks to exclude quotes), so the first charset value isn't set, and we fall back to the second.
Semicolons also don't look to be allowed: "An HTTP token code point is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), U+0026 (&), U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), U+0060 (`), U+007C (|), U+007E (~), or an ASCII alphanumeric."
Am I missing some reason why semicolons and quotes would be allowed?
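For a quick check of the definition quoted above, here is a minimal Python sketch of an "HTTP token code points only" test. The name `is_http_token` is hypothetical (not from the spec or any browser); it only encodes the list quoted in the comment. Like the spec's "solely contains" wording, it does not reject the empty string on its own.

```python
import string

# HTTP token code points as quoted above: the listed punctuation plus ASCII alphanumerics.
HTTP_TOKEN_CODE_POINTS = frozenset("!#$%&'*+-.^_`|~" + string.ascii_letters + string.digits)


def is_http_token(value: str) -> bool:
    """True if every code point in value is an HTTP token code point."""
    return all(c in HTTP_TOKEN_CODE_POINTS for c in value)


for candidate in ["GBK", "utf-8", ' "', ";", '"GBK"', "\xff"]:
    print(repr(candidate), is_http_token(candidate))
```

Under this list, "GBK" and "utf-8" pass, while a space, `"`, `;` and U+00FF all fail, which is why neither quotes nor semicolons can end up in a kept value.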
So instead, I'd suggest:
text/html;charset= ";\xFF;charset=GBK (Comes out as GBK, since the \xFF is considered part of the first charset.)
text/html;charset="\xFF;charset=foo";charset=GBK (Comes out as GBK)
The first is to catch incorrect handling of a quote after a space. The second is intended to catch incorrect tokenization around semicolons.
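To sanity-check the suggested inputs (plus the original charset= " case), the following is a minimal Python sketch of the parameter-parsing loop under the assumptions used in this thread: a quote only starts a quoted string when it comes immediately after "=", a parameter is kept only if its name and value consist solely of HTTP token code points (the rule as quoted above; other revisions of the spec may differ), and the first occurrence of a name wins. Type/subtype parsing is skipped, and all function names are hypothetical.

```python
import string
from typing import Dict, Optional, Tuple

HTTP_WHITESPACE = "\t\n\r "
# HTTP token code points as quoted in the thread: listed punctuation plus ASCII alphanumerics.
TOKEN_CODE_POINTS = frozenset("!#$%&'*+-.^_`|~" + string.ascii_letters + string.digits)
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def solely_token(s: str) -> bool:
    return all(c in TOKEN_CODE_POINTS for c in s)


def collect_quoted_string(s: str, pos: int) -> Tuple[str, int]:
    """Collect an HTTP quoted string starting at s[pos] == '"', returning (value, new pos)."""
    pos += 1
    value = []
    while pos < len(s):
        c = s[pos]
        pos += 1
        if c == '"':
            break  # Closing quote.
        if c == "\\" and pos < len(s):
            c = s[pos]  # Backslash escapes the next code point.
            pos += 1
        value.append(c)
    return "".join(value), pos


def parse_params(s: str) -> Dict[str, str]:
    """Parse parameters after the first ';' (type/subtype parsing is skipped)."""
    params: Dict[str, str] = {}
    pos = s.find(";")
    if pos == -1:
        return params
    while pos < len(s):
        pos += 1  # Skip the ';'.
        while pos < len(s) and s[pos] in HTTP_WHITESPACE:
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in ";=":
            pos += 1
        name = s[start:pos].translate(ASCII_LOWER)
        if pos < len(s):
            if s[pos] == ";":
                continue  # Name with no value, e.g. the "\xff" in ";\xff;".
            pos += 1  # Skip the '='.
        if pos >= len(s):
            break
        if s[pos] == '"':
            # Only a quote immediately after '=' starts a quoted string.
            value, pos = collect_quoted_string(s, pos)
            while pos < len(s) and s[pos] != ";":
                pos += 1  # Ignore anything between the closing quote and ';'.
        else:
            start = pos
            while pos < len(s) and s[pos] != ";":
                pos += 1
            value = s[start:pos].rstrip(HTTP_WHITESPACE)
            if value == "":
                continue
        # Keep the parameter only if name and value are made of token code points
        # and the name has not been seen yet (the first occurrence wins).
        if name and solely_token(name) and solely_token(value) and name not in params:
            params[name] = value
    return params


def charset(s: str) -> Optional[str]:
    return parse_params(s).get("charset")


for mime in [
    'text/html;charset= ";charset=GBK',
    'text/html;charset= ";\xff;charset=GBK',
    'text/html;charset="\xff;charset=foo";charset=GBK',
]:
    print(repr(mime), "->", charset(mime))
```

All three inputs come out as GBK in this sketch. An implementation that treated the quote after the space as the start of a quoted string would instead swallow the rest of the first two inputs, which is the mistake the first suggested test is meant to catch.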
Apr 4, 2018
I don't think this one is correct: https://github.com/w3c/web-platform-tests/pull/10279/files#diff-4d2126605002e90f20a8cb91502ede9cR135
Per the parsing rules, the quoted string should run up to the end of GBK. The encoding should be null, I believe.
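Where a quoted string ends is governed by the "collect an HTTP quoted string" algorithm (from the Fetch standard): a backslash escapes the next code point, and a string with no closing quote runs to the end of the input. Below is a minimal Python sketch of it with the extract-value behavior. The function name is hypothetical, and the input is an illustrative string of the kind under discussion, not necessarily the exact test line behind the link.

```python
from typing import Tuple


def collect_http_quoted_string(s: str, pos: int) -> Tuple[str, int]:
    """Sketch of 'collect an HTTP quoted string' with extract-value set.

    s[pos] must be '"'. Returns (value, position just past the string)."""
    assert s[pos] == '"'
    pos += 1
    value = []
    while True:
        # Take everything up to the next quote or backslash.
        while pos < len(s) and s[pos] not in '"\\':
            value.append(s[pos])
            pos += 1
        if pos >= len(s):
            break  # No closing quote: the string runs to the end of the input.
        quote_or_backslash = s[pos]
        pos += 1
        if quote_or_backslash == "\\":
            if pos >= len(s):
                value.append("\\")  # A trailing backslash is kept literally.
                break
            value.append(s[pos])  # Escaped code point, e.g. \" becomes ".
            pos += 1
        else:
            break  # Unescaped closing quote.
    return "".join(value), pos


mime = 'text/html;charset="\\";charset=GBK'  # hypothetical input
value, end = collect_http_quoted_string(mime, mime.index('"'))
print(repr(value), end == len(mime))
```

Because the backslash escapes the second quote, the value is `";charset=GBK` and nothing is left over to parse as a second charset parameter. That value contains `"`, `;` and `=`, so under the token rule discussed above it would not be kept as a charset, which matches the expectation that the encoding is null.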
Sorry for the late comment; I was just looking over the tests where Chrome fails as I work on fixing most of the remaining differences. One or two I'll probably leave failing, unless other browsers show the same behavior. I'm a bit paranoid about making the charset= "GBK" ones pass, in particular.
That one still works, because of the space before the quote. The spec only interprets a parameter as quoted if it sees the exact sequence
So it sees two versions of