-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MIME type parameter parsing tests #7764
Conversation
Build PASSEDStarted: 2017-12-05 11:25:26
Failing Jobs
Unstable BrowsersBrowser: "Microsoftedge 14.14393" (failures allowed)View in: WPT PR Status | TravisCI |
In the hope that the charset parameter is a good proxy for the whole thing. Would be good to run these through more endpoints though. Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.
69bad64
to
caaf8a3
Compare
There are no owners for this pull request. Please reach out on W3C's irc server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you! |
Things I don't see tested:
|
I still don't buy that 127 is an actual thing. No code has it and it seems rather arbitrary.
What code path would this test? Note that Response/Request only store the parsed output internally for generating Blob objects... Do you want to block on having most of those tests or can they be follow-up? I left whatwg/mimesniff#45 open quite intentionally. It requires some additional thought into how to structure the tests as not all places will react to non-ASCII input the same way. |
I think in general when changing a standard it's a good idea to test that the changes you made are supported by browsers, even if the old version doesn't have tests backing it up.
The idea is to test the networking code, which at least in Chrome I believe uses a completely different MIME type parser from other parts of the codebase.
No need to block, but the more tests are in place before I start writing a JS implementation, the better. |
@domenic you're talking about the MIME type the server sees though. There's already quite a few tests that ensure the browser networking stack doesn't touch the value of a custom |
Well so for example fetch or, perhaps more directly, XHR, can send Content-Type headers. Probably they should only send ones that have been parsed-then-serialized. |
No they shouldn't (and we already test this). We don't want to embed header-specific knowledge in the networking library. That would prevent web pages from experimenting with new formats. |
There's a specific case with XMLHttpRequest where we do modify the value and in that case it's indeed parsed and serialized; that's #8422. |
Generating tests with a Python script is doable. The way to do those tests would be to write a Python script that generates a JSON resource. Ensure both are committed and ensure |
const mime = val.input; | ||
async_test(t => { | ||
const frame = document.createElement("iframe"), | ||
expectedEncoding = val.encoding === "" ? "UTF-8" : val.encoding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think null, instead of the empty string, should be used to signify no parsed encoding.
"type/subtype longer than 127", | ||
{ | ||
"input": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789", | ||
"output": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only entry that parses correctly but does not have an "encoding" line. The unpredictable data format made it harder to write a test runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
navigable and encoding are both optional and only make sense in specific contexts. I'll document it in more clearly in the README.
"navigable": true, | ||
"encoding": "GBK" | ||
}, | ||
"Single quotes (invalid)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single quotes are valid per https://pr-preview.s3.amazonaws.com/whatwg/mimesniff/pull/36.html#http-token-code-point, so these next three tests seem wrong.
Coverage report at https://lcov-report-sngrqbipez.now.sh. Missing one branch in the parser, plus any mixed-case tests. Pretty good! |
Oh, I guess several "else" paths were never taken, mostly where the input ends prematurely. Those are more serious coverage gaps. |
I added generated tests. I think we have ample coverage now for a v0/v1 of this. |
}, | ||
{ | ||
"input": "x/x;x=\t;bonus=x", | ||
"output": "x/x;x=\"\t\";bonus=x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My parser says this should be "x/x;bonus=x"
. Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.
}, | ||
{ | ||
"input": "x/x;x= ;bonus=x", | ||
"output": "x/x;x=\" \";bonus=x" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My parser says this should be "x/x;bonus=x"
. Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.
"Valid", | ||
{ | ||
"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz", | ||
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"w" is missing in a few places here; the input has 1 w (should be 2) and the output has 0.
}, | ||
{ | ||
"input": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\"", | ||
"output": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output slashes here (between the []s) is, per my parser, removed.
Coverage is 100%, but found some test bugs. Looking good though. |
@@ -0,0 +1,3 @@ | |||
def main(request, response): | |||
response.headers.set("Content-Type", request.GET.first("type")); | |||
response.content = "<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW you can also write this as
def main(request, response):
return ([("Content-Type", request.GET.first("type"))],
"<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to you; it was just in case you didn't know.
tools/ci/ci_built_diff.sh
Outdated
@@ -18,6 +18,7 @@ main() { | |||
) | |||
|
|||
./update-built-tests.sh | |||
python ./mimesniff/mime-types/resources/generated-mime-types.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry this should go in update_built_tests.sh
. I guess the distinction is to make it easy to bulk update all the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed!
"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz", | ||
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz" | ||
"input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", | ||
"output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still missing a "w". You can tell because the strings start in different columns but end in the same column.
}, | ||
{ | ||
"input": "x/x;\" ", | ||
"output": "x/x;\" \"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my parser the output is x/x
, with no parameters. There is no parameter value here, so this is expected.
}, | ||
{ | ||
"input": "x/x;x=\"\t", | ||
"output": "x/x;x=\"\t\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my parser the output is x/x
, with no parameters. Remember that you remove leading and trailing whitespace from the whole MIME type in step 1 of the entire parser algorithm, so this is equivalent to parsing x/x;x="
Generated tests still not quite there: Maybe we should try to do this while we're both online? Ping me sometime perhaps? |
Are you sure you're looking at the latest version? I don't see those tests in generated-mime-types.json. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, my script was not properly updating generated-mime-types.json. Woohoo!!
This addresses all open inline issues with respect to the parser and serializer, aligns both closer with implementations, except where those stood in the way of an improved model. This also updates all of it to make extensive use of the Infra Standard. See #42 for the testing story (included all linked issues) and web-platform-tests/wpt#7764 for the majority of tests.
In the hope that the charset parameter is a good proxy for the whole thing. Would be good to run these through more endpoints though.
Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.