MIME type parameter parsing tests #7764

annevk · 2017-10-13T14:48:53Z

These tests focus on the charset parameter, in the hope that it is a good proxy for MIME type parameter parsing as a whole. It would be good to run them through more endpoints, though.

Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.

ghost · 2017-10-13T15:01:10Z

Build PASSED

Started: 2017-12-05 11:25:26
Finished: 2017-12-05 11:31:00

This report has been truncated because the number of unstable tests exceeds GitHub.com's character limit for comments (65536 characters).

Failing Jobs

chrome:unstable
MicrosoftEdge:14.14393

Unstable Browsers

Browser: "Microsoftedge 14.14393" (failures allowed)

wpt-pr-bot · 2017-11-29T13:26:31Z

There are no owners for this pull request. Please reach out on W3C's IRC server (irc.w3.org, port 6665) on channel #testing (web client) to get help with this. Thank you!

domenic · 2017-11-29T16:09:41Z

annevk · 2017-11-29T16:19:04Z

I still don't buy that 127 is an actual thing. No code has it and it seems rather arbitrary.

> Server-side tests which ensure the server only sees the parsed-then-serialized output.

What code path would this test? Note that Response/Request only store the parsed output internally for generating Blob objects...

Do you want to block on having most of those tests or can they be follow-up? I left whatwg/mimesniff#45 open quite intentionally. It requires some additional thought into how to structure the tests as not all places will react to non-ASCII input the same way.

domenic · 2017-11-29T17:17:10Z

> I still don't buy that 127 is an actual thing. No code has it and it seems rather arbitrary.

I think in general when changing a standard it's a good idea to test that the changes you made are supported by browsers, even if the old version doesn't have tests backing it up.

> What code path would this test? Note that Response/Request only store the parsed output internally for generating Blob objects...

The idea is to test the networking code, which at least in Chrome I believe uses a completely different MIME type parser from other parts of the codebase.

> Do you want to block on having most of those tests or can they be follow-up?

No need to block, but the more tests are in place before I start writing a JS implementation, the better.

annevk · 2017-11-29T17:35:53Z

@domenic you're talking about the MIME type the server sees though. There are already quite a few tests that ensure the browser networking stack doesn't touch the value of a custom Content-Type header. So I wonder what API entry point you have in mind.

domenic · 2017-11-29T17:44:20Z

Well so for example fetch or, perhaps more directly, XHR, can send Content-Type headers. Probably they should only send ones that have been parsed-then-serialized.

annevk · 2017-11-29T17:45:15Z

No they shouldn't (and we already test this). We don't want to embed header-specific knowledge in the networking library. That would prevent web pages from experimenting with new formats.

annevk · 2017-11-29T17:46:13Z

There's a specific case with XMLHttpRequest where we do modify the value and in that case it's indeed parsed and serialized; that's #8422.

annevk · 2017-11-30T14:58:35Z

Generating tests with a Python script is doable. The way to do those tests would be to write a Python script that generates a JSON resource. Ensure both are committed and ensure ci_built_diff.sh runs the Python script so modifications that do not keep the script and generated resource in sync are detected. Documenting this here mostly for future reference, since I won't get to it today.
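The workflow described above (generate a JSON resource from a script, commit both, and let CI detect drift) could look roughly like the sketch below. This is an illustration only, not the script from this pull request: `build_cases`, `render`, and `is_in_sync` are hypothetical names, and the generated case is a placeholder.

```python
import json
from pathlib import Path


def build_cases():
    # Placeholder data: the real generator's cases are not shown in this thread.
    return [{"input": "x/x;x=A", "output": "x/x;x=A"}]


def render(cases):
    # Deterministic serialization, so regenerating yields identical text.
    return json.dumps(cases, indent=2, sort_keys=True) + "\n"


def is_in_sync(resource_path, cases):
    # True when the committed resource matches what the script would generate.
    path = Path(resource_path)
    return path.exists() and path.read_text(encoding="utf-8") == render(cases)
```

A CI step could then rerun the generator and fail when the committed file differs from the regenerated one, which is the role ci_built_diff.sh is described as playing here.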

domenic · 2017-11-30T20:37:44Z

mimesniff/mime-types/charset-parameter.window.js

+    const mime = val.input;
+    async_test(t => {
+      const frame = document.createElement("iframe"),
+            expectedEncoding = val.encoding === "" ? "UTF-8" : val.encoding;


I think null, instead of the empty string, should be used to signify no parsed encoding.

domenic · 2017-11-30T20:53:51Z

mimesniff/mime-types/resources/mime-types.json

+  "type/subtype longer than 127",
+  {
+    "input": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789",
+    "output": "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789/0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"


This is the only entry that parses correctly but does not have an "encoding" line. The unpredictable data format made it harder to write a test runner.

navigable and encoding are both optional and only make sense in specific contexts. I'll document it more clearly in the README.
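For anyone writing a runner against this resource, here is a rough sketch of how the optional keys might be handled. It assumes, based only on the excerpts quoted in this review, that the JSON array mixes plain strings (section headings) with test objects, and that `navigable` and `encoding` may be absent. `iter_tests` is a hypothetical helper name.

```python
import json


def iter_tests(resource_text):
    """Yield (input, output, navigable, encoding) for each test object."""
    for entry in json.loads(resource_text):
        if isinstance(entry, str):
            # Strings are treated as headings or comments, not tests.
            continue
        yield (
            entry["input"],
            entry["output"],
            entry.get("navigable"),  # None when the key is absent
            entry.get("encoding"),   # None when the key is absent
        )
```

Using `dict.get` keeps the runner indifferent to whether a given entry carries the context-specific keys.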

domenic · 2017-11-30T21:01:06Z

mimesniff/mime-types/resources/mime-types.json

+    "navigable": true,
+    "encoding": "GBK"
+  },
+  "Single quotes (invalid)",


Single quotes are valid per https://pr-preview.s3.amazonaws.com/whatwg/mimesniff/pull/36.html#http-token-code-point, so these next three tests seem wrong.
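As a quick way to check claims like this one, the following sketch tests whether a string consists only of HTTP token code points. The character set is the `tchar` set from RFC 7230, which is an assumption about what the linked definition resolves to; `is_http_token` is a hypothetical helper name.

```python
import string

# tchar from RFC 7230: this punctuation plus ASCII letters and digits.
_TOKEN_PUNCTUATION = "!#$%&'*+-.^_`|~"
_HTTP_TOKEN_CODE_POINTS = frozenset(
    _TOKEN_PUNCTUATION + string.ascii_letters + string.digits
)


def is_http_token(value):
    """Return True if value is non-empty and uses only HTTP token code points."""
    return bool(value) and all(ch in _HTTP_TOKEN_CODE_POINTS for ch in value)
```

Under this definition a single quote is accepted while a double quote is not.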

domenic · 2017-11-30T21:02:55Z

Coverage report at https://lcov-report-sngrqbipez.now.sh. Missing one branch in the parser, plus any mixed-case tests. Pretty good!

domenic · 2017-11-30T21:03:40Z

Oh, I guess several "else" paths were never taken, mostly where the input ends prematurely. Those are more serious coverage gaps.

annevk · 2017-12-01T18:17:39Z

I added generated tests. I think we have ample coverage now for a v0/v1 of this.

domenic · 2017-12-01T22:35:10Z

mimesniff/mime-types/resources/generated-mime-types.json

+  },
+  {
+    "input": "x/x;x=\t;bonus=x",
+    "output": "x/x;x=\"\t\";bonus=x"


My parser says this should be "x/x;bonus=x". Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.

domenic · 2017-12-01T22:35:28Z

mimesniff/mime-types/resources/generated-mime-types.json

+  },
+  {
+    "input": "x/x;x= ;bonus=x",
+    "output": "x/x;x=\" \";bonus=x"


My parser says this should be "x/x;bonus=x". Trailing whitespace gets removed from the parameter value, then it becomes the empty string, so it's omitted.
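The behavior described in these two comments can be illustrated with a deliberately simplified sketch. It handles only unquoted parameter values, ignores duplicate names and value validation, and is not the parser under review; `parse_unquoted` and `serialize` are hypothetical names.

```python
HTTP_WHITESPACE = "\t\n\r "


def parse_unquoted(raw):
    """Parse 'type/subtype;name=value;...' with unquoted values only."""
    essence, _, rest = raw.strip(HTTP_WHITESPACE).partition(";")
    params = []
    for chunk in rest.split(";") if rest else []:
        name, sep, value = chunk.partition("=")
        name = name.lstrip(HTTP_WHITESPACE).lower()
        # Trailing whitespace is removed from the value ...
        value = value.rstrip(HTTP_WHITESPACE)
        # ... and a parameter whose value is then empty is dropped.
        if not sep or not name or not value:
            continue
        params.append((name, value))
    return essence.lower(), params


def serialize(essence, params):
    # Simplification: values are never quoted in this sketch.
    return essence + "".join(f";{name}={value}" for name, value in params)
```

With this sketch, both `x/x;x=\t;bonus=x` and `x/x;x= ;bonus=x` serialize without the `x` parameter.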

domenic · 2017-12-01T22:37:58Z

mimesniff/mime-types/resources/mime-types.json

+  "Valid",
+  {
+    "input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz",
+    "output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz"


"w" is missing in a few places here; the input has 1 w (should be 2) and the output has 0.

domenic · 2017-12-01T22:38:55Z

mimesniff/mime-types/resources/mime-types.json

+  },
+  {
+    "input": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\"",
+    "output": "x/x;x=\" !\\\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\u0080\u0081\u0082\u0083\u0084\u0085\u0086\u0087\u0088\u0089\u008A\u008B\u008C\u008D\u008E\u008F\u0090\u0091\u0092\u0093\u0094\u0095\u0096\u0097\u0098\u0099\u009A\u009B\u009C\u009D\u009E\u009F\u00A0\u00A1\u00A2\u00A3\u00A4\u00A5\u00A6\u00A7\u00A8\u00A9\u00AA\u00AB\u00AC\u00AD\u00AE\u00AF\u00B0\u00B1\u00B2\u00B3\u00B4\u00B5\u00B6\u00B7\u00B8\u00B9\u00BA\u00BB\u00BC\u00BD\u00BE\u00BF\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6\u00C7\u00C8\u00C9\u00CA\u00CB\u00CC\u00CD\u00CE\u00CF\u00D0\u00D1\u00D2\u00D3\u00D4\u00D5\u00D6\u00D7\u00D8\u00D9\u00DA\u00DB\u00DC\u00DD\u00DE\u00DF\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5\u00E6\u00E7\u00E8\u00E9\u00EA\u00EB\u00EC\u00ED\u00EE\u00EF\u00F0\u00F1\u00F2\u00F3\u00F4\u00F5\u00F6\u00F7\u00F8\u00F9\u00FA\u00FB\u00FC\u00FD\u00FE\u00FF\""


The output backslashes here (between the []s) are, per my parser, removed.

domenic · 2017-12-01T22:40:35Z

Coverage is 100%, but found some test bugs. Looking good though.

jgraham · 2017-12-04T11:17:09Z

mimesniff/mime-types/resources/mime-charset.py

@@ -0,0 +1,3 @@
+def main(request, response):
+    response.headers.set("Content-Type", request.GET.first("type"));
+    response.content = "<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>"


FWIW you can also write this as

def main(request, response):
    return ([("Content-Type", request.GET.first("type"))], "<meta charset=utf-8>\n<script>document.write(document.characterSet)</script>")

Is that better?

Up to you; it was just in case you didn't know.

jgraham · 2017-12-04T11:18:24Z

tools/ci/ci_built_diff.sh

@@ -18,6 +18,7 @@ main() {
    )

    ./update-built-tests.sh
+    python ./mimesniff/mime-types/resources/generated-mime-types.py


Oh, sorry, this should go in update-built-tests.sh. I guess the distinction is to make it easy to bulk update all the tests.

domenic · 2017-12-04T18:51:27Z

mimesniff/mime-types/resources/mime-types.json

-    "input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz",
-    "output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvxyz"
+    "input": "!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
+    "output": "!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvxyzabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


Still missing a "w". You can tell because the strings start in different columns but end in the same column.
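Dropped letters like this are hard to spot by eye, so a small check along the following lines may help. It is a sketch under the assumption that each lowercased segment of the expected output should contain every ASCII letter exactly twice (once from the uppercase run, once from the lowercase run); `letters_with_wrong_count` is a hypothetical helper.

```python
import string
from collections import Counter


def letters_with_wrong_count(segment, expected):
    """Return the ASCII letters whose case-insensitive count differs from expected."""
    counts = Counter(segment.lower())
    return [ch for ch in string.ascii_lowercase if counts[ch] != expected]
```

Applied to the letter portion of the first output segment above, the check reports only `w`.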

domenic · 2017-12-04T18:53:09Z

mimesniff/mime-types/resources/mime-types.json

+  },
+  {
+    "input": "x/x;\" ",
+    "output": "x/x;\" \""


In my parser the output is x/x, with no parameters. There is no parameter value here, so this is expected.

domenic · 2017-12-04T18:59:17Z

mimesniff/mime-types/resources/mime-types.json

+  },
+  {
+    "input": "x/x;x=\"\t",
+    "output": "x/x;x=\"\t\""


In my parser the output is x/x, with no parameters. Remember that you remove leading and trailing whitespace from the whole MIME type in step 1 of the entire parser algorithm, so this is equivalent to parsing x/x;x="
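The equivalence claimed here follows from the initial trimming step alone, which can be sketched as below. The whitespace set (tab, LF, CR, space) is assumed to be what the standard calls HTTP whitespace; `strip_http_whitespace` is a hypothetical helper name.

```python
HTTP_WHITESPACE = "\t\n\r "


def strip_http_whitespace(value):
    # Remove leading and trailing HTTP whitespace before any other parsing.
    return value.strip(HTTP_WHITESPACE)
```

After this step the input with the trailing tab is the same string as `x/x;x="`, so whatever the parser does with an unterminated quoted value applies to both.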

domenic · 2017-12-05T22:34:05Z

Generated tests still not quite there: x/x;x=\t;bonus=x and x/x;x= ;bonus=x should get their x parameter removed, not quoted.

Maybe we should try to do this while we're both online? Ping me sometime perhaps?

annevk · 2017-12-06T08:06:23Z

Are you sure you're looking at the latest version? I don't see those tests in generated-mime-types.json.

domenic

My bad, my script was not properly updating generated-mime-types.json. Woohoo!!

This addresses all open inline issues with respect to the parser and serializer and aligns both more closely with implementations, except where those stood in the way of an improved model. This also updates all of it to make extensive use of the Infra Standard. See #42 for the testing story (including all linked issues) and web-platform-tests/wpt#7764 for the majority of tests.

wpt-pr-bot added the http label Oct 13, 2017

wpt-pr-bot requested a review from mnot October 13, 2017 14:49

annevk mentioned this pull request Oct 13, 2017

Sort out MIME type tests whatwg/mimesniff#42

Closed

4 tasks

annevk mentioned this pull request Nov 27, 2017

MIME type parsing, other cases whatwg/mimesniff#37

Closed

annevk added 3 commits November 29, 2017 13:26

MIME type parameter parsing tests

2ce529f

In the hope that the charset parameter is a good proxy for the whole thing. Would be good to run these through more endpoints though. Relates to issues 30-41 on https://github.com/whatwg/mimesniff/issues.

>127 for parameter name

1b2efa2

double quotes with trailing garbage

caaf8a3

annevk force-pushed the annevk/mime-type-parameters branch from 69bad64 to caaf8a3 Compare November 29, 2017 12:26

annevk mentioned this pull request Nov 29, 2017

Link to tests from top of the standard whatwg/mimesniff#50

Closed

Move these all into mimesniff/ start with a JSON resource

d80e42f

wpt-pr-bot added mimesniff status:needs-reviewers labels Nov 29, 2017

test Blob/File/Request/Response

36627ba

annevk mentioned this pull request Nov 29, 2017

Implementations allow all values in type getter w3c/FileAPI#43

Open

annevk added 2 commits November 29, 2017 16:19

add some more tests

d2ab146

describe JSON format

f06ebec

add the simple requested tests (not enough time today)

66f8383

domenic reviewed Nov 30, 2017

annevk added 6 commits December 1, 2017 10:00

Address feedback on single quotes and JSON format

28bcaf5

There are a lot of HTTP token code points

8281c24

Hit end-of-file branches

6d80800

Support different test contexts better

d1f560e

include an example that has all valid quoted-value input

0ca94c9

Add the generated tests

f15f7db

wpt-pr-bot added the infra label Dec 1, 2017

wpt-pr-bot requested review from gsnedders and jgraham December 1, 2017 18:16

domenic requested changes Dec 1, 2017

address feedback

19012db

jgraham reviewed Dec 4, 2017

invoke Python from update-built-tests.sh instead

05b2e4d

domenic requested changes Dec 4, 2017

with apologies and a lot of thanks to Domenic

a4bddb0

domenic approved these changes Dec 6, 2017

annevk merged commit b15e885 into master Dec 7, 2017

annevk deleted the annevk/mime-type-parameters branch December 7, 2017 13:11

Hexcles mentioned this pull request Dec 7, 2017

Fix the flaky MIME parsing test #8621

Merged

annevk mentioned this pull request Apr 11, 2018

should Response.blob() type include parameters like charset? whatwg/fetch#540

Closed

andreubotella mentioned this pull request Jul 19, 2021

Test the default Content-Type headers of Request and Response objects #29554

Merged
