MIME type parsing, code points #45

Closed
annevk opened this issue Nov 27, 2017 · 7 comments

annevk commented Nov 27, 2017

#36 is problematic in that it says type and subtype are ASCII strings, but there's nothing that guarantees that.

new Blob([], { type: "†/†" }).type yields the empty string, and xhr.overrideMimeType("text/xml;charset=†") fails in some implementations, so maybe returning failure for non-ASCII would be okay.

It would be good to do more exhaustive checks though, including C0 control code points, U+007F, and the like. And, of course, to write tests.

annevk commented Nov 27, 2017

I decided on the following behavior:

  • Return failure for erroneous code points (those not matching the HTTP token production) in the type and subtype fields. I think this can be justified, as those end up not being MIME types that browsers do anything with anyway (you end up in a sniffing code path).
  • Ignore parameters containing erroneous code points. Given that text/html;?=? is currently treated as text/html, it seems rather dangerous to disallow that, but not putting the parameter into the MIME type record seems extremely safe, as browsers really only support the charset and boundary parameters at this point.

This keeps serialized MIME types fully compatible with HTTP, but the parser side will end up tolerating quite a lot of garbage when it comes to parameters (though again, that garbage is dropped, not preserved).
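
To make the two rules above concrete, here is a minimal sketch in JavaScript. It is not the specification's algorithm: the function name parseMimeTypeSketch is hypothetical, quoted parameter values are deliberately not handled (the input is simply split on ";"), and the token check assumes the RFC 7230 token production.

```javascript
// HTTP token code points (RFC 7230 "tchar"): ALPHA, DIGIT, and !#$%&'*+-.^_`|~
const HTTP_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Hypothetical helper illustrating the proposed behavior only.
// Assumption: no quoted-string support; parameters are split on ";".
function parseMimeTypeSketch(input) {
  const [essence, ...rawParams] = input.trim().split(";");
  const slash = essence.indexOf("/");
  if (slash === -1) return null;

  const type = essence.slice(0, slash);
  const subtype = essence.slice(slash + 1).trimEnd();

  // Rule 1: erroneous code points in type or subtype mean failure.
  if (!HTTP_TOKEN.test(type) || !HTTP_TOKEN.test(subtype)) return null;

  const parameters = new Map();
  for (const raw of rawParams) {
    const eq = raw.indexOf("=");
    if (eq === -1) continue;
    const name = raw.slice(0, eq).trim().toLowerCase();
    const value = raw.slice(eq + 1);

    // Rule 2: a parameter with erroneous code points is dropped, not preserved.
    if (!HTTP_TOKEN.test(name) || !HTTP_TOKEN.test(value)) continue;

    // Keep the first occurrence of each parameter name.
    if (!parameters.has(name)) parameters.set(name, value);
  }

  return {
    type: type.toLowerCase(),
    subtype: subtype.toLowerCase(),
    parameters,
  };
}

console.log(parseMimeTypeSketch("†/†"));                        // null
console.log(parseMimeTypeSketch("text/html;?=?").parameters.size); // 0
```

Under this sketch, text/html;?=? still parses as text/html, while †/† is rejected outright, which is the split between type/subtype and parameters described above.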

If that's the outcome we manage to get here, that'd be better than with URLs, and I'd consider it a huge win.

annevk commented Nov 27, 2017

(@yutakahirano said this seemed reasonable on IRC, FWIW.)

annevk commented Nov 27, 2017

Hmm, this would mean that text/html;charset= gbk ends up as text/html, since a space does not match the HTTP token production. That is probably acceptable, but it is the riskiest aspect of this proposal.

annevk commented Nov 28, 2017

annevk commented Nov 28, 2017

An alternative way of addressing that risky aspect is to keep such values when parsing and quote them when serializing, so that ;charset= gbk is serialized as ;charset=" gbk". That way we don't generate invalid values, but we are more accepting of them, as we likely need to be.
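
A rough sketch of that serialization step in JavaScript follows. The helper name serializeParameterSketch is hypothetical, and the escaping of backslashes and double quotes inside the quoted value is an assumption based on the HTTP quoted-string syntax, not something stated in this thread.

```javascript
// HTTP token code points (RFC 7230 "tchar"): ALPHA, DIGIT, and !#$%&'*+-.^_`|~
const HTTP_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Hypothetical helper: serialize one parameter, quoting the value
// whenever it is not a valid HTTP token (for example " gbk").
function serializeParameterSketch(name, value) {
  if (!HTTP_TOKEN.test(value)) {
    // Assumption: escape \ and " as in an HTTP quoted-string, then wrap in quotes.
    value = '"' + value.replace(/(["\\])/g, "\\$1") + '"';
  }
  return ";" + name + "=" + value;
}

console.log(serializeParameterSketch("charset", "gbk"));  // ;charset=gbk
console.log(serializeParameterSketch("charset", " gbk")); // ;charset=" gbk"
```

The parser can then stay lenient about what it accepts in a value, while the serialized output remains valid HTTP.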

annevk commented Nov 28, 2017

#36 defines that alternative approach.

annevk commented Dec 4, 2017

This has test coverage now.

@annevk annevk closed this as completed Dec 4, 2017