
data URLs: revised specification #234

Closed
cvrebert opened this Issue Mar 4, 2016 · 14 comments

Member

cvrebert commented Mar 4, 2016

As of SimonSapin/data-urls@82fe8eb, the data-urls document no longer defines any algorithms.

Member

annevk commented Mar 4, 2016

I know. Not really sure what to do. Should the URL Standard define it inline? Are you interested in defining it?

Member

cvrebert commented Mar 4, 2016

> Are you interested in defining it?

No, I just noticed the problem on a lark, when looking up how Fetch handles data: URLs because of https://dev.opera.com/blog/opera-35/#fetch-api-data-and-blob-url-scheme-support

Member

cvrebert commented Mar 31, 2016

X-Ref: whatwg/url#68

annevk changed the title from "[DATAURL] is no longer a (quasi-)spec" to "data URLs: revised specification" Aug 11, 2017

Member

annevk commented Aug 11, 2017

I think we should define it in Fetch and leave the URL Standard to concern itself with a higher layer (namely the initial parse step).

https://simonsapin.github.io/data-urls/ has some good test angles.

https://gist.github.com/annevk/4287452653921b2b7de35e4208b4a985 has a basic setup for a web-platform-tests test suite that can be reused by non-browser implementations (especially if we split out the JSON format into its own resource at some point).

annevk referenced this issue Aug 11, 2017 in "Define data URLs? #68" (Closed)

Member

domenic commented Aug 11, 2017

> I think we should define it in Fetch and leave the URL Standard to concern itself with a higher layer (namely the initial parse step).

I think I agree with this, but could you say more about the split?

Member

annevk commented Aug 12, 2017

First you run the URL parser over a data URL string to get a URL record. That URL record is the input to the data URL processor (or whatever we call it), which ends up returning a response (this is the algorithm Fetch will invoke from Scheme Fetch and will also define somewhere). Once we have that, we could maybe also consider a Response.fromDataURL() or some such if we wanted to.
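A minimal Python sketch of that two-step layering, under stated assumptions: `urllib.parse.urlsplit` stands in for the WHATWG URL parser (it is not the same algorithm and differs in many edge cases), and `process_data_url`, `DataURLResult` and `data_url_to_response` are hypothetical names, not anything defined in Fetch. The processor returns a (MIME type, body) pair, and only the Fetch-side wrapper turns that pair into a response.

```python
import base64
from typing import NamedTuple, Optional
from urllib.parse import unquote_to_bytes, urlsplit

ASCII_WHITESPACE = " \t\n\r\f"
DEFAULT_MIME = "text/plain;charset=US-ASCII"


class DataURLResult(NamedTuple):
    mime_type: str
    body: bytes


def process_data_url(url: str) -> Optional[DataURLResult]:
    """Hypothetical data URL processor: returns (MIME type, body), or None on failure."""
    # Step 1: a URL parser produces a URL record (urlsplit is only a stand-in).
    record = urlsplit(url)
    if record.scheme != "data":
        return None
    # Step 2: work on everything after "data:", keeping the query, dropping the fragment.
    rest = record.path + ("?" + record.query if record.query else "")
    header, comma, payload = rest.partition(",")
    if not comma:
        return None  # no comma at all: failure
    mime_type = header.strip(ASCII_WHITESPACE)
    body = unquote_to_bytes(payload)  # percent-decode the payload into bytes

    # Treat ";base64" as a flag only at the end of the header (case-insensitive).
    semicolon = mime_type.rfind(";")
    if semicolon != -1 and mime_type[semicolon + 1:].lstrip(" ").lower() == "base64":
        mime_type = mime_type[:semicolon]
        text = "".join(ch for ch in body.decode("latin-1") if ch not in ASCII_WHITESPACE)
        try:
            # Strict stand-in for a lenient base64 decoder.
            body = base64.b64decode(text, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            return None
    if not mime_type:
        mime_type = DEFAULT_MIME
    return DataURLResult(mime_type, body)


def data_url_to_response(url: str) -> dict:
    """Hypothetical Fetch-side wrapper that turns the processor's result into a response."""
    result = process_data_url(url)
    if result is None:
        return {"type": "error"}  # network error
    return {"status": 200, "headers": {"Content-Type": result.mime_type}, "body": result.body}
```

Keeping the pair-returning processor separate from the response wrapper is what would make something like the floated Response.fromDataURL() cheap to add later; that API is only an idea in the comment, not something specified here.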

Member

annevk commented Aug 14, 2017

See whatwg/html#2912 for base64 handling.
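For the base64 step, a sketch of a lenient decoder, modelled loosely on what the Infra Standard specifies as forgiving-base64 decode. This is an assumption about the general shape of such a decoder, not a summary of whatwg/html#2912. It strips all ASCII whitespace, including U+000C (one of the browser differences listed in this thread), and tolerates missing padding. `lenient_base64_decode` is a hypothetical name.

```python
import base64
from typing import Optional

ASCII_WHITESPACE = " \t\n\r\f"  # includes U+000C FORM FEED
BASE64_ALPHABET = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def lenient_base64_decode(data: str) -> Optional[bytes]:
    """Hypothetical lenient base64 decode; returns None on failure."""
    # Remove all ASCII whitespace, including U+000C, before anything else.
    data = "".join(ch for ch in data if ch not in ASCII_WHITESPACE)
    # If the length is a multiple of 4, drop one or two trailing "=" padding characters.
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    # A remainder of 1 can never encode whole bytes.
    if len(data) % 4 == 1:
        return None
    # Anything outside the alphabet (including a stray "=") is a failure.
    if any(ch not in BASE64_ALPHABET for ch in data):
        return None
    # Re-add the padding that base64.b64decode expects, then decode.
    return base64.b64decode(data + "=" * (-len(data) % 4))
```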

annevk added a commit to whatwg/url that referenced this issue Aug 14, 2017

Define percent decoding of strings
This is useful for whatwg/fetch#234 and also simplifies the host parser a bit.
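A sketch of percent decoding of strings, assuming the usual approach: UTF-8 encode the string, then replace each `%` followed by two hex digits with the corresponding byte, and leave any other `%` untouched. The function names are hypothetical; Python's `urllib.parse.unquote_to_bytes` behaves similarly in practice.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def percent_decode(data: bytes) -> bytes:
    """Decode %XX sequences; anything else (including a malformed "%") is kept as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if (
            byte == 0x25  # "%"
            and i + 2 < len(data)
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        ):
            out.append(int(data[i + 1 : i + 3], 16))
            i += 3
        else:
            out.append(byte)
            i += 1
    return bytes(out)


def percent_decode_string(text: str) -> bytes:
    """Percent-decode a string: UTF-8 encode it first, then decode the bytes."""
    return percent_decode(text.encode("utf-8"))
```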
Member

annevk commented Aug 14, 2017

Differences between browsers:

  • Safari only accepts ";base64" at the end, as per the RFC. Others also accept it when the part before the comma merely contains ";base64;". I'm inclined to go with the others even though Safari's behavior is better.
  • Chrome and Firefox drop unknown parameters. Others don't. (I haven't yet investigated deeply what is known and unknown, but on the surface it seems rather bogus, as image/gif;charset=x is apparently fine.) Inclined to go with the others.
  • Chrome and Firefox lowercase MIME types and known parameter names (not parameter values). Others don't. Inclined to go with the others.
  • Chrome and Firefox remove U+0020 for non-text/* MIME types (though note that application/xml does not count, so it's more complicated, and that %20 does end up as a non-removed U+0020). This doesn't seem to be needed. See also one of the Mozilla bugs above for further analysis by me. Inclined to go with the others.
  • Edge and Firefox use "text/plain" rather than "text/plain;charset=US-ASCII" as the fallback for invalid MIME types. Safari often echoes the input string and doesn't even prepend "text/plain" to a lone ";charset=x". Inclined to go with Chrome (which also uses the default as the fallback).
  • Chrome and Edge don't strip U+000C when base64 decoding. Chrome does strip it for window.atob(), so this is probably a silly mismatch. Inclined to go with the others.
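Two of these choices can be pinned down in a few lines of code. The sketch follows the inclinations stated in the comment, not necessarily the text that was eventually merged: `has_base64_flag` contrasts Safari's end-only rule with the more lenient behavior of the other browsers, and `resolve_mime_type` uses Chrome's approach of making "text/plain;charset=US-ASCII" both the default and the fallback, without lowercasing or dropping unknown parameters. Both function names, and the regex check standing in for a real MIME type parser, are assumptions.

```python
import re

DEFAULT_MIME = "text/plain;charset=US-ASCII"
# Simplified validity check (assumption): "type/subtype" made of HTTP token characters.
# A real implementation would run a full MIME type parser instead.
_TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
_ESSENCE = re.compile(_TOKEN + "/" + _TOKEN)


def has_base64_flag(header: str, anywhere: bool) -> bool:
    """anywhere=False: Safari-style, ";base64" must be the last parameter.
    anywhere=True: other browsers, any ";"-separated parameter may be "base64"."""
    params = [p.strip().lower() for p in header.split(";")[1:]]
    if anywhere:
        return "base64" in params
    return bool(params) and params[-1] == "base64"


def resolve_mime_type(header: str) -> str:
    """Chrome-style: the default MIME type doubles as the fallback for invalid input."""
    mime = header.strip(" \t\n\r\f")
    if mime.startswith(";"):
        mime = "text/plain" + mime  # a lone ";charset=x" gets "text/plain" prepended
    essence = mime.partition(";")[0].strip()
    if not _ESSENCE.fullmatch(essence):
        return DEFAULT_MIME
    # Case and unknown parameters are preserved, as the comment leans towards.
    return mime
```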

annevk added a commit that referenced this issue Aug 14, 2017

Define data: URLs
Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail. So here's an attempt to define the processing model more clearly and get implementations aligned.

Tests: ...

Fixes #234.

annevk referenced this issue Aug 14, 2017 in "Define data: URLs #579" (Merged, 7 of 7 tasks complete)
Member

domenic commented Aug 14, 2017

> First you run the URL parser over a data URL string to get a URL record. That URL record is the input to the data URL processor (or whatever we call it) which ends up returning a response (this is the algorithm Fetch will invoke from Scheme Fetch and also define someplace).

I was hoping there'd be some intermediate algorithm that takes a URL record (or perhaps just a path string) and returns you a tuple like (MIME type, byte sequence) (maybe with text encoding too? Or just MIME parameters in general?). This algorithm would then live in the URL Standard.

That feels like something that'd be easier to reuse across environments or as part of larger pipelines. And in general just give a good factoring of "data URL processing" separate from fetching.

WDYT?

Member

annevk commented Aug 14, 2017

That's more or less what I ended up with (the multiple return values I have now should probably be a tuple), but I'm not sure it should be in the URL Standard. That would also lend credence to the URL Standard having to deal with the mailto: URL scheme and other such things, whereas I don't think it should.

Member

domenic commented Aug 14, 2017

Hmmmmmmm. I think I am convinced by the slippery-slope argument. data: feels special intuitively, but reflecting logically for a while, I think it's not. The process of going from URL -> action and resource is fetching, so I guess it makes sense there. Thanks!

Member

annevk commented Aug 21, 2017

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1392241 as a tracking bug for Firefox. It also contains a summary of the state of this issue and the associated PR.

annevk added a commit to whatwg/url that referenced this issue Oct 10, 2017

Define percent decoding of strings
This is useful for whatwg/fetch#234, the HTML Standard, and also simplifies the host parser a bit.

annevk added a commit that referenced this issue Jan 23, 2018

Define data: URLs
Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail. So here's an attempt to define the processing model more clearly and get implementations aligned.

Tests: ...

Fixes #234.

annevk closed this in #579 Jan 31, 2018

annevk added a commit that referenced this issue Jan 31, 2018

Define data: URL processing
Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail.

Tests: web-platform-tests/wpt#6890.

Fixes #234.