Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define data: URLs #579

Merged
merged 18 commits into from Jan 31, 2018
Merged

Define data: URLs #579

merged 18 commits into from Jan 31, 2018

Conversation

annevk
Copy link
Member

@annevk annevk commented Aug 14, 2017

Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail. So here's an attempt to define the processing model more clearly and get implementations aligned.

Tests: web-platform-tests/wpt#6890.

Fixes #234.

TODO:

  • Upload tests.
  • Add more tests.
  • Figure out a base64 location, see whatwg/html#2912.
  • Refactor window.atob() to make use of the new base64 algorithm wherever we decide to locate it.
  • Use that new base64 algorithm here.
  • Run the applicable tests of html/webappapis/atob/base64.html also for data: URLs. Maybe as a separate data file in fetch/data-urls/ so any new base64 tests always run through both endpoints.
  • MIME type validation / extraction, see whatwg/mimesniff#36.

Preview | Diff

fetch.bs Outdated
@@ -5630,6 +5621,81 @@ if the script checks that the URL has the right hostname.



<h2 id=data-urls><code>data:</code> URLs</h2>

<p>For an informative description of <code>data:</code> URLs see RFC 2397. This section replaces its
Copy link
Member

@sideshowbarker sideshowbarker Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<p>For an informative description of <code>data:</code> URLs see RFC 2397

Probably better to have a comma after URLs:

For an informative description of data: URLs, see RFC 2397

fetch.bs Outdated
<h2 id=data-urls><code>data:</code> URLs</h2>

<p>For an informative description of <code>data:</code> URLs see RFC 2397. This section replaces its
normative processing requirements. [[!RFC2397]]
Copy link
Member

@TimothyGu TimothyGu Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Informative link instead of normative?

fetch.bs Outdated
<li><p>If <var>body</var> is failure, then return failure.

<li><p>Remove the code point sequence that is an <a>ASCII case-insensitive</a> match for
"<code>;base64</code>" from <var>mimeType</var>.
Copy link
Member

@TimothyGu TimothyGu Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or ;base64;

Copy link
Member Author

@annevk annevk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would remove a ";" too many.

Copy link
Member

@TimothyGu TimothyGu Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. What if there are multiple occurrences of ;base64?

Copy link
Member Author

@annevk annevk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't just remove all of them because then you get into trouble with something like ;base64x;base64. And it's really hard to figure out what Chrome and Firefox do because their MIME type parser throws away unknown parameters.

Copy link
Member Author

@annevk annevk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess I should block on figuring out MIME types after all, which seems like a massive pain, but oh well.

fetch.bs Outdated
</ol>

<li><p>If <var>mimeType</var> starts with an <a>ASCII case-insensitive</a> match for
"<code>charset;</code>", then prepend "<code>text/plain</code>" to <var>mimeType</var>.
Copy link
Member

@TimothyGu TimothyGu Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;charset?

The "starts with" part would also break URLs like data:;myownprop=abc;charset=UTF-8,%E4%B8%AD%E6%96%87 and data:;myownprop=abc,stuff. The latter is technically invalid according to the RFC but works in Chrome and Firefox.

Copy link
Member Author

@annevk annevk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter doesn't preserve anything in Chrome and Firefox though. The former does seem to preserve the charset value somehow, but doesn't work at all in Safari. I'm not sure we need to support it.

fetch.bs Outdated
equal value.
<!-- See above about a dedicated function. -->

<li><p>Return <var>mimeTypeBytes</var> and <var>body</var>.
Copy link
Member

@TimothyGu TimothyGu Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return mimeType instead?

Copy link
Member Author

@annevk annevk Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A header value is a byte sequence.

annevk added a commit to web-platform-tests/wpt that referenced this issue Aug 15, 2017
annevk added a commit to whatwg/infra that referenced this issue Aug 15, 2017
For use by window.btoa()/window.atob() and soon data: URLs, see whatwg/fetch#579. Much of this text originated in the HTML Standard, see whatwg/html#2920 about removing that.
fetch.bs Outdated
@@ -5685,7 +5686,14 @@ that RFC's normative processing requirements. [[RFC2397]]

<li><p>Set <var>mimeTypeBytes</var> to the <a>isomorphic encode</a> of <var>mimeType</var>.

<li><p>Return <var>mimeTypeBytes</var> and <var>body</var>.
<li><p>Let <var>dataURLStruct</var> be a <a>struct</a> consisting of a
Copy link
Member

@domenic domenic Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the struct type should be defined outside the algorithm (like "URL record" currently is) as well as its consituents. Then you can have a single step that says "Return a data URL struct whose MIME type is mimeTypeBytes and whose body is body."

Copy link
Member Author

@annevk annevk Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I guess, it seems rather pointless though. Maybe I should just use response directly?

Copy link
Member

@domenic domenic Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno, I think people might want to reuse this parser more generally, and I think it's good to have it be standalone from the rest of the spec.

Copy link
Member

@domenic domenic Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use a tuple to be super-lightweight on the spec side.

Copy link
Member Author

@annevk annevk Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to add indexing syntax to Infra to be able to address tuple entries then. Currently you have to name them.

Copy link
Member Author

@annevk annevk Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anywya, I guess it's not too much trouble to define a struct upfront.

Copy link
Member

@domenic domenic Aug 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking "Let (mimeType, body) be the result of running the data URL processor". I guess that's not explicitly defined, but it should be.

Copy link
Member

@foolip foolip left a comment

I reviewed this because it came up in https://docs.google.com/document/d/1FIUk5Y5_VmZ8rqHEsFEoXaysJUCRzf3RbvcyuYpim9c/edit?usp=sharing

Overall makes a lot of sense, just some missing links and questions about whitespace.

fetch.bs Outdated
<var>dataURL</var> and then runs these steps:

<ol>
<li><p>Assert: <var>dataURL</var>'s <a for=URL>scheme</a> is "<code>data</code>".
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scheme ends up not being linked to anything, is the concept not exported in URL?

Copy link
Member Author

@annevk annevk Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably lowercase URL.

fetch.bs Outdated

<li><p>Let <var>encodedBody</var> be the remainder of <var>input</var>.

<li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string percent decoding doesn't get linked

Copy link
Member Author

@annevk annevk Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because the corresponding PR in URL hasn't landed.

Copy link
Member

@foolip foolip Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, that's whatwg/url#340

fetch.bs Outdated

<li><p>Let <var>mimeTypeBytes</var> be the <a>string percent decoding</a> of
<var>encodedMimeType</var>.
<!-- Note: implementations leave the percent-encoded bits around. That strikes me as rather broken,
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, implementations don't percent decode the mime type at all? Are any of the tests failing everywhere because of this? If so, going with the broken reality seems preferable.

Copy link
Member Author

@annevk annevk Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, nobody wants to put effort in this? I guess I should check again to make sure no implementation does it anywhere, but it seems rather sad and it means you cannot have commas anywhere (although the case for them is rather theoretical I suppose).

Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having no context on this, I guess I don't see why percent decoding is the default reasonable thing to do. Is it just generally true of "parts of URLs"? This part is constrained by the syntax of mime types, does that even allow for anything that needs to be percent encoded?

Copy link
Member Author

@annevk annevk Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's why you'd percent-decode before you try to parse the MIME type. And yeah, generally in URLs "a" and "%61" would be equivalent and consumers of URLs are expected to treat those equivalently.

Copy link
Member

@foolip foolip Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, a cast that isn't entirely far fetched is data:video/ogg;codecs=theora,vorbis,.... Even though I don't know of a case where the codecs are needed, it is valid to have a comma in a media type.

No objections to try to spec the more reasonable behavior, but it'd be good if it's isolated to a few tests, so that if they're all still failing in a year or so, reversing course will be a small effort.

fetch.bs Outdated

<li><p><a>Strip leading and trailing ASCII whitespace</a> from <var>mimeType</var>.

<li><p>Let <var>body</var> be the <a>string percent decoding</a> of <var>encodedBody</var>.
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not linked

fetch.bs Outdated

<li>
<p>If <var>mimeType</var> contains an <a>ASCII case-insensitive</a> match for
"<code>;base64;</code>" or ends with an <a>ASCII case-insensitive</a> match for
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is no whitespace around "base64" allowed? The URL data:text/html ; base64 ;,YW5uZXZr works for me on ChromeOS, so tests will be needed for this.

Copy link
Member Author

@annevk annevk Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The silly thing is that data:text/html%20;%20base64%20;,YW5uZXZr doesn't work, but I also cannot get that to work in other browsers (tested Firefox and Safari).

Copy link
Member

@foolip foolip Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that not just because of the above issue about percent decoding the media type?

When it comes to whitespace, is there an ideal behavior that we should try to do, or should we just look for what's most web compatible?

Copy link
Member Author

@annevk annevk Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideal might relate to how we parse MIME types in general. Ignoring whitespace is what we typically do though so I suppose that makes sense, but will require more of an actual parser.

fetch.bs Outdated

<li><p>If <var>body</var> is failure, then return failure.

<li><p>Remove the code point sequence that is an <a>ASCII case-insensitive</a> match for
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it say to remove the first such sequence, given that it could appear again later?

fetch.bs Outdated
</ol>

<li><p>If <var>mimeType</var> starts with an <a>ASCII case-insensitive</a> match for
"<code>;charset=</code>", then prepend "<code>text/plain</code>" to <var>mimeType</var>.
Copy link
Member

@foolip foolip Aug 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question of whitespace here as well.

@annevk
Copy link
Member Author

@annevk annevk commented Aug 30, 2017

@foolip the bigger problem here is how MIME types in general are parsed. Chrome and Firefox end up normalizing them to some extent, but not in a super logical way. Basically everything unknown gets dropped, charset parameters are preserved, regardless of MIME type. See also whatwg/mimesniff#30.

@foolip
Copy link
Member

@foolip foolip commented Sep 7, 2017

whatwg/mimesniff#30 is interesting, commented. In order to ultimately resolve https://bugzilla.mozilla.org/show_bug.cgi?id=1389251, can this PR leave the question of MIME type parsing unanswered, and just extract a string that's handed off to something else, or left as a TODO? The most urgent bit to reach interop on is spaces within the body.

I'm the babysitter for this issue in https://docs.google.com/spreadsheets/d/11OdLcQs6nr3WQgkzu3kSKOPGnneMbZ8Unung8gE6jm8/edit?usp=sharing and will come back to it again eventually, right now is there anything I can help with to move it along? Tests look like a good start to me already.

@annevk
Copy link
Member Author

@annevk annevk commented Sep 8, 2017

Most of this is about the MIME type. The only spaces that are dropped are within base64-bodies. And determining it's a base64 body depends on how we parse MIME types. MIME types is step 9-16 basically, give or take. I can't just drop that.

I've reached out to @bzbarsky to see if he can help set a direction. My personal take is that we should parse MIME types here (and serialize them on output). But rather than dropping unknown parameters as Chrome and Firefox do, we preserve them (except for base64, assuming we can treat that as a parameter of sorts). Not preserving them doesn't seem good enough for a generic utility like data: URLs.

@bzbarsky
Copy link

@bzbarsky bzbarsky commented Sep 15, 2017

So observably, the MIME type bit just affects what getResponseHeader("Content-Type") returns for data: URIs with parameters in the MIME type, right?

In practice, I expect that including parameters (but not the ;base64 parameter) in there is fairly compatible, in the sense that no one uses those to a first approximation.

Looking at the actual spec proposal, I have the following questions:

  1. The proposal supports %-escaping of the MIME type, as far as I can tell. So it would support data:text/pl%61in,some text or so. I guess WebKit and Blink support that, but Gecko does not. Does Edge? Is there a reason we'd actually want to support it? It seems like an avenue for filter bypasses...
  2. This spec will, I think, treat data:text/plain;foo=';base64;',stuff as a base64-encoded URI. I think that matches Gecko; dunno about other browsers, and not sure what behavior we actually want here.
  3. "forgiving-base64 decode" is not linkified. It should be. Presumably it's https://infra.spec.whatwg.org/#forgiving-base64-decode
  4. The definition of "valid MIME type" is not linkified... I have no idea what browsers do here in practice.
  5. This spec allows ;charset=stuff to come after ;base64. Is that needed for compat. The data: RFC requires that ;base64 come last, and at least Gecko semi-enforces that by ensuring that charset is not looked for after ;base64.
  6. For something like data:;foo=bar;charset=UTF-8 it looks like the proposed algorithm will treat it as text/plain;charset=US-ASCII, right? What do UAs do here?
  7. At least Gecko canonicalizes the actual MIME type to lowercase for data:, just like it does for HTTP and every other network protocol. This spec doesn't do that at this level, but maybe higher levels end up doing it? What do other UAs do?

@bzbarsky
Copy link

@bzbarsky bzbarsky commented Sep 15, 2017

Oh, and the general direction here seems ok to me.

annevk added a commit to web-platform-tests/wpt that referenced this issue Sep 18, 2017
@annevk
Copy link
Member Author

@annevk annevk commented Sep 18, 2017

  1. The proposal supports %-escaping of the MIME type, as far as I can tell.

The main reason was because otherwise you could not have a MIME type with a comma in it. I don't think Chrome and Safari have proper support for it. They don't normalize text/pl%61in to text/plain. And they render data:text/htm%6C,<b>test as text (<b>test). I guess I'll remove that since you're the second one to complain about it.

  1. data:text/plain;foo=';base64;',stuf [I removed the final f to make it forgiving-base64-compatible]

This does not work in Safari, but does work in Chrome. I would prefer it not working.

  1. This spec allows ;charset=stuff to come after ;base64.

Something like data:text/plain;base64;charset=utf-16,stufstuf gives three results. Firefox treats it as base64, but not UTF-16. Chrome treats it as base64 and UTF-16. Safari applies neither and renders "stufstuf".

I think we should either require ;base64 to be the actual last thing as per the RFC or allow it to occur anywhere. Some kind of mix seems rather confusing.

  1. For something like data:;foo=bar;charset=UTF-8 it looks like the proposed algorithm will treat it as text/plain;charset=US-ASCII, right? What do UAs do here?

Safari never expands (just uses the given value as MIME type) and Chrome and Firefox prepend text/plain even if it doesn't start with ;charset. @TimothyGu pointed this out first and I guess I was wrong to dismiss it.

  1. At least Gecko canonicalizes the actual MIME type to lowercase for data:, just like it does for HTTP and every other network protocol.

Safari does not, Chrome does.

I'll go ahead and update the proposed standard and tests per the above.

@bzbarsky
Copy link

@bzbarsky bzbarsky commented Sep 18, 2017

otherwise you could not have a MIME type with a comma in it.

That's not allowed anyway, per the syntax for MIME types defined in https://tools.ietf.org/html/rfc2045#section-5.1. At least for the type itself; MIME parameters are a separate issue....

They don't normalize text/pl%61in to text/plain

Hmm. In my testing via the URL bar they did. And <iframe src="data:text/pl%61in,Some text"></iframe> rendered as plaintext. But you're right, that the latter just seems like "no valid type" error recovery, and who knows what browsers do with URL bar input.

I would prefer it not working [the busted base64 thing]

I think I would too, if we can figure out a simple way to spec that...

I think we should either require ;base64 to be the actual last thing as per the RFC or allow it to occur anywhere. Some kind of mix seems rather confusing.

I think the idea behind the Firefox impl is that people might add random garbage to the end of the RFC-specified bits. Might be worth checking the history of why the parsing there is so loose.

That said, does Safari basically enforce ";base64" being the very last thing? Just trying to feel out the compat constraints here.

@annevk
Copy link
Member Author

@annevk annevk commented Sep 18, 2017

For MIME type with a comma I was including MIME type parameters. E.g., the codecs parameter of various media MIME types.

Per the web-platform-tests I wrote Safari seems to enforce ;base64 being last. Doesn't even seem to allow for a trailing space (I think we should allow spaces as we already want to trim spaces around the MIME type anyway).

@annevk
Copy link
Member Author

@annevk annevk commented Sep 18, 2017

Safari treats data:text/plain;x=';base64,stuf as base64 by the way. That's a little ugly in a way, but it seems acceptable to just drop ASCII case-insensitive ";base64" from the end and then feed the remainder to a MIME type with parameters parser. ";base64" isn't really a MIME type parameter after all, as those are technically required to contain "=" and a value.

@annevk
Copy link
Member Author

@annevk annevk commented Sep 19, 2017

Having required ;base64 to be last what is left is deciding how MIME types are parsed and then serialized.

E.g., text/HTML; becomes text/html in Chrome and Firefox. Parameter names are lowercased too: text/html;CHARSET=UTF-8 becomes text/html;charset=UTF-8.

I think that's a reasonable model. The only thing I'm unsure about is:

  1. Preserving unknown parameters. I think this would be much better for a generic URL scheme.
  2. Parameters with bugs. E.g., parameters are required to have a value. Technically per the RFC if one doesn't have a value that makes the entire MIME type not parsable, just like text/html; is an error, but that's not tenable. Maybe we should just drop parameters that don't have a value?

@foolip
Copy link
Member

@foolip foolip commented Sep 27, 2017

Checking back in on this, seeing if I can be of service :)

Preserving unknown parameters. I think this would be much better for a generic URL scheme

Per #579 (comment) Firefox already preserves them, so that seems likely to be web compatible. What does Edge do? I was trying things like new URL('data:text/Plain;foo=bar,pla').toString() but that can't be the right thing, Chrome also preserves that foo=bar.

Maybe we should just drop parameters that don't have a value?

That or behaving as if they're the empty string seems fine? What do implementations do?

@annevk
Copy link
Member Author

@annevk annevk commented Sep 27, 2017

@foolip I don't think any browser preserves them at the moment. The way to test it is to fetch the data URL and look at the resulting Content-Type header.

Implementations throw away "unknown" parameters. However, for charset:

  • Chrome refuses to fetch ;charset=. ;charset results in the default MIME type (so also treated as an error of sorts, but different).
  • Firefox consistently ignores ;charset= and just ;charset, returning the MIME type without that parameter.

@foolip
Copy link
Member

@foolip foolip commented Sep 27, 2017

OK, so I was testing at the wrong level, simply parsing and serializing data: URLs doesn't do it, but passing throw fetch() would?

Chrome refuses to fetch ;charset=. ;charset results in the default MIME type (so also treated as an error of sorts, but different).

That seems weird.

Firefox consistently ignores ;charset= and just ;charset, returning the MIME type without that parameter.

I like that better.

Ultimately, though, I'm not at all attached to the details here, as long as we get interop with low compat risk.

@annevk
Copy link
Member Author

@annevk annevk commented Sep 28, 2017

OK, so I was testing at the wrong level, simply parsing and serializing data: URLs doesn't do it, but passing [to] fetch() would?

Right. data: URL processing happens after the URL parser, effectively.

@foolip
Copy link
Member

@foolip foolip commented Oct 11, 2017

This is now blocked on whatwg/mimesniff#30, right?

@annevk
Copy link
Member Author

@annevk annevk commented Oct 12, 2017

Yeah, which is blocked on whatwg/mimesniff#39. In particular, whether we can still turn MIME types into a sound system consisting of parse/model/serialize or whether it needs to be as ad-hoc as it is in implementations.

foolip
foolip approved these changes Jan 24, 2018
Copy link
Member

@foolip foolip left a comment

LGTM % nit, thanks Anne!

fetch.bs Outdated
@@ -6095,7 +6094,13 @@ that RFC's normative processing requirements to be compatible with deployed cont

<li><p>If <var>body</var> is failure, then return failure.

<li><p>Remove the last 7 code points from <var>mimeType</var>.
<li><p>Remove trailing U+0020 SPACE <a>code points</a> from <var>mimeType</var>, if any.
Copy link
Member

@foolip foolip Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be worth pointing out that just not implementing these steps would have no observable effect, since the MIME parser will ignore it? Or if it can make a difference (didn't check deeply) pointing that out, to avoid someone making that optimization.

Copy link
Member Author

@annevk annevk Jan 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds like a good idea. I mainly clean up here in case we want to add some kind of error reporting to the MIME type parser later.

@domenic
Copy link
Member

@domenic domenic commented Jan 24, 2018

The tests don't seem to be updated, but I implemented this in my JS implementation and only got a couple test failures with the old tests, so at least it didn't totally break everything :). I'll comment over there with the details.

@annevk
Copy link
Member Author

@annevk annevk commented Jan 24, 2018

Thanks! The main reason I had not updated the tests yet is because I was not quite sure this was all correct, and indeed we missed something. We do a starts with ; check that would fail if mimeType starts with spaces. Fixed now.

@foolip
Copy link
Member

@foolip foolip commented Jan 24, 2018

whitespace [...] cause of webcompat/web-bugs#8733

This didn't seem right after I'd tested https://software.hixie.ch/utilities/js/live-dom-viewer/saved/5761 in Firefox, where the space is accepted. See webcompat/web-bugs#8733 (comment), for now let's just write that case up as unknown cause.

@dholbert kindly clarified this in webcompat/web-bugs#8733 (comment), apparently the behavior is different for base64 and non-base64. Curious, but I don't think it changes the conclusion for this issue.

@domenic
Copy link
Member

@domenic domenic commented Jan 25, 2018

Tests and spec-as-reified-in-JS match, so LGTM!

fetch.bs Outdated
<p><a>Strip leading and trailing ASCII whitespace</a> from <var>mimeType</var>.

<p class="note">This will only remove U+0020 <a>code points</a>, if any.

<li><p>If <var>position</var> is past the end of <var>input</var>, then return failure.
Copy link
Member

@domenic domenic Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no tests that hit this condition.

Copy link
Member Author

@annevk annevk Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I added some.

@annevk
Copy link
Member Author

@annevk annevk commented Jan 30, 2018

Sigh.

The last three steps above are redundant with MIME type parsing as that data would end up discarded. It is removed here as it is not part of the MIME type and MIME type parsing might report errors out-of-band in the future.

is not actually redundant due to the next step, checking if mimeType starts with ";". I'm going to remove that note.

@annevk
Copy link
Member Author

@annevk annevk commented Jan 30, 2018

Commit message:

Define data: URL processing

Unfortunately RFC 2397 has some ambiguities and implementations never really followed it in detail.

Tests: https://github.com/w3c/web-platform-tests/pull/6890.

Fixes #234.

@annevk
Copy link
Member Author

@annevk annevk commented Jan 30, 2018

I guess I'll wait with landing another day since the above change was not anticipated by, e.g., @foolip. And I'd rather land together with the tests and Travis isn't green yet for those.

The only thing that remains to be done is filing bugs against non-Firefox browsers.

@foolip
Copy link
Member

@foolip foolip commented Jan 31, 2018

is not actually redundant due to the next step, checking if mimeType starts with ";". I'm going to remove that note.

Oh, I see. As long as some test exercises that distinction, this is fine with me then.

@annevk
Copy link
Member Author

@annevk annevk commented Jan 31, 2018

Yeah, looking through it quite a few tests cover this and I think originally those tests went the other way even, and when @domenic pointed out they were wrong I didn't realize it was because of this. So we're all good then.

annevk added a commit to web-platform-tests/wpt that referenced this issue Jan 31, 2018
@annevk annevk merged commit 36ef3c8 into master Jan 31, 2018
2 checks passed
@annevk annevk deleted the annevk/data-url branch Jan 31, 2018
@foolip
Copy link
Member

@foolip foolip commented Jan 31, 2018

Finally! Celebration! Exclamation!

@annevk, do you know of existing browser bugs we should hang these changes onto, or file new bugs?

@annevk
Copy link
Member Author

@annevk annevk commented Jan 31, 2018

@foolip see #234 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet