New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify how media elements make and validate range requests #2814

Open
wants to merge 9 commits into
base: master
from

Conversation

4 participants
@jakearchibald
Collaborator

jakearchibald commented Jul 5, 2017

I'm looking for early feedback on this. I'm going to add some comments to highlight parts I'm not sure about.

source Outdated
<span>CORS-cross-origin</span> responses aren't mixed with responses from other URLs.</p>
<p>A <span>media resource</span> has a
<dfn data-x="concept-media-resource-uses-opaque-data-flag">uses opaque data flag</dfn>,

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

See #2813 - we may need to change the name of this.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

I like "uses opaque response flag". We don't go more granular thus far than a Response for that kind of data and I doubt we'll ever will so it seems reasonable?

source Outdated
@@ -31982,6 +31992,18 @@ interface <dfn>HTMLMediaElement</dfn> : <span>HTMLElement</span> {
</p>
<p>A <span>media resource</span> has a
<dfn data-x="concept-media-resource-response-urls">list of response urls</dfn>, a
<span>list</span>, which is initially empty.</p>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Should this be an ordered set? It doesn't really matter in terms of behaviour, but an implementation using a set would use less memory.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

Say you have: URL -> URL2 -> URL where the first URL redirects but the last one doesn't due the a cookie. Would it matter if that ends up as (URL, URL2)?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

I don't think so. The order isn't important here either.

I want to ensure that all chunks of a particular media resource were requested from the same endpoint if opaque responses are involved. If the server redirects, that's fine. If all the responses are not-opaque, anything goes.

The attack I'm wanting to avoid is:

  1. Video src is https://personal.example.com, which contains personal data and is not a valid media element.
  2. Media element issues request for 0-.
  3. Service worker responds with 0-100 from a resource which is a known to be a video header, where the 101st byte represents the media length.
  4. Media element issues request for 101-.
  5. Service worker responds with fetch(event.request).

Since we've already given the media element a valid video header, the additional bytes from https://personal.example.com may leak as the video length etc etc.

However, this PR would see that the initial response url of null doesn't match https://personal.example.com, and return a network error.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

So why would you store a list of URLs? Wouldn't you just stay opaque the moment you encounter it once?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

"opaque" isn't enough information. All the responses could be opaque but from different places. As a result, a load of non-video data could become valid video data and leak information.

I guess I could replace this with "initial response url list", which is the url list of the first response used, and a "mixed response sources flag".

Then, if the first entry of an additional response's url list doesn't match the first entry in the "initial response url list", set the "mixed response sources flag".

If the "uses opaque data flag" is set, or the response is opaque, and "mixed response sources flag" is set, return a network error.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

I see.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

I think I'm breaking HLS by doing this. Will rethink my approach. I think "If opaque data is used, every response's (including ones previously used by this resource) first entry in the url list must match its request url" works.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 11, 2017

Collaborator

Should be fine now

source Outdated
data-x="concept-media-resource-uses-opaque-data-flag">uses opaque data flag</span> affects
whether subtitles referenced in the <span>media data</span> are exposed in the API and, for
<code>video</code> elements, whether a <code>canvas</code> gets tainted when the video is
drawn on it.</p>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Those last few lines are adapted from some text that already existed. I'm a little concerned about how hand-wavey they are, especially as the canvas spec seems to contradict it (#2813).

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

The last few lines are allowed to be hand-wavey since they are statements of fact, not normative requirements. However they are not allowed to be wrong, so #2813 may indeed be a problem. I think it's OK for now though, pending getting #2813 figured out?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

Yeah, the text here matches what browsers do. The canvas section is wrong.

source Outdated
data-x="concept-media-ranged-fetch-steps">ranged fetch steps</span> to
perform HTTP range retrieval requests for a given start and end range, or switching to a
streaming protocol. The user agent must consider a resource erroneous only if it has given
up trying to fetch it.</p>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Again, I'm worried about the hand-waving here, but I'm not sure how to improve it without defining how each codec works.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

How does fetching behavior relate to the codec?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator
  1. Video starts fetching.
  2. Container/codec decoding happens.
  3. Through knowledge of the codec, the browser knows the metadata sits in the last 1000 bytes of the resource.
  4. Fetch Range: 123456-

Or

  1. User seeks resource to 10 minutes in.
  2. Through knowledge of the codec, the browser knows the content at 10:00 sits 9000000 bytes in.
  3. Fetch Range: 9000000-.

Without defining "Through knowledge of the codec", I'm just providing the steps for making a range request, but never really defining when it's called.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

It might be good to call out that codec contract a little bit. So it's more clear what we're not defining. That will also make it easier for folks to tell us we're making the wrong assumptions.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

👍

@@ -33015,6 +33053,115 @@ interface <dfn>HTMLMediaElement</dfn> : <span>HTMLElement</span> {
will end up firing a <code data-x="event-media-suspend">suspend</code> event, as described
earlier.</p>
<p>The <dfn data-x="concept-media-ranged-fetch-steps">ranged fetch steps</dfn> with a
<var>start</var> and optional <var>end</var> are the following steps:</p>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

I've written this like an inner function that can access variables defined in the parent steps. Is this in any way ok?

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

Probably a question for the Infra Standard, but I think generally we try to avoid that and pass parameters explicitly.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

I placed it within the resource fetch algorithm because that runs for the entire load of the media element. As in, it only reaches the final step if the whole media finishes loading.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

@domenic what do you think? I think personally I'd rater avoid nested functions.

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Given the way in this this is referenced, it seems totally fine to me. It's barely a function; it's more annotating some steps by saying /* begin ranged fetch steps */ ... /* end range fetch steps */.

source Outdated
list</span>'s <span data-x="concept-header-list-allow-privileged-headers">allow privilaged
headers flag</span>.</p></li>
<li><p>Let <var>rangeValue</var> be `<code data-x="">bytes </code>`.</p></li>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

I think I'm going to move these "creating a range header" steps into the fetch spec, to avoid these lines being duplicated in every spec that wants to make a range request.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

If there's at least two callers (or expected to be) that seems reasonable.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

It'll be needed for downloads, and probably background fetch.

source Outdated
<li><p>Let <var>rangeValue</var> be `<code data-x="">bytes </code>`.</p></li>
<li><p><span data-x="UTF-8 encode">Encode</span> <var>start</var> and append the result to
<var>rangeValue</var></p></li>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

I wasn't really sure how to build this header. Is "append" correct here?

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

For building a byte sequence that seems fine. There's some debate still over whether they should maybe be immutable, but this will probably end up working.

source Outdated
<ol>
<li><p>If <var>firstResponseUrl</var> is null, and <var>url</var> is not null, return a
<span>network error</span>.</p></li>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Wait, this is wrong. brb

jakearchibald added some commits Jul 5, 2017

nit
source Outdated
<span data-x="concept-request-header-list">header list</span>.</p></li>
<li><p>If <var>range</var>'s first-byte-pos doesn't equal <var>start</var>, return a
<span>network error</span>.</p></li>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Can I refer to first-byte-pos like this? It's from https://tools.ietf.org/html/rfc7233#section-4.2.

This comment has been minimized.

@annevk

annevk Jul 6, 2017

Member

It might be good to cross-reference that term. Given the tight integration with HTTP uplifting to Fetch makes more sense to me now, even if this is the only caller.

source Outdated
response even if it requested a range. It <!--non-normative-->may terminate fetches
that are (or become) redundant, e.g. if the returned range is already covered by a
previous or in-flight response, or the media is seeked as such that active fetches are
unlikely to be useful.</p>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 5, 2017

Collaborator

Is this too much detail for a note? It's difficult since the process of aborting fetches isn't really covered.

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

I think it needs to not use should and may in a note. You can e.g. substitute "encouraged" and "could". Otherwise it does seem helpful.

@jakearchibald

This comment has been minimized.

Collaborator

jakearchibald commented Jul 5, 2017

@foolip Could you take a look at this?

@foolip

This comment has been minimized.

Member

foolip commented Jul 6, 2017

Going on vacation soon and won't have time to finish all things, like this. I'll check back early in August to see if this is still open.

source Outdated
associated:
<ul class="brief">
<li><dfn data-x="concept-header-list-allow-privileged-headers" data-x-href="https://fetch.spec.whatwg.org/#concept-header-list-allow-privileged-headers">allow privileged headers flag</dfn></li>
<li><dfn data-x="concept-header-list-append" data-x-href="https://fetch.spec.whatwg.org/#concept-header-list-append">append</dfn> method</li>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Nit: s/method/algorithm, I think.

source Outdated
<span>CORS-cross-origin</span> responses aren't mixed with responses from other URLs.</p>
<p>A <span>media resource</span> has a
<dfn data-x="concept-media-resource-uses-opaque-data-flag">uses opaque data flag</dfn>,

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Nit: <dfn goes on previous line. The Great Re-Wrapper can help by turning unwrapped lines into wrapped ones.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

I wasn't aware of that tool, cheers!

(fwiw I'd rather we just relied on editors for this kind of wrapping. Breaking mid-line makes it hard to search for terms across the document)

source Outdated
@@ -31982,6 +31992,18 @@ interface <dfn>HTMLMediaElement</dfn> : <span>HTMLElement</span> {
</p>
<p>A <span>media resource</span> has a

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Do you think media resource is the right place to store these things? So far in the spec it seems less like a spec data structure with fields and more like a concept. Maybe putting them on the media element might be more appropriate.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 6, 2017

Collaborator

I use media resource as it changes when the src changes. If that happens, we want the properties to reset.

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Fair enough.

source Outdated
<span>CORS-cross-origin</span> responses aren't mixed with responses from other URLs.</p>
<p>A <span>media resource</span> has a
<dfn data-x="concept-media-resource-uses-opaque-data-flag">uses opaque data flag</dfn>,

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

Since we're trying to move away from flags and to booleans, maybe this should be a boolean? OK either way though; maybe we should only do an en-masse conversion instead of mixing up both into the spec.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

I wasn't aware of the flag->boolean change. Happy to change it here.

@@ -32933,11 +32955,25 @@ interface <dfn>HTMLMediaElement</dfn> : <span>HTMLElement</span> {
<!--FETCH--><p><span data-x="concept-fetch">Fetch</span> <var>request</var>.

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

What is the relation between this request and the one possibly created by the range request steps? It seems like we're normatively requiring this one, and then if the browser wants to do a range request, it needs to do a second request. Is that correct?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

Yeah, that's what I intended. It feels like browsers should make a non-range request first to get the start of the resource, but also to determine if ranges are accepted.

source Outdated
<p>If <var>responseUrlList</var>[0] <span data-x="list contains">exists</span>, <span
data-x="list append">append</span> <var>responseUrlList</var>[0] to the <var>current media
resource</var>'s <span data-x="concept-media-resource-response-urls">list of response
urls</span>.</p>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

It is a bit surprising we append the first, not the last (i.e. not the response's url). A note explaining that we are intentionally doing this for the purpose of tracking redirects would be good, IMO.

source Outdated
data-x="concept-media-resource-uses-opaque-data-flag">uses opaque data flag</span> affects
whether subtitles referenced in the <span>media data</span> are exposed in the API and, for
<code>video</code> elements, whether a <code>canvas</code> gets tainted when the video is
drawn on it.</p>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

The last few lines are allowed to be hand-wavey since they are statements of fact, not normative requirements. However they are not allowed to be wrong, so #2813 may indeed be a problem. I think it's OK for now though, pending getting #2813 figured out?

source Outdated
data-x="concept-media-ranged-fetch-steps">ranged fetch steps</span> to
perform HTTP range retrieval requests for a given start and end range, or switching to a
streaming protocol. The user agent must consider a resource erroneous only if it has given
up trying to fetch it.</p>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

I have a separate worry about the hand-waving here, which is that it gives the user agent license to do literally anything. For example a user agent could issue range requests but not use the ranged fetch steps. Maybe the intent is clear enough though that we don't need to change it.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 7, 2017

Collaborator

I'm worried about the "streaming protocol" bit, in that it might not follow the rules in terms of mixing opaque & non-opaque responses.

source Outdated
<span>environment settings object</span> and
<span data-x="concept-request-type">type</span> to "<code data-x="">audio</code>" if the
<span>media element</span> is an <code>audio</code> element and to "<code
data-x="">video</code>" otherwise.</p></li>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

This would be clearer as two steps IMO.

source Outdated
response even if it requested a range. It <!--non-normative-->may terminate fetches
that are (or become) redundant, e.g. if the returned range is already covered by a
previous or in-flight response, or the media is seeked as such that active fetches are
unlikely to be useful.</p>

This comment has been minimized.

@domenic

domenic Jul 6, 2017

Member

I think it needs to not use should and may in a note. You can e.g. substitute "encouraged" and "could". Otherwise it does seem helpful.

@jakearchibald

This comment has been minimized.

Collaborator

jakearchibald commented Jul 11, 2017

I'm going to move on to range requests & downloads, but in the mean time… any ideas how I could write tests?

I think all browsers will issue ranges requests for a simple wav file that's seeked, which I can automate in browsers that support autoplay. However, the spec doesn't require browsers to make range requests, so it feels like a bad approach. Is there any kind of "unable to test" warning I could create in this situation.

Or, should I make it a manual test, where you have to seek a media element to generate range requests?

data-x="concept-request-header-list">header list</span>.</p></li>
<li><p>If <var>range</var>'s <a
href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16">first-byte-pos</a>

This comment has been minimized.

@jakearchibald

jakearchibald Jul 11, 2017

Collaborator

@annevk Is there a better way to link to this? The HTTP spec doesn't allow linking to the term directly.

This comment has been minimized.

@domenic

domenic Jul 11, 2017

Member

Well, it should be abstracted out into a <span> with the appropriate <dfn data-x-href=""> in the dependencies section, for one, but that's just editorial...

This comment has been minimized.

@annevk

annevk Jul 12, 2017

Member

Yeah, other than that I was thinking that maybe this should be in Fetch given the tight integration with HTTP?

This comment has been minimized.

@jakearchibald

jakearchibald Jul 12, 2017

Collaborator

Wouldn't that mean duplicating HTTP's definition?

This comment has been minimized.

@annevk

annevk Jul 12, 2017

Member

We'd still reference HTTP directly in Fetch as you did here, but for everyone else in the platform we'd have it abstracted.

This comment has been minimized.

@jakearchibald

jakearchibald Jul 12, 2017

Collaborator

It's be nice to have a header list algorithm that took a header name, and returned the parsed parts as defined by HTTP. It might be nice to expose this on the headers object too.

Eg: Let parsedRange be the result of parsing a header given headerList and Range.

Where parsedRange would be a structure like:

[
  {
    "ranges-specifier": "bytes=0-",
    "byte-ranges-specifier": "bytes=0-",
    "bytes-unit": "bytes",
    "byte-range-set": "0-",
    "suffix-byte-range-spec": null,
    "byte-range-spec": "0-",
    "first-byte-pos": "0",
    "last-byte-pos": null
  }
]

This comment has been minimized.

@annevk

annevk Jul 12, 2017

Member

That would be great, but that would only work for headers the browser supports (and might be tricky given where that code lives).

@domenic

This comment has been minimized.

Member

domenic commented Jul 11, 2017

I think all browsers will issue ranges requests for a simple wav file that's seeked, which I can automate in browsers that support autoplay. However, the spec doesn't require browsers to make range requests, so it feels like a bad approach. Is there any kind of "unable to test" warning I could create in this situation.

In general it's OK to write tests that aren't fully required by the spec. This is especially true if all browsers pass them. (Or intend to pass them, e.g. all use range requests.)

@jakearchibald jakearchibald changed the title from WIP: Specify how media elements make and validate range requests to Specify how media elements make and validate range requests Jul 12, 2017

@jakearchibald

This comment has been minimized.

Collaborator

jakearchibald commented Jul 12, 2017

No longer a WIP as such, but I need to fix the first-byte-pos reference.

Will pick this up again soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment