Allowing UA to do <source> selection for media element #10077

marcoscaceres · 2024-01-19T02:56:38Z

What is the issue with the HTML Standard?

The “resource selection algorithm” for a media element (as part of the “Otherwise (mode is children)” case) states that when a developer has listed various different formats, the browser is supposed to use the first media type that's recognized (after matching on media=).
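For reference, the tree-order behaviour described above can be sketched roughly as follows. This is a simplified model, not the spec's algorithm: it ignores the pointer bookkeeping, async steps, and the fallback to the next source when loading fails. `SourceCandidate`, `Environment`, and `selectInTreeOrder` are hypothetical names used only for illustration.

```typescript
// Simplified model of a <source> child; field names are illustrative.
interface SourceCandidate {
  src: string;
  type?: string; // value of the type attribute, if present
  media?: string; // value of the media attribute, if present
}

// Hypothetical hooks standing in for the UA's media query evaluation
// and its check for whether a type can be rendered.
interface Environment {
  mediaMatches(query: string): boolean;
  supportsType(type: string): boolean;
}

// Returns the first candidate, in tree order, whose media matches and whose
// type is not known to be unsupported. Returns undefined if none qualifies.
function selectInTreeOrder(
  sources: readonly SourceCandidate[],
  env: Environment
): SourceCandidate | undefined {
  for (const source of sources) {
    if (source.media !== undefined && !env.mediaMatches(source.media)) continue;
    if (source.type !== undefined && !env.supportsType(source.type)) continue;
    return source;
  }
  return undefined;
}
```

Note that the author's ordering fully determines the result here: a later source that the device could decode more efficiently is never considered once an earlier one is accepted.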

Tree-order based selection is problematic because developers lack sufficient information about the end user’s device/environment to make an adequate determination as to which <source> is most optimal, potentially leading to the wrong <source> order appearing in a document (or the user agent being put into a situation where it has to choose a sub-optimal <source>).

Unlike developers, user agents have a greater understanding of the user’s device and environment, so are in a privileged position to choose the most optimal <source> that will give the best user experience. There could be a lot of conditions where it's better for the user agent to intelligently choose between formats, for example picking hardware decoding support over those that are software-only, picking one that preserves battery, or one that's higher quality / a better codec, etc.

Ultimately developers shouldn’t concern themselves as to which format will be best for a given set of environmental conditions (as currently implied by the order, which could lead to a sub-optimal choice!). Instead, they can put the responsibility on the user agent to make the best choice for users.

Admittedly, we could use a little help updating that part of the spec. The current algorithm seems a little finicky with all the pointers and async waits/loops, so we could really use some help or guidance with updating it to say that after matching on media= and checking all the type=’s it supports, the user agent may choose the most suitable <source> element based on the user’s environment, device capabilities, or optimal hardware support.
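As a rough illustration of the proposed behaviour (not spec text), the sketch below assumes the media= matching has already happened and that the UA can answer, per type, whether it is supported and whether hardware decoding is available. `Capability`, `CapabilityLookup`, `score`, and `selectPreferred` are hypothetical names, and the scoring values are arbitrary.

```typescript
interface Candidate {
  src: string;
  type: string;
}

// Hypothetical capability report for a MIME type. A real UA would derive
// this from its decoder stack rather than from a lookup function.
interface Capability {
  supported: boolean;
  hardwareDecode: boolean;
}

type CapabilityLookup = (type: string) => Capability;

// Scores a capability so that hardware-decodable formats rank above
// software-only ones, and unsupported formats are never chosen.
function score(cap: Capability): number {
  if (!cap.supported) return -1;
  return cap.hardwareDecode ? 2 : 1;
}

// Picks the highest-scoring supported candidate. Ties keep tree order,
// so the author's ordering still acts as a tie-breaker.
function selectPreferred(
  candidates: readonly Candidate[],
  lookup: CapabilityLookup
): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = 0;
  for (const candidate of candidates) {
    const s = score(lookup(candidate.type));
    if (s > bestScore) {
      best = candidate;
      bestScore = s;
    }
  }
  return best;
}
```

Other signals mentioned above (battery impact, quality) could feed into the same score; hardware decoding is used here only because it is the simplest to model.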

NB: we are aware that the above could also apply equally to <picture>, but we should probably do that independently once we sort out media elements.

Cc @jyavenard


annevk · 2024-01-19T07:57:39Z

@whatwg/media thoughts?

zcorpan · 2024-01-19T09:28:25Z

I think this would require a fundamental change to how the algorithm works. Currently, the algorithm can pick a source while the parser has yielded before the </video> tag (so more source elements can still appear). If the browser needs to have all sources before making a selection, we need to wait until </video> is parsed. cc @hsivonen

For the script-created elements case, we can probably do what picture does: queue a microtask before making a selection.

If the algorithm fails to select something because all types are unsupported or all medias don't match, the networkState can be NETWORK_NO_SOURCE and allow for resource selection to happen again if another source element is inserted (after a microtask).
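A minimal sketch of the "queue a microtask before making a selection" idea for script-created elements, under the assumption that selection simply sees whatever sources were inserted before the microtask ran. `DeferredSelection` and `Schedule` are hypothetical names; the scheduler is injected so the batching can be observed, and in a browser it would be something like `queueMicrotask`.

```typescript
type Schedule = (callback: () => void) => void;

// Collects inserted sources and runs selection once per batch instead of
// once per insertion.
class DeferredSelection {
  private schedule: Schedule;
  private pending: string[] = [];
  private scheduled = false;
  // Each entry records the sources visible to one selection run.
  public readonly runs: string[][] = [];

  constructor(schedule: Schedule) {
    this.schedule = schedule;
  }

  // Called whenever a source is inserted. Only the first insertion in a
  // batch schedules the selection task.
  sourceInserted(src: string): void {
    this.pending.push(src);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => {
      this.scheduled = false;
      const batch = this.pending;
      this.pending = [];
      this.runs.push(batch);
    });
  }
}
```

With this shape, a script that appends several source elements in one task gets a single selection over all of them, and a source inserted later (for example after a NETWORK_NO_SOURCE outcome) triggers a fresh run.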

This change would probably make the spec simpler.

While arbitrary order makes sense for type, I don't think it makes sense for media. How do you envision it should work when media is also used?

Example:

<video>
 <source src="a" type="video/mp4" media="(min-width: 600px)">
 <source src="b" type="video/mp4">
 <source src="c" type="video/webm" media="(min-width: 600px)">
 <source src="d" type="video/webm">
</video>

Maybe selection could be in two passes: first, select a list of candidates that has the first (in tree order) source element where media matches (whether the attribute is present or not), for each unique type value. For the above example, the lists would be either ["a", "c"] or ["b", "d"]. Then, select among those in a UA-defined manner, but use the source element without a type attribute (if any) as last resort.
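The two passes could look roughly like the sketch below. It is only an interpretation of the paragraph above: `collectCandidates`, `chooseAmong`, and the `rank` callback are hypothetical, and `rank` stands in for the UA-defined preference (higher is better, 0 or less meaning not playable).

```typescript
interface SourceEl {
  src: string;
  type?: string;
  media?: string;
}

// Pass 1: for each unique type value, keep the first source in tree order
// whose media matches. A missing media attribute always matches.
function collectCandidates(
  sources: readonly SourceEl[],
  mediaMatches: (query: string) => boolean
): SourceEl[] {
  const seenTypes: string[] = [];
  let seenUntyped = false;
  const result: SourceEl[] = [];
  for (const source of sources) {
    if (source.media !== undefined && !mediaMatches(source.media)) continue;
    if (source.type === undefined) {
      if (seenUntyped) continue;
      seenUntyped = true;
    } else {
      if (seenTypes.indexOf(source.type) !== -1) continue;
      seenTypes.push(source.type);
    }
    result.push(source);
  }
  return result;
}

// Pass 2: choose among the candidates in a UA-defined manner, using the
// source without a type attribute only as a last resort.
function chooseAmong(
  candidates: readonly SourceEl[],
  rank: (type: string) => number
): SourceEl | undefined {
  let best: SourceEl | undefined;
  let bestRank = 0;
  let untyped: SourceEl | undefined;
  for (const candidate of candidates) {
    if (candidate.type === undefined) {
      if (untyped === undefined) untyped = candidate;
      continue;
    }
    const r = rank(candidate.type);
    if (r > bestRank) {
      best = candidate;
      bestRank = r;
    }
  }
  return best !== undefined ? best : untyped;
}
```

For the four-source example above, pass 1 gives "a" and "c" when `(min-width: 600px)` matches and "b" and "d" when it does not; pass 2 then depends entirely on how the UA ranks video/mp4 against video/webm.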

marcoscaceres · 2024-01-23T03:13:36Z

Yeah, I was also thinking this would need to be done in two passes after parsing. And yeah, I was also thinking it would simplify the spec quite a lot, but I was unsure also how much breakage there would be given how much stuff seems to happen right now in the algorithm.

domenic · 2024-01-26T05:26:23Z

It seems like the intent of this feature is to allow the browser to make the best choice for users.

In a case like

<video>
  <source ...>
  <source ...>
  [... the server hangs for 5 seconds ...]
  <source ...>
</video>

it seems like the best choice is for the browser to give up sometime early during that 5 second window, and choose from the first 2 sources. It would be bad to mandate in the spec that the browser has to wait until it sees the end tag.

Does that fit with what you all are thinking? How does it play into the discussions above about multiple passes after parsing?

jyavenard · 2024-01-26T07:25:07Z

it seems like the best choice is for the browser to give up sometime early during that 5 second window, and choose from the first 2 sources

I don't believe the proposal prevents this behaviour.
If those first two sources aren't playable, then you would need to wait anyway.

If two sources are available and the second is preferable over the first, then the UA plays the 2nd.
If there's only one source available at the time you parse it and it is playable, you play it.

I would limit the two-pass parsing to only the sources that are immediately available.

The 2nd pass would only be required if, during the first pass, one of the sources was playable but the system would have liked to see whether a preferred one was there (for example, if the available source only allowed for software decoding).

domenic · 2024-01-26T07:57:14Z

As long as this proposal doesn't involve waiting for the end </video> tag, then that's good. Some of the discussion in #10077 (comment) implied waiting for the </video> tag, so I wanted to make sure that was not part of the proposal.

zcorpan · 2024-01-26T12:42:48Z

Waiting for the end tag is what I had in mind, to make processing predictable and not depend on network latency. I'm not convinced it's best to allow processing of partial data here. I think speculatively fetching a video is OK, but AFAIK browsers don't do that today.

If the server stalls while parsing the video start tag, the user also won't see a video, even if the src attribute has been seen.

But maybe there's some heuristic we can apply to load (and play) sooner without introducing network latency impact on resource selection. For example, when seeing fallback content (non-whitespace text or elements other than source and track), run resource selection. A risk here is existing content might have bogus elements or text in video (between source elements), which would no longer work, at least if the resource selection only runs once (as today).

dalecurtis · 2024-01-26T18:24:18Z

How are you proposing to decide a better source? Do you just want to always select the mp4?

I'm not sure how you'd make a better decision beyond some simple cases (that seem uncommon nowadays) w/o parsing and loading each source -- which could slow down loading significantly. It's not common to list codecs in type, so at best you're guessing which codecs and resolution would be in each source and there's significant overlap. E.g., webm -> (av1, vp9, vp8), mp4 -> (av1, vp9, h264, hevc).

jernoble · 2024-01-26T18:33:28Z

How are you proposing to decide a better source? Do you just want to always select the mp4?

Not at all; we would want to select the "best" source, which could mean (and has meant) the one with a HW decoder available.

For context, we have previously de-prioritized HEVC on platforms containing only a SW HEVC codec, and way, way, way back in the day, de-prioritized Ogg on systems that had Perian installed as a QuickTime extension.

And of course, this behavior will work better when there is more information provided in type. But source selection in general works better when there is more information provided in type.
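To make concrete what "more information provided in type" means, here is a small sketch that pulls the container and the optional codecs parameter out of a type attribute value. It is deliberately not a full MIME type parser (it assumes no semicolons inside quoted values), and `parseSourceType` and `ParsedType` are hypothetical names.

```typescript
interface ParsedType {
  container: string; // e.g. "video/mp4"
  codecs: string[]; // empty when the author gave no codecs parameter
}

// Splits a type attribute value into its container and codecs list.
function parseSourceType(value: string): ParsedType {
  const [first = "", ...params] = value.split(";");
  const container = first.trim().toLowerCase();
  let codecs: string[] = [];
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    if (name !== "codecs") continue;
    // Strip the surrounding double quotes, then split the comma-separated list.
    const raw = param.slice(eq + 1).trim().replace(/^"|"$/g, "");
    codecs = raw
      .split(",")
      .map((c) => c.trim())
      .filter((c) => c.length > 0);
  }
  return { container, codecs };
}
```

For a value such as `video/mp4; codecs="avc1.42E01E, mp4a.40.2"` the UA learns which decoders it would need before fetching anything. For a bare `video/webm` the codecs list is empty, which is the case raised above where the UA can only guess or sniff.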

dalecurtis · 2024-01-26T18:37:45Z

Can you clarify "works" in the case of type w/o codecs? I don't see how it would even work in that case w/o parsing.

jernoble · 2024-01-26T18:42:34Z

Adding a <source> without a type already requires parsing (in the form of sniffing). This is just a more advanced form of sniffing.

smaug---- · 2024-01-27T10:18:20Z

There might be some privacy issues here. The site could learn more about what sort of hardware the user has.

jyavenard · 2024-01-27T10:56:51Z

There might be some privacy issues here. The site could learn more about what sort of hardware the user has.

MediaCapabilities already exposes this information in much more detail. Whatever remedial work is done for MC is applicable here too (as is disabling the selection algorithm under some circumstances).

smaug---- · 2024-01-27T16:19:47Z

Interesting, since the privacy and security issues are similar (not the same though) as with delayed clipboard rendering. And that one is being strongly objected to because of the privacy issues.

jyavenard · 2024-01-27T23:20:35Z

Interesting, since the privacy and security issues are similar (not the same though) as with delayed clipboard rendering.

One exposes specific user behaviour, the other could expose hardware capabilities (information already available). And for the latter we have an existing policy covering it to limit exposure.

zcorpan · 2024-01-29T22:20:38Z

A possible web compat issue is with error events on source elements. Currently, since the sources are tried in order, sites may use an error event listener on the last source element as a signal that all resources have failed to play, and maybe decide to show an error message to the user. The spec even has an example:

https://html.spec.whatwg.org/multipage/embedded-content.html#the-source-element:event-error

If a browser then picks that source to try first, but it's not playable, it will trigger the error message code path of the web page even though the browser still hasn't tried any other source.

A possible mitigation could be to defer firing error events on source elements until all available options have failed.
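One way to picture that mitigation is the sketch below. It rests on two assumptions that the comment above does not state: errors for failed sources are dropped entirely if some other source ends up playable, and deferred errors are reported in tree order so that an error on the last source keeps its meaning as "everything failed". `Attempt`, `Outcome`, and `tryWithDeferredErrors` are hypothetical names.

```typescript
interface Attempt {
  src: string;
  treeIndex: number; // position of the source among its siblings
  playable: boolean; // whether loading this source would succeed
}

interface Outcome {
  selected: string | undefined;
  errorsFiredOn: string[]; // sources that receive an error event, in order
}

// Tries candidates in the UA's preferred order. Failures are buffered and
// only reported once every candidate has failed.
function tryWithDeferredErrors(attemptsInUaOrder: readonly Attempt[]): Outcome {
  const failed: Attempt[] = [];
  for (const attempt of attemptsInUaOrder) {
    if (attempt.playable) {
      // Assumption: buffered errors are discarded when something plays.
      return { selected: attempt.src, errorsFiredOn: [] };
    }
    failed.push(attempt);
  }
  // Assumption: report in tree order, regardless of the order tried.
  failed.sort((a, b) => a.treeIndex - b.treeIndex);
  return { selected: undefined, errorsFiredOn: failed.map((a) => a.src) };
}
```

Under these assumptions, a page that listens for error on the last source would still only see it when nothing is playable, even if the UA tried that source first.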

zcorpan · 2024-03-14T12:48:50Z

Per https://webkit.org/blog/15063/webkit-features-in-safari-17-4/#source-prioritization it seems this change was shipped in Safari 17.4. Correct? If so, can you clarify what was implemented?

eeeps · 2024-03-14T13:36:25Z

A (too late?) note that from an author education/understanding standpoint, the model for years has been:

srcset: an unordered list of URLs with descriptors, provided by the author. The UA is best-suited to make decisions about which one should be picked, when, and does so, using the provided descriptors.
<source>: an ordered list of "sources" (each possibly having its own srcset). The author is best-suited to make decisions about which one should be picked, when, and they attach explicit instructions describing the situations in which UAs should pick each.

I agree that for the type-switching use case, UAs will generally make better decisions than authors. The question is less "what does this particular content and page context need" (author best-suited), and more "what does this particular user and UA context need" (UA best-suited).

Rather than changing <video><source> to have a srcset-like selection mechanism, it would be much cleaner to introduce srcset + a type() descriptor to <video>. Teaching and understanding these mechanisms is hard enough, and being able to transfer knowledge about how markup patterns map to mechanisms from one media element to another is great. Muddling mechanisms makes understanding, explaining, testing — authoring — much harder.

jyavenard · 2024-03-15T06:39:37Z

Per https://webkit.org/blog/15063/webkit-features-in-safari-17-4/#source-prioritization it seems this change was shipped in Safari 17.4. Correct? If so, can you clarify what was implemented?

This is the WebKit change that implemented the policy on iOS devices only:
https://bugs.webkit.org/show_bug.cgi?id=267753

It does what was described in an earlier proposal: if, at the time of the check, we have an alternative source following the one currently being checked, and if the currently checked source is using VP8 (SW only) or doesn't have hardware-decoded VP9, then this source will be skipped.

If there's no source to try after, then it will be used.
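The rule as described in this comment can be modelled roughly as below. This is a sketch of the description, not of the WebKit source: it assumes the codec of each source is already known, and `KnownSource`, `Device`, `shouldSkip`, and `pickSource` are hypothetical names.

```typescript
type Codec = "vp8" | "vp9" | "h264" | "hevc" | "av1";

interface KnownSource {
  src: string;
  codec: Codec;
}

// Hypothetical device description; only the VP9 flag matters for this rule.
interface Device {
  hardwareVp9: boolean;
}

// A source is skipped only when another source follows it and it would
// need VP8 (software only) or VP9 without hardware decoding.
function shouldSkip(source: KnownSource, hasLaterSource: boolean, device: Device): boolean {
  if (!hasLaterSource) return false;
  return source.codec === "vp8" || (source.codec === "vp9" && !device.hardwareVp9);
}

// Walks the sources that are available at the time of the check, in tree
// order, and returns the first one that is not skipped.
function pickSource(available: readonly KnownSource[], device: Device): KnownSource | undefined {
  let remaining = available.length;
  for (const source of available) {
    remaining -= 1;
    if (!shouldSkip(source, remaining > 0, device)) return source;
  }
  return undefined;
}
```

Because the input is only the sources available at the time of the check, the same markup can give different results: with a VP8 source followed by an H.264 source, the H.264 one is picked if both have been parsed, but the VP8 one is used if it is the only source seen so far.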

Full WebM support was only added in Safari on iOS 17.4; one of the reasons for this decision was to reduce the potential for unexpected regressions.

annevk · 2024-03-15T07:35:38Z

(I did some digging and I think that WebKit has had the "wait for end tag" behavior for about 12 years, although seemingly it's only used for <track> elements. This is the commit I found that seems to introduce that: https://trac.webkit.org/changeset/102968/webkit.)

zcorpan · 2024-03-15T08:50:22Z

Waiting for the end tag for tracks is per spec: https://html.spec.whatwg.org/multipage/media.html#text-track-model:blocked-on-parser

zcorpan · 2024-03-15T10:22:15Z

The code in WebKit to try source elements is like the spec, i.e. evaluated in order without waiting for </video>, as far as I can tell.

https://github.com/WebKit/WebKit/blob/a7b863a49945946c913e6e194ec047da844094a4/Source/WebCore/html/HTMLMediaElement.cpp#L5404 responds to inserting source elements
https://github.com/WebKit/WebKit/blob/a7b863a49945946c913e6e194ec047da844094a4/Source/WebCore/html/HTMLMediaElement.cpp#L886 ("end tag seen") only invokes text track selection.

@jyavenard

It does what was described in an earlier proposal: if, at the time of the check, we have an alternative source following the one currently being checked, and if the currently checked source is using VP8 (SW only) or doesn't have hardware-decoded VP9, then this source will be skipped.

If there's no source to try after, then it will be used.

If WebKit doesn't wait for the </video> end tag, then whether there is an alternative source at the time depends on where the HTML parser yields, which can depend on network conditions, right?

zcorpan · 2024-03-15T10:49:11Z

Note that a server hang already prevents:

track selection
picture selection (it only starts when the img is seen)

The effect of not waiting for the end tag is that users will sometimes get a suboptimal format selected, for the same HTML. As I said above, I think processing should be predictable and not depend on network latency.

annevk · 2024-03-28T16:06:53Z

Yeah, that seems correct. If network conditions are bad the end user won't see media so in practice it prolly doesn't matter, but it seems reasonable to make this more solid, especially as we already have this logic for <track>.

