
Media Capabilities #218

chcunningham opened this issue Nov 14, 2017 · 25 comments

@chcunningham commented Nov 14, 2017

Hello TAG!

I'm requesting a TAG review of:

Further details (optional):

You should also know that...

The API is available in Chrome behind a flag: --enable-blink-features=MediaCapabilities
Implementation bugs are tracked here.
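
For reviewers who want to see the shape of the API, here is a minimal sketch of a decodingInfo() query as defined in the spec draft (the configuration values are illustrative):

```js
// Ask whether VP8 file playback at 1080p30 is supported, smooth,
// and power-efficient on this device. Values are illustrative.
navigator.mediaCapabilities.decodingInfo({
  type: 'file', // or 'media-source'
  video: {
    contentType: 'video/webm; codecs="vp8"',
    width: 1920,
    height: 1080,
    bitrate: 2646242, // bits per second
    framerate: 30
  }
}).then(result => {
  console.log(result.supported, result.smooth, result.powerEfficient);
});
```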

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • [x] leave review feedback as a comment in this issue and @-notify [mounirlamouri, chcunningham]
@plinss plinss added the extra time label Feb 2, 2018
@plinss plinss added this to the tag-f2f-london-2018-01-31 milestone Feb 2, 2018
@torgo commented Feb 2, 2018

Discussed at London F2F, day 3.

@triblondon commented Feb 2, 2018

Sangwhan to write up this review over dinnertime today.

@triblondon commented Feb 2, 2018

Issues raised in conversation:

  • Privacy: potential for fingerprinting, private mode is insufficient mitigation
@foolip commented Feb 27, 2018

FYI, there is now an Intent to Ship: Media Capabilities: decoding on blink-dev.

I note that this review was slower than usual, with 2.5 months passing before there was any activity, and it sounds like there's still a write-up to come? Feedback at any stage of a spec's lifecycle is of course welcome, but I'll suggest in the blink-dev thread that we not block waiting for more feedback.

@cynthia commented Feb 27, 2018

@foolip apologies for the delay. We try to triage incoming reviews as soon as we see them, but sometimes things fall through the cracks. I think it's safe not to consider this a blocker for shipping the feature; I'll summarize the discussion from the F2F into a write-up shortly.

@cynthia commented Mar 6, 2018

Apologies that this took so long. @chcunningham @foolip

As for the privacy issues, thanks for the links related to fingerprinting. The S&P questionnaire link in the original review request seems to be a 404; could you clarify? https://github.com/WICG/media-capabilities/blob/master/security-privacy-questionnaire.md

On first-pass review, I notice an inconsistency with an API that touches on the same domain: Web Audio. Web Audio defines channels as an unsigned long (which does abstract away the presence of a low-frequency channel), and its sample rate is a float. I don't have a strong opinion on which is better, but types for parameters touching the same concepts should probably be consistent. How to deal with the presence of a low-frequency channel is an open question though, as is whether exposing this detail is actually useful to content authors.
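
To illustrate the mismatch, a sketch comparing the two (the Media Capabilities channels value shown follows the string form in the spec's examples, which the draft treats as a placeholder; the Web Audio values come from a live AudioContext):

```js
// Media Capabilities draft: channels is a string such as "5.1",
// which preserves the low-frequency channel distinction.
const audioConfiguration = {
  contentType: 'audio/webm; codecs="opus"',
  channels: '5.1',
  bitrate: 132700,
  samplerate: 48000
};

// Web Audio: channel count is an unsigned long, sample rate a float,
// so the same 5.1 layout surfaces as 6 channels with no LFE distinction.
const ctx = new AudioContext();
console.log(ctx.destination.maxChannelCount); // e.g. 6
console.log(ctx.sampleRate);                  // e.g. 48000
```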

The content MIME type would most likely require additional parsing by each application that uses this; would it make sense to provide it as structured data to make it easier to use? With the string approach, it seems most content authors would do codec checks via regex or substring matching, which isn't great. A section in the explainer (https://github.com/WICG/media-capabilities/blob/master/explainer.md#why-not-a-list-of-formats) seems to touch on this, but the intent of this review comment is different from the one addressed there.
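
To make the concern concrete, here is the sort of hypothetical helper applications might write today to pull the codec out of a content type string (none of this string munging would be needed with structured data):

```js
// Hypothetical: split a MIME string like 'video/webm; codecs="vp8, vorbis"'
// into its container and codec list by hand.
function parseContentType(contentType) {
  const [container, ...params] = contentType.split(';').map(s => s.trim());
  const codecsParam = params.find(p => p.startsWith('codecs='));
  const codecs = codecsParam
    ? codecsParam.slice('codecs='.length)
        .replace(/^"|"$/g, '')
        .split(',')
        .map(s => s.trim())
    : [];
  return { container, codecs };
}

parseContentType('video/webm; codecs="vp8, vorbis"');
// → { container: 'video/webm', codecs: ['vp8', 'vorbis'] }
```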

A normative definition of what constitutes a screen change (or a reference back to a normative definition) would be helpful.

Minor question 1: The explainer example code seems to suggest that the screen color depth is a string (the spec is missing a definition for this though) - is there any particular reason for this decision?

Minor question 2: The explainer touches on HDCP, but that isn't in the spec. Wouldn't the approach in the explainer break when a user launches the content on an HDCP-capable screen, starts playback, then drags it onto a non-HDCP-capable screen?

Since it is unclear exactly which parts of the spec are shipping, would you mind sharing the CL that relates to the I2S?

@mounirlamouri commented Mar 6, 2018

> The S&P questionnaire link in the original review request seems to be a 404; could you clarify? https://github.com/WICG/media-capabilities/blob/master/security-privacy-questionnaire.md

I believe the link is working. Maybe GH had troubles when you tried?

> Web Audio and channels

As mentioned in the spec, channels is still a placeholder and we do not currently use it in Chrome's implementation. I have filed w3c/media-capabilities#73 to make it clearer.

> The content MIME type would most likely require additional parsing by each application that uses this; would it make sense to provide it as structured data to make it easier to use?

I'm not entirely sure what you meant by this, specifically what you mean by "additional parsing by each application". I would expect web applications to copy-paste the type in their code or read it directly from a manifest of some sort.

> A normative definition of what constitutes a screen change (or a reference back to a normative definition) would be helpful.

As mentioned below, only part 2 of the spec is something we are launching in Chrome. Part 3 is more draft-y and most of it was or will be merged into a CSS spec. This will likely be the case with the change event if it ever happens.

> Minor question 1: The explainer example code seems to suggest that the screen color depth is a string (the spec is missing a definition for this though) - is there any particular reason for this decision?

The color depth changes we had were merged into the appropriate CSS spec. I believe 3.3 is a leftover from a removal. I've filed w3c/media-capabilities#74.

> Minor question 2: The explainer touches on HDCP, but that isn't in the spec. Wouldn't the approach in the explainer break when a user launches the content on an HDCP-capable screen, starts playback, then drags it onto a non-HDCP-capable screen?

HDCP was split into another specification, with a brief explainer in the directory. It will be an extension of EME. Your point about EME and screen changes is correct, though I believe the CDM might deal with this. The screen change event would be another way, but the intent of that event is broader; it could also be fired when the screen size has changed.

> Since it is unclear exactly which parts of the spec are shipping, would you mind sharing the CL that relates to the I2S?

That's a very good point. Part 2 is the one that is shipping in Chrome: https://wicg.github.io/media-capabilities/#decoding-encoding-capabilities

@cynthia commented Mar 6, 2018

> I believe the link is working. Maybe GH had troubles when you tried?

It seems so - I just tried again and it works just fine.

> I'm not entirely sure what you meant by this, specifically what you mean by "additional parsing by each application". I would expect web applications to copy-paste the type in their code or read it directly from a manifest of some sort.

I imagined a use case where the content author wants to parse out just the codec and not the container information; with a string-based format this would require parsing the string.

> As mentioned below, only part 2 of the spec is something we are launching in Chrome. Part 3 is more draft-y and most of it was or will be merged into a CSS spec. This will likely be the case with the change event if it ever happens.

It would be great if the spec could be trimmed down to only what is shipping. Stale draft material tends to confuse both implementors and content authors.

@mounirlamouri commented Mar 7, 2018

Good point about the spec state; I will add a warning at the top of section 3 mentioning that it's still WIP.

Regarding the codecs string, we require the container and the codec, such as video/webm;codecs=vp8. I believe most places in the web platform ask for formats in this form (older APIs would accept container-only), as shown below.
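
For illustration, the distinction under that convention (the strings are examples, not an exhaustive list):

```js
// Full container + codec strings, as Media Capabilities requires:
const accepted = ['video/webm;codecs=vp8', 'video/mp4; codecs="avc1.42E01E"'];

// Container-only strings, as some older APIs (e.g. canPlayType) tolerate,
// but which are not sufficient here:
const containerOnly = ['video/webm', 'video/mp4'];
```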

@torgo torgo changed the title Review Request: Media Capabilities Media Capabilities Oct 30, 2018
@cynthia commented Oct 31, 2018

> The framerate is the number of frames used in one second (frames per second). It is represented either as a double or as a fraction.

This is a bit strange; I'm guessing there's some sort of legacy compatibility reason? It would be useful to know why this is the case (and whether it's the right way forward long term).
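
For concreteness, the two forms the quoted text allows, using an NTSC-style rate where the fraction matters (the fraction-as-string form follows the draft under discussion; other values are illustrative):

```js
// The same nominal rate in each representation:
const asDouble = {
  contentType: 'video/mp4; codecs="avc1.42E01E"',
  width: 1280, height: 720, bitrate: 2000000,
  framerate: 29.97            // double: approximate
};
const asFraction = {
  ...asDouble,
  framerate: '30000/1001'     // fraction: exact, expressed as a string
};
```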

@cynthia commented Oct 31, 2018

Hey all,

Thanks for filing this issue. We took it up during our Paris F2F. Apologies that it took so long.

I had a question about how this could be used to test capabilities in a multiple-media-stream context. For example, can we understand whether it's possible to efficiently decode more than one video stream at once? Or get a maximum number of streams/channels/densities at which the client would hit limits? This case could come up in an RTC scenario. There may be cases where decode won't be smooth when you have two decoders, or an encoder and decoder pair, running, for example.

We also had questions about the naming and types of the returned capabilities, specifically smooth and powerEfficient. These names imply a guarantee, despite what you've already written about how they are not. Have any alternative names been considered? Curious if this can be addressed somehow. (We brainstormed "janky" or "powerInefficient" as poor choices, but with the logic inverted.)

Thanks again, and looking forward to hearing back.

@chcunningham commented Oct 31, 2018

Hey Cynthia,

Re: framerate, this stems from the discussion here:
w3c/media-capabilities#12

But it's recently being reconsidered:
w3c/media-capabilities#96

Happy to have your input.

@chcunningham commented Oct 31, 2018

> For example, can we understand whether it's possible to efficiently decode more than one video stream at once?

It's not easy to know with the current API shape, and I haven't thought much about ways the API could be changed for this use case. If you wanted to show two videos of the same resolution, you could approximate by doing a query that doubles the framerate (see the sketch below). For software codecs this is decent. For hardware codecs it will depend on how the hardware resources are divided (e.g. by time slice vs. by playback). AFAIK platform decoder APIs don't often surface parallel-session limits up front, so this would involve some guessing/learning even for implementers (not a deal breaker).
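
A sketch of that approximation (this is a heuristic, not anything the spec defines; the configuration values are illustrative):

```js
// Approximate "can this device smoothly decode two 1080p30 streams?"
// by querying a single stream at double the framerate. Heuristic only:
// hardware decoders may divide resources per-playback instead.
const video = {
  contentType: 'video/webm; codecs="vp9"',
  width: 1920, height: 1080, bitrate: 2500000,
  framerate: 30 * 2
};
navigator.mediaCapabilities.decodingInfo({ type: 'media-source', video })
  .then(result => {
    console.log(result.smooth); // optimistic proxy for two-stream smoothness
  });
```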

> These names imply a guarantee...

We could prefix the names with "ideally" or "typically"... I tend to favor the shorter names though. I expect sophisticated users of the API to understand that nothing is ever guaranteed in media ;).

@torgo torgo removed this from the 2018-11-20-telcon milestone Nov 28, 2018
@chrisn commented Jan 10, 2019

I think there's a need to support remote playback capabilities. The Remote Playback API describes three modes of operation: media mirroring, media remoting, and media flinging.

In cases where the UA is requesting media content to pass through to a remote playback device (media mirroring and media remoting), you'd want to use the capabilities of the remote playback device to decide which media encoding to request from the server. So capability discovery is something we'd want to consider in the Open Screen Protocol design.

Discussion in the Second Screen WG so far has focused mainly on the media flinging case, where the remote playback device is typically handed the URL of the media and so does its own negotiation with the server; maybe there's less of a need to expose the remote device's capabilities there? I'd need to think about this some more to be more certain, though.

cc @mfoltzgoogle

@cynthia commented Apr 10, 2019

@chcunningham did you folks have a chance to discuss the feedback above internally?

@chcunningham commented May 2, 2019

@cynthia apologies for the delay.

Re: native APIs for concurrent playbacks, I'll have to double-check. Some changes are coming in new Android releases, but I'm not sure if this particular query is possible. @jernoble for Safari/Mac (and thoughts on the use case).

Re: WebRTC, the API currently doesn't cover WebRTC decoding at all. It's debatable whether it should. There is some ongoing exploration of doing this for WebRTC encoding via encodingInfo()... The alternative approach is to augment WebRTC to better handle capabilities questions in its domain.

@mounirlamouri, thoughts on Remote Playback? What is capability discovery like in that API right now?

@mounirlamouri commented May 2, 2019

Regarding remoting, I think supporting something via the Open Screen Protocol and exposing it through the Presentation API would be interesting, but I do not think we should extend the Remote Playback API, which is meant to be a very simple API for playing things remotely.

@deanliao commented May 8, 2019

Hi TAG,

I'm implementing the MediaCapabilities encodingInfo() API, specifically the "transmission" type.
Intent to Implement
Design doc
I'm requesting privacy/security review, as it adds fingerprinting surface just like decodingInfo(). Apart from the above, do I need to provide any other information for launching the encodingInfo() method?
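
For context, a minimal sketch of an encodingInfo() query with the "transmission" type, per the draft under review (the configuration values are illustrative):

```js
// Ask whether encoding VP8 at 720p30 for real-time transmission is
// supported, smooth, and power-efficient on this device.
navigator.mediaCapabilities.encodingInfo({
  type: 'transmission', // vs. 'record' for local recording
  video: {
    contentType: 'video/webm; codecs="vp8"',
    width: 1280, height: 720,
    bitrate: 1200000, framerate: 30
  }
}).then(result => {
  console.log(result.supported, result.smooth, result.powerEfficient);
});
```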

@chcunningham commented May 8, 2019

Re: encodingInfo(), I recommend holding off on TAG review of that for a bit longer. We are still discussing the API internally (particularly the WebRTC part), so let's wait for that dust to settle (we should know soon).

@mfoltzgoogle commented May 8, 2019

Re: #218 (comment)

@chrisn Sorry I missed this earlier, it got filtered into a folder.

Capability queries for the Open Screen Protocol are being discussed in webscreens/openscreenprotocol#123. Depending on where it lands, it should help user agents accurately determine which devices are capable of remote playback of a specific media element. But I didn't see a direct impact on the Media Capabilities API itself, since the idea is that the user agent (not the web application) figures out the best way to play the media remotely.

For the Presentation API, if the user agent rendering the presentation supports the Media Capabilities API, the presentation can use that to make choices about what media to play; but the API should work the same as on any other document.

Happy to discuss further at our upcoming F2F.

@hober commented May 23, 2019

Hi, @kenchris and I took a quick look at this during our F2F in Reykjavik. We note that the two pending bits of review that got added to this issue recently—decodingInfo() for encrypted media, and the "transmission" type of encodingInfo()—have fingerprinting implications, but the Security & Privacy Questionnaire assessment hasn't been updated in light of these new features. Could you update your security & privacy assessment accordingly?

Additionally, it's difficult for us to review the "transmission" type for encodingInfo() as your design document is not visible to non-Google employees. We'll hold off on reviewing this aspect until your internal dust settles and you make a design document public.

@mounirlamouri commented May 23, 2019

@hober and @kenchris I do not believe we need to update the Security & Privacy Questionnaire, as the potential fingerprinting concerns for encrypted media and the "transmission" type are in the same category as the fingerprinting concerns for the rest of the API.

@deanliao can you create a publicly visible version of your design document?
