"user-chooses": Does required constraints make any sense now? #6
Also I saw that the other ones were labeled "April 2020 Interim" but it's March 30th, right?
Right, this is precisely why I want some experimentation, and experimentation results, before defining a new API or API model. This will allow us to make sure we are all comfortable implementing the new model, and to decide whether this is a whole new model, separate from getUserMedia, or not. In general, I see constraints as a way to help the user agent pick the default selected devices presented to the user. That is all; maybe they could also rank the devices, but I think that would be counter-intuitive to users. It does not seem to make sense to allow camera and microphone selection when only a subset would actually be selectable by the user. If a user wants to select the microphone and not the camera, the user should have that choice. In that sense, constraints would end up defining not Audio&Video but Audio|Video. Or maybe we should have one API for the microphone and a separate API for the camera? I also do not really like a model where a prompt would or would not happen based on what the web page provides as input. If a web page says it wants the default camera and default microphone, why shouldn't the user be allowed to override this selection with different devices, or say no to the camera but yes to the microphone? There might be some latitude here for heuristics based on page capture history, but the whole story is not clear to me. I am also unclear whether the spec will provide guidelines or requirements in that area.
I think this is a straw man. The sole difference is that the tool to build a device picker no longer exposes labels. Everything remains fundamentally the same with respect to user experience. The app retains the same power it had before to limit choice. Want to build a picker with only choices the user doesn't want? Go ahead. You can do that today. The fingerprint probing exposure from failed getUserMedia calls is exactly the same. We've said this is OK because tracking libraries won't risk a prompt. I think I answered everything else in w3c/mediacapture-main#667 (comment)
Last point: I don't think a transition plan to a more limited API will succeed. I think constraints are here to stay.
If we go with a device picker, we should design it for the best user experience, from scratch.
It depends. You could always reduce the power of the old API, for instance by only selecting the default devices with the old API (after some deprecation period) and treating constraints as ideal in that case. Also, I doubt that changing how we handle constraints will be seen as a more limited API. I might be biased, but to me constraints are over-complex and under-used. Simplifying constraints would be beneficial.
A powerful reason for a potential "user-chooses" API is that it works straight out of the box (no need to partially implement picking logic in the application) and that, if we get it right, we could guarantee consistent prompting behavior across browsers. Besides, if the prompt is actually good enough to do its job, why would prompting be a problem? If the user is to choose, why would letting the user choose be a problem? I would argue that there is a difference between an undesirable, superfluous yes/no re-prompt when the application has already done the choosing for you, and a prompt that actually selects something the user wants to select (and would otherwise select inside in-application picker logic). All of my reasoning, though, is based on deprecating the old way at some point in the future. If we give up on that, then making a suboptimal API is an option. But if it's optional, even far down the road, then we haven't really addressed the privacy concerns.
@youennf Let's not confuse API with user experience. Firefox has had a picker forever. Join us. 😉 @henbos I'd caution against oversimplifying the problem. While it may seem appealing to expose prompt methods wholesale to JS, they're often a bad idea (see permission.request() or roc's old blog). There's a role for user agents to negotiate permission at an app's point of (media) access. Thus the media access API design is still an abstraction separate from a user agent's prompting story, even when that API puts requirements on it. I don't even agree "consistent prompting behavior across browsers" is a general goal. That's a common web developer ask, a different problem from what we're solving (end user privacy). For instance, Firefox has not committed to removing its prompt on seeing The goal of Like today, the app remains in control of what it wants to ask.
Because users don't want to be prompted for their camera and microphone every time. @henbos I don't want to dismiss criticisms that constraints were over-designed (they were), and while I'm glad That's important to stress since @youennf seems to suggest UX has to come before API here, even though the working group agreed at the last interim to move ahead with w3c/mediacapture-main#667.
@henbos I think you have raised a lot of good questions. Thank you. Overall, we are really talking about a very different model from the current Media Capture approach, much more like Screen Capture. Just as Screen Capture forced us to re-think the role of constraints, it seems to me that an "in-chrome" approach to Media Capture will require some new thinking. Given that we are really talking about a new model, I am wondering whether the right way to handle this might be to create a new "Media Capture and Streams Version 2" work item, rather than trying to make all these changes to the existing Media Capture and Streams document before bringing it to PR. This doesn't imply a new API, just that we use separate documents for the old approach and the new one, so that we can clearly document each one. Otherwise, I am concerned that we could confuse the reader, who will not be able to distinguish "old" from "new" approaches.
Maybe we are trying to fit too many things into a single getUserMedia method. The main, or sole, use case for MediaDeviceInfo.label is a web-based device picker. If we attach this API to a MediaStreamTrack, we could add a new API or try extending applyConstraints, given that it can potentially already be used to switch between user and environment cameras. We probably want user activation whenever the device changes. This would also be conceptually consistent with what we are trying to do for speakers, where speaker authorisation is a simple user click and, if the user wants to change the selected speaker, a new API would trigger a browser device picker. As an extra bonus, no new track means no need to call replaceTrack or update MediaStreams, WebAudio nodes, and so on. This might simplify things for web developers. One potential worry is the handling of cloned tracks. Maybe it would be the web app's job to update any cloned track with the newly selected device (no user prompt needed here; it could be done with applyConstraints).
We can do a lot of things, but I think we need to start with problems we want to solve. A couple of red flags for me here: to me it's not the goal of this spec to express all these things, but to create a model within which user agents can experiment in a web compatible way, and JS can express its needs. That model is:
That's expressly forbidden in this model. Also, that's just one use case (needing to replace a source with another). The general use case is adding a second source. The latter, more general use case supports the former.
It cannot. applyConstraints cannot change the source of a track, not unless the user agent exposes a single unique camera that returns: console.log(track.getCapabilities().facingMode); // ["user", "environment"] e.g. a motorized pivot camera. Sure, a user agent could in theory expose a bunch of virtual devices, each with the capability to mimic every other device of its kind, but doing so undermines the value of the model. What you describe sounds like an entirely different model. That's fine, but I think I'm going to need a convincing problem we don't solve today to justify spending time considering a new model at this point.
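To make the distinction above concrete, here is a minimal sketch, assuming standard MediaStreamTrack methods (getCapabilities, applyConstraints, stop) and a `mediaDevices` object passed in (navigator.mediaDevices in a browser) so the logic can be exercised with stubs. The function names are hypothetical. An in-place applyConstraints switch only works when the current source itself advertises both facingMode values; otherwise a new getUserMedia call, and therefore a new track, is needed.

```javascript
// Hypothetical helper: can this track's *current* source produce the wanted
// facingMode? applyConstraints() never changes the source, so this is only
// true for a source that reports both modes (e.g. a motorized pivot camera).
function canSwitchFacingModeInPlace(track, wantedMode) {
  // getCapabilities() is missing in some older browsers; treat that as "no".
  if (typeof track.getCapabilities !== "function") return false;
  const modes = track.getCapabilities().facingMode || [];
  return modes.includes(wantedMode);
}

// Hypothetical helper: switch cameras, keeping the same track when possible.
async function switchFacingMode(mediaDevices, track, wantedMode) {
  if (canSwitchFacingModeInPlace(track, wantedMode)) {
    // Same source, different setting: no new track is created.
    await track.applyConstraints({ facingMode: { exact: wantedMode } });
    return track;
  }
  // Different source: release the old camera first (many phones cannot open
  // both cameras at once), then request a new track from getUserMedia().
  track.stop();
  const stream = await mediaDevices.getUserMedia({
    video: { facingMode: { exact: wantedMode } },
  });
  return stream.getVideoTracks()[0];
}
```

In the second branch the caller still has to call replaceTrack and update any MediaStreams, which is exactly the overhead the in-place proposal was trying to avoid.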
The problem we want to solve here is removing label info either entirely or for devices not used by the given web page.
I am not sure how common it is to add a second source of the same device type. Anyway, let's say we want that. The way we do this right now (and this proposal does not change anything) is for the web app to call getUserMedia a second time to pick a second device, potentially using enumerateDevices information to pass specific constraints, like a deviceId. It seems one underlying goal you might have is to allow a user agent to only expose granted devices through enumerateDevices. This is a fine goal and maybe we can get there one day. We probably want to expose some information anyway, for instance to let the page know that other devices can be used. This would need more effort, and user agents can already experiment with it by gradually exposing enumerateDevices information through the devicechange event.
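The "call getUserMedia a second time" pattern described above can be sketched roughly as follows, assuming a `mediaDevices` object is passed in (navigator.mediaDevices in a browser). The function name is hypothetical. Before permission is granted, deviceIds may be empty strings, so those entries are skipped.

```javascript
// Hypothetical helper: open a second camera, different from the one the
// first track is using, by passing its deviceId as an exact constraint.
async function openSecondCamera(mediaDevices, firstTrack) {
  const inUse = firstTrack.getSettings().deviceId;
  const devices = await mediaDevices.enumerateDevices();
  const other = devices.find(
    (d) => d.kind === "videoinput" && d.deviceId && d.deviceId !== inUse
  );
  if (!other) return null; // no second camera is exposed to this page
  const stream = await mediaDevices.getUserMedia({
    video: { deviceId: { exact: other.deviceId } },
  });
  return stream.getVideoTracks()[0];
}
```

Note that this relies on enumerateDevices exposing devices the page has not yet been granted, which is the information the thread is debating whether to keep.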
OK
This is the current model. Some apps might want to select the same device as last time. Current API allows that. In general, I think the user agent is best suited to do that job.
As long as the user gives consent to use the new device, I do not see any real issue here. On the phone, several apps have a simple button to switch from the user facing camera to the environment facing camera. From the user point of view, the feed remains the same and is expected to go wherever it goes, only the source is changing.
Sounds fine. Aren't we somehow trying to deprecate label though?
Similar to 4 somehow, I do not see any difference between two cloned tracks and two getUserMedia tracks using the same underlying device.
'but not change source' seems like an artificial limitation. For a phone, a user agent can decide to expose one camera device, supporting both environment and user or two camera devices. Why shouldn't the user agent be allowed to have some UI allowing the user to switch cameras on the fly outside of the web page control? Also, if the page is capturing with both devices, there is no difference between applyConstraints(use_the_other_source) and clone-the-other-source-track-then-applyConstraints-then-replace-track-wherever-needed.
OK.
Maybe I am missing how this model differs from today's model. Can you be more explicit? This change seems to me like an incremental change targeting the issue of 'removing that offending label' or 'removing that device picker'. The other problem it could solve is that, apparently, user agents are not allowed to change the source of a capture track. I question this limitation. This change also has a potentially good story for migrating web sites: first implement it, then sanitize labels for non-granted devices to things like microphone1, camera2...
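The label-sanitizing migration step mentioned above might look something like this sketch. The generic naming scheme ("microphone1", "camera1", ...) and the `grantedIds` set are illustrative assumptions, not anything a browser is known to ship. Granted devices keep their real label; every other device gets a generic name numbered per kind among the sanitized devices.

```javascript
// Hypothetical mapping from MediaDeviceInfo.kind to a generic label prefix.
const GENERIC_PREFIX = {
  audioinput: "microphone",
  videoinput: "camera",
  audiooutput: "speaker",
};

// Return copies of the device entries with labels of non-granted devices
// replaced by generic names; the input array is not modified.
function sanitizeLabels(devices, grantedIds) {
  const counters = {};
  return devices.map((d) => {
    if (grantedIds.has(d.deviceId)) return { ...d };
    const prefix = GENERIC_PREFIX[d.kind] || "device";
    counters[prefix] = (counters[prefix] || 0) + 1;
    return { ...d, label: `${prefix}${counters[prefix]}` };
  });
}
```

A site's existing in-content picker would keep working with such labels, only less descriptively, which is what makes this an incremental migration rather than a breaking one.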
So we have some experience with in-browser camera & mic pickers in Firefox to draw on.
I agree on the problem statement, but I see no reason to change our whole model over it. Conservatively if we look at how labels are used today, sites build rudimentary pickers. All they need to replace that effort is a tool to provoke an in-browser picker. This already almost works in Firefox by default, which is why I find it hard to comprehend what a leap this is for some: await getUserMedia({video: {deviceId: {exact: [...allOtherDeviceIds]}}); Admittedly, the above has API problems and UX problems, but we need to separate them: The API problems:
The UX problems (which are orthogonal i.e. already exist today on second-device requests!):
If you work on (or predominantly use) a browser without per-device permission (one that doesn't tell you which device you're sharing), you'll be forgiven for thinking these problems are intrinsically linked. They are not. We solve the API problems with: await navigator.mediaDevices.getUserMedia({video: true, semantics: "user-chooses"}); This would be enough to force all user agents to show a picker of all devices. This seems to solve the API problem with no change to the model, and with near-parity with all the existing in-content device selection I've seen. That wins in my book. Leave UX to user agents. Now, there are interesting UX-related corner cases here we can discuss in the interest of sharing, but I want to leave the pie in the sky first.
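The two request shapes discussed above can be put side by side in a small sketch. The first is the exact-deviceId workaround (an exact list of every camera except the current one), which the comment above says already almost works in Firefox. The second uses `semantics: "user-chooses"`, which is only the dictionary member proposed in this thread, not a shipped API; browsers today would ignore it. Both function names are hypothetical.

```javascript
// Workaround: ask for "any camera except the current one" by listing every
// other videoinput deviceId as an exact constraint.
function otherCameraRequest(devices, currentDeviceId) {
  const allOtherDeviceIds = devices
    .filter((d) => d.kind === "videoinput" && d.deviceId !== currentDeviceId)
    .map((d) => d.deviceId);
  return { video: { deviceId: { exact: allOtherDeviceIds } } };
}

// Proposal: ask the user agent to show a picker of all devices.
// NOTE: "semantics" is the member proposed in this discussion (assumption).
function userChoosesRequest(videoConstraints = true) {
  return { video: videoConstraints, semantics: "user-chooses" };
}
```

The workaround needs the full device list (and so leaks it to the page); the proposed form needs none of it, which is the point of moving picking into the browser.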
We have a basket for that. Just like w3c/mediacapture-main#646 would prevent sites from optimizing out camera- and mic-launching buttons, removing info about other devices would prevent sites from optimizing out camera- and mic-changing buttons in their config panel(s). The "interesting" UX-related corner cases I alluded to include what to do e.g. when there's only one choice.
@henbos Specifically on removing required constraints, note that Chrome today implements That API exists to allow a site to enforce its constraints while building a picker, or choosing another device outright. Most sites enforce some constraints. That API is also a trove of fingerprinting information. Luckily, await getUserMedia({video: constraints, semantics: "user-chooses"}); So merging w3c/mediacapture-main#667 would let us retire
Exact constraints were added to the spec because participants thought that in some cases, for some apps, the app would prefer not to work at all rather than have to work beyond those requirements. My position at the moment is that we don't need to change this; removing required constraints now would only increase the uncertainty for developers, and have zero benefit for the users.
The purpose of prompting and the user picking is...
My gut-reaction to the user making the choice is that we don't need a lot of constraints anymore.
But there is still value in specifying desired resolution and frame rate. If the application only wants X then exceeding X is just wasting resources. For example if the application is happy with VGA 20 fps then it wastes resources to open the camera at UltraHD 60 fps.
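One way to express "VGA at 20 fps is enough" without risking a rejection is to use only ideal values, which steer the browser's choice of camera mode but are never required. This is a sketch; the helper name and default numbers (taken from the example above) are illustrative.

```javascript
// Hypothetical helper: build video constraints that prefer a modest mode.
// Only "ideal" is used, so getUserMedia() will not reject with
// OverconstrainedError if the camera cannot match these values exactly.
function modestVideoConstraints({ width = 640, height = 480, fps = 20 } = {}) {
  return {
    width: { ideal: width },
    height: { ideal: height },
    frameRate: { ideal: fps },
  };
}
```

Usage would be along the lines of `navigator.mediaDevices.getUserMedia({ video: modestVideoConstraints() })`. Adding `max` bounds would cap resource use more strictly, but `max` is a required constraint and reintroduces the over-constraining problem discussed next.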
But what if your device(s) can't do what the application asks for?
Example 1: I have a single device and it can only do 30 fps but the application is asking for 60.
I would argue that 30 fps is better than no camera whatsoever.
I would also argue that if the request rejects because of over-constraining, then we are exposing unnecessary information to the application.
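Example 1 can be made concrete with a simplified version of the spec's "fitness distance" for one numeric constraint. This sketch assumes the device produces a single value (real devices expose a range, which the spec checks against), so treat it as an illustration rather than the spec algorithm. A required bound (exact, min, max) that cannot be met yields Infinity, meaning the device is excluded and, with no device left, getUserMedia() rejects. An ideal value only adds a penalty between 0 and 1.

```javascript
// Simplified fitness distance for a single numeric setting (illustration).
function fitnessDistance(actual, constraint) {
  // Required bounds: failing any of them rules the device out entirely.
  if (constraint.exact !== undefined && actual !== constraint.exact) return Infinity;
  if (constraint.min !== undefined && actual < constraint.min) return Infinity;
  if (constraint.max !== undefined && actual > constraint.max) return Infinity;
  // Ideal: a relative penalty, 0 when the value matches exactly.
  if (constraint.ideal === undefined || actual === constraint.ideal) return 0;
  return (
    Math.abs(actual - constraint.ideal) /
    Math.max(Math.abs(actual), Math.abs(constraint.ideal))
  );
}
```

For a 30 fps-only camera, `{ ideal: 60 }` gives a distance of 0.5 (the camera is still used), while `{ exact: 60 }` gives Infinity (the request fails, and the failure itself tells the page something about the hardware).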
Example 2: Front/back camera or multiple cameras. E.g. I have two cameras, one pointing at me and one pointing at my living room.
Maybe one of the cameras can do HD and the other can't and the application is asking for HD. When it was the application's job to do the picking for you, it made a lot of sense to rule out which device to pick. If the user is picking anyway, I'm not sure it is valid to rule out options. In getDisplayMedia() we purposefully prevented the application from influencing selection, ensuring that we only provide fingerprinting surface to whether audio, video and display surfaces are present.
I don't see why getUserMedia(), in a world where device picking is not the application's job, would be any different from getDisplayMedia(). I don't think it is valid to rule out one camera or the other. It is the user's decision whether to show their face or their living room.
Example 3: Audio+video? No, only audio? Re-prompt!
Today, getUserMedia() asks for the kinds of media that were specified. And they're required: with "audio+video" you either give both or none. So the application may ask for both only to have a mute button later (unnecessarily opening both camera and microphone, not ideal for privacy), or it asks, gets rejected, and then asks again. Or the application asks the user, in an application-specific UI, which kinds to pass to getUserMedia(), doing some of the choosing for the user outside the browser UI.
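The first workaround above, "ask for both, then offer a mute button", looks roughly like this sketch, assuming a `mediaDevices` object passed in (navigator.mediaDevices in a browser); the function name is hypothetical. Setting `track.enabled = false` makes the video track render black frames, but permission for both devices was still requested up front, and when the camera is actually released is up to the user agent.

```javascript
// Hypothetical helper: request audio+video together (all or nothing today),
// then "turn off" the camera in the app by disabling its tracks.
async function openBothThenMuteVideo(mediaDevices) {
  const stream = await mediaDevices.getUserMedia({ audio: true, video: true });
  for (const track of stream.getVideoTracks()) {
    // A disabled video track produces black frames; the user still had to
    // grant camera access to get here.
    track.enabled = false;
  }
  return stream;
}
```

A "user-chooses" style prompt would instead let the user decline the camera at prompt time, which is the Audio|Video behavior argued for earlier in the thread.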
Discussion: