"user-chooses": Does required constraints make any sense now? #6

Open
henbos opened this issue Mar 20, 2020 · 16 comments

@henbos
Contributor

henbos commented Mar 20, 2020

The purpose of prompting and the user picking is...

  1. Address privacy issues. If the user makes the decision, the application does not need to know what other options are available.
  2. Ensure consistent prompting behavior across browsers.

My gut-reaction to the user making the choice is that we don't need a lot of constraints anymore.
But there is still value in specifying desired resolution and frame rate. If the application only wants X then exceeding X is just wasting resources. For example if the application is happy with VGA 20 fps then it wastes resources to open the camera at UltraHD 60 fps.
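As a rough illustration of "desired" rather than "required" settings, the VGA 20 fps case can already be written with `ideal` values in today's constraints syntax. The helper name `preferredVideo` is made up for this sketch; only the constraint dictionary shape comes from the existing API.

```javascript
// Hypothetical helper (not part of any API): expresses a resolution and
// frame-rate wish as `ideal` values, so the user agent treats it as a
// target to aim for rather than a hard requirement.
function preferredVideo(width, height, frameRate) {
  return {
    video: {
      width: { ideal: width },
      height: { ideal: height },
      frameRate: { ideal: frameRate },
    },
  };
}

const vga20 = preferredVideo(640, 480, 20);

// In a browser, this object would be passed straight to getUserMedia:
//   const stream = await navigator.mediaDevices.getUserMedia(vga20);
```

With `ideal`, a camera that can only deliver something else is still usable; the request does not reject for being over-constrained.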

But what if your device(s) can't do what the application asks for?

Example 1: I have a single device and it can only do 30 fps but the application is asking for 60.
I would argue that 30 fps is better than no camera whatsoever.
I would also argue that if the request rejects because of over-constraining, then we are exposing unnecessary information to the application.

  • "60 fps" makes sense as a guideline, not as a requirement.
  • Constraints should not be a way to probe for capabilities.
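To make the probing concern concrete, here is a minimal sketch of what an application has to do today if it asks for exactly 60 fps but would accept less. The function `openCamera` and its `mediaDevices` parameter are assumptions for illustration (in a page you would pass `navigator.mediaDevices`); the rejection name `OverconstrainedError` is the one the current API uses.

```javascript
// Hypothetical helper: try the strict request first, then fall back to
// any camera if the strict request is over-constrained.
async function openCamera(mediaDevices, video) {
  try {
    return await mediaDevices.getUserMedia({ video });
  } catch (err) {
    // Any other failure (e.g. the user said no) is passed through.
    if (err.name !== "OverconstrainedError") throw err;
    // At this point the page has learned that no device satisfies the
    // request, which is the capability leak discussed above.
    return mediaDevices.getUserMedia({ video: true });
  }
}

// Usage in a page (assumed):
//   const stream = await openCamera(navigator.mediaDevices,
//                                   { frameRate: { exact: 60 } });
```

If "60 fps" were only ever a guideline, neither the retry nor the leak would be needed.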

Example 2: Front/back camera or multiple cameras. E.g. I have two cameras, one pointing at me and one pointing at my living room.
Maybe one of the cameras can do HD and the other can't and the application is asking for HD. When it was the application's job to do the picking for you, it made a lot of sense to rule out which device to pick. If the user is picking anyway, I'm not sure it is valid to rule out options. In getDisplayMedia() we purposefully prevented the application from influencing selection, ensuring that we only provide fingerprinting surface to whether audio, video and display surfaces are present.

I don't see why getUserMedia(), in a world where device picking is not the application's job, would be any different from getDisplayMedia(). I don't think it is valid to rule out one camera or the other. It is the user's decision whether to show their face or their living room.

  • Constraints may possibly influence device settings, but should not influence user choices (e.g. which device).
  • Device capabilities are not a valid guideline for what the "right choice" is. Example: even if HD is generally preferable, whether or not my camera can do HD has nothing to do with the direction that camera is facing, which I as a user definitely care more about.

Example 3: Audio+video? No, only audio? Re-prompt!
Today, getUserMedia() asks for the kinds of media that were specified. And they're required: with "audio+video" you either give both or none. So the application may ask for both only to have a mute button later (unnecessarily opening both camera and microphone, not ideal for privacy), or it asks, rejects and then asks again. Or the application asks the user in an application-specific UI which kinds to pass in to getUserMedia(), doing some of the choosing for the user outside the browser UI.


Discussion:

  • Should constraints be able to limit the user choices? I would argue no.
  • Do constraints need to be more complicated than limits used for downsampling? Possibly not?
  • Should audio/video be optional? Seems like a good idea.
@henbos
Contributor Author

henbos commented Mar 20, 2020

@jan-ivar and @youennf please share your thoughts

@henbos
Contributor Author

henbos commented Mar 20, 2020

Also I saw that the other ones were labeled "April 2020 Interim" but it's March 30th, right?

@youennf
Contributor

youennf commented Mar 20, 2020

Right, this is precisely why I want some experimentation and experimentation results before defining a new API/new API model. This will allow us to make sure we are all comfortable implementing the new model. And to decide whether this is a whole new model or not from getUserMedia as well.

In general, I see constraints as a way to help the user agent pick the default selected devices presented to the user. That is all; they could maybe also rank the devices, but I think this would be counter-intuitive to users. It does not seem to make sense to allow camera and microphone selection if only a subset would actually be selectable by the user.

If a user wants to select microphone and not camera, the user should have the choice. In that sense, this is not Audio&Video, but Audio|Video that constraints would end up defining. Or maybe we should have an API for microphone and a separate API for camera?

I also do not really like the model of a prompt that would happen or not based on what the web page would provide as input. If a webpage says it wants the default camera and default microphone, why shouldn't the user be allowed to override this selection with different devices, or say no to camera but yes to microphone?

I guess there might be some latitude here for heuristics based on page capture history, but the whole story is not clear to me. And I am unclear as to whether the spec will provide guidelines/requirements in that area as well.

@jan-ivar
Member

I think this is a straw man. The sole difference is the tool to build a device picker no longer exposes labels. Everything remains fundamentally the same wrt user experience.

The app retains the same power it had before to limit choice. Want to build a picker with only choices the user doesn't want? Go ahead. You can do that today.

The fingerprint probing exposure from failed gum calls is exactly the same. We've said this is ok because tracking libraries won't risk a prompt.

I think I answered everything else in w3c/mediacapture-main#667 (comment)

@jan-ivar
Member

jan-ivar commented Mar 23, 2020

Last point: I don't think a transition plan to a more limited API will succeed.

I think constraints are here to stay.

@youennf
Contributor

youennf commented Mar 24, 2020

If we go with a device picker, we should design it for the best user experience, from scratch.
If we have to update the API to make sure the user experience is great, we should probably do it.
Once we have a good model, we should think of the transition plan.

Last point: I don't think a transition plan to a more limited API will succeed.

It depends. You could always reduce the power of the old API, for instance by only selecting the default devices with the old API (after some deprecation time) and using constraints as ideal in that case.
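One way to picture "using constraints as ideal" is a transform that rewrites every `exact` requirement into an `ideal` preference before a device is chosen. This is only a sketch of the idea, not something any browser is known to do; `toGuideline` is a hypothetical name, and `min`/`max` are deliberately left untouched here.

```javascript
// Hypothetical transform: turn each `exact` requirement in a set of track
// constraints into an `ideal` preference. Bare values (e.g. width: 1280)
// are already treated as ideal by getUserMedia, so they pass through.
function toGuideline(trackConstraints) {
  const relaxed = {};
  for (const [name, value] of Object.entries(trackConstraints)) {
    const isDict =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isDict && "exact" in value) {
      const { exact, ...rest } = value;
      relaxed[name] = { ...rest, ideal: exact };
    } else {
      relaxed[name] = value;
    }
  }
  return relaxed;
}

const relaxed = toGuideline({ frameRate: { exact: 60 }, width: 1280 });
console.log(relaxed); // { frameRate: { ideal: 60 }, width: 1280 }
```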

Also, I doubt that the fact of modifying how we handle constraints will be seen as a more limited API. I might be biased but to me, constraints are over-complex and sub-used. Simplifying constraints would be beneficial.

@henbos
Contributor Author

henbos commented Mar 24, 2020

A powerful reason for a potential "user-chooses" API is that it works straight out of the box - no need to partially implement picking logic in the application - and that, if we get it right, we could guarantee consistent prompting behavior across browsers.

Besides, if the prompt is actually good enough to do its job, why would prompting be a problem? If the user is to choose, why would letting the user choose be a problem?

I would argue that there is a difference between an undesirable superfluous yes/no re-prompt when the application has already done the choosing for you and a prompt that is actually selecting something that the user wants to select (and would otherwise select inside of application picker logic).

All of my reasoning though is based on sometime in the future deprecating the old way. If we give up on that then making a suboptimal API is an option. But if it's optional, even far down the road, then we haven't really addressed the privacy concerns.

@jan-ivar
Member

@youennf Let's not confuse API with user experience. Firefox has had a picker forever. Join us. 😉

@henbos I'd caution against oversimplifying the problem. While it may seem appealing to expose prompt methods wholesale to JS, they're often a bad idea (see permission.request() or roc's old blog).

There's a role for user agents to negotiate permission at an app's point of (media) access.

Thus the media access API design is still an abstraction separate from a user agent's prompting story, even when that API puts requirements on it.

I don't even agree "consistent prompting behavior across browsers" is a general goal. That's a common web developer ask, a different problem from what we're solving (end user privacy).

For instance, Firefox has not committed to removing its prompt on seeing "browser-chooses", and there's no spec language to force it, because that wasn't the goal.

The goal of "user-chooses" was to minimally guarantee a prompt only when the user's choices exceed what the app asks for, allowing apps to replace their "control setting"-type pickers with it.

Like today, the app remains in control of what it wants to ask.

if the prompt is actually good enough to do its job, why would prompting be a problem?

Because users don't want to be prompted for their camera and microphone every time.

@henbos I don't want to dismiss criticisms that constraints were over-designed (they were), and while I'm glad "user-chooses" sparked an opportunity to think further, the two events seem largely orthogonal.

That's important to stress since @youennf seems to suggest UX has to come before API here, even though the working group agreed last interim to move ahead with w3c/mediacapture-main#667.

@aboba
Contributor

aboba commented Mar 25, 2020

@henbos I think you have raised a lot of good questions. Thank you.

Overall, we are really talking about a very different model from the current Media Capture approach, much more like Screen Capture. Just as Screen Capture forced us to re-think the role of constraints, it seems to me that an "in-chrome" approach to Media Capture will require some new thinking.

Given that we are really talking about a new model, I am wondering whether the right way to handle this might be to create a new "Media Capture and Streams Version 2" work item, rather than trying to make all these changes to the existing Media Capture and Streams document before bringing it to PR.

This doesn't imply a new API, just that we use separate documents for the old approach and the new one, so that we can clearly document each one. Otherwise, I am concerned that we could confuse the reader, who will not be able to distinguish "old" from "new" approaches.

@youennf
Contributor

youennf commented Mar 26, 2020

Maybe we are trying to fit too many things in a single getUserMedia method.

The main/sole use case for MediaDeviceInfo.label is a web-based device picker.
Given labels are only available after page starts to capture, what we might actually want is a way to change the device being used for a given live capture track.
We could use a browser picker that could already be in use and invocable from browser UI, similarly to what Chrome is apparently implementing for getDisplayMedia.

If we attach this API to a MediaStreamTrack, we could add a new API or try extending applyConstraints, given it can potentially already be used to switch between user and environment cameras. We probably want user activation whenever changing the device.

This would also be conceptually consistent with what we are trying to do for speakers, where speaker authorisation is a simple user click, and, in the case user wants to change the selected speaker, a new API would trigger a browser device picker.

As an extra bonus, no new track means no need to call replaceTrack, update MediaStreams, WebAudio nodes... This might simplify things for web developers.

One potential worry is the handling of cloned tracks. Maybe that would be the web app job to update any cloned track with the newly selected device (no user prompt needed here, could be done with applyConstraints).

@jan-ivar
Member

jan-ivar commented Mar 26, 2020

We can do a lot of things, but I think we need to start with problems we want to solve.

A couple of red flags for me here: to me it's not the goal of this spec to express all these things, but to create a model within which user agents can experiment in a web compatible way, and JS can express its needs. That model is:

  1. User agents decide the units exposed as unique media input devices
  2. Apps express their constraints for a media input device it wants
  3. User agents pick device within those constraints
  4. A (source) device is shared as one or more tracks.
  5. "Once selected, the source of the MediaStreamTrack MUST NOT change."
  6. A track's label "MUST return the label of the object's corresponding source"
  7. Tracks can be cloned
  8. (Cloned) tracks may manipulate (output from) the source through applyConstraints (but not change source)
  9. Tracks can end (which may terminate a permission envelope)

what we might actually want is a way to change the device being used for a given live capture track.

That's expressly forbidden in this model.

Also, that's just one use case (needing to replace a source with another). The general use case is adding a second source. The latter more general use case supports the former.

applyConstraints, given it can potentially already be used to switch between user and environment

It cannot. applyConstraints cannot change the source of a track. Not unless the user agent exposes a single unique camera that returns:

console.log(track.getCapabilities().facingMode); // ["user", "environment"]

E.g. like a motorized pivot camera.
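A small sketch of that check: whether a given track's single source reports both facing modes, which is the only case in the current model where applyConstraints could flip direction. `canFlipInPlace` is a hypothetical helper, and the two objects below are stand-ins for real tracks.

```javascript
// Hypothetical helper: true only if this one source reports both facing
// modes, i.e. the "motorized pivot camera" case.
function canFlipInPlace(track) {
  const modes = track.getCapabilities().facingMode || [];
  return modes.includes("user") && modes.includes("environment");
}

// Stand-in objects for illustration; real tracks come from getUserMedia.
const pivotCamera = {
  getCapabilities: () => ({ facingMode: ["user", "environment"] }),
};
const frontCamera = {
  getCapabilities: () => ({ facingMode: ["user"] }),
};

console.log(canFlipInPlace(pivotCamera)); // true
console.log(canFlipInPlace(frontCamera)); // false
```

For the common two-camera phone, each source reports one facing mode, so switching direction means obtaining a track from the other source.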

Sure, a user agent could in theory expose a bunch of virtual devices all with the capability to mimic every other device of its kind, but it undermines the value of the model by doing so.

What you describe sounds like an entirely different model. That's fine, but I think I'm going to need a convincing problem we don't solve today, to justify spending time considering a new model at this point.

@youennf
Contributor

youennf commented Mar 27, 2020

We can do a lot of things, but I think we need to start with problems we want to solve.

The problem we want to solve here is removing label info either entirely or for devices not used by the given web page.

Also, that's just one use case (needing to replace a source with another). The general use case is adding a second source. The latter more general use case supports the former.

I am not sure how common it is to add a second source of the same device type.
Can you be more specific about the use case?

Anyway, let's say we want that. The way we are doing this right now (and this proposal does not change anything) is for the web app to call getUserMedia a second time to pick a second device, potentially using enumerateDevices information to pass some specific constraints, like a deviceId.
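For reference, that existing flow might look roughly like the following. The helper `secondCameraConstraints` is a name invented for this sketch; it only builds the constraints object from an enumerateDevices-style list, and the actual browser calls are shown as comments.

```javascript
// Hypothetical helper: builds getUserMedia constraints that select any
// camera other than the one already in use. Returns null if there is none.
function secondCameraConstraints(devices, currentDeviceId) {
  const others = devices
    .filter((d) => d.kind === "videoinput" && d.deviceId !== currentDeviceId)
    .map((d) => d.deviceId);
  if (others.length === 0) return null;
  // `exact` accepts a list: any one of these deviceIds satisfies it.
  return { video: { deviceId: { exact: others } } };
}

// In a page (assumed usage):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const current = firstTrack.getSettings().deviceId;
//   const constraints = secondCameraConstraints(devices, current);
//   if (constraints) {
//     const second = await navigator.mediaDevices.getUserMedia(constraints);
//   }
```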

It seems one underlying goal that you might have is to allow a user agent to only expose granted devices as part of enumerateDevices. This is a fine goal and maybe we can go there one day. We probably want to expose some information anyway to let the page know that other devices can be used, for instance. This would need more effort and can already be experimented with by user agents by gradually exposing enumerateDevices information instead with the devicechange event.

  1. User agents decide the units exposed as unique media input devices

OK

  2. Apps express their constraints for a media input device it wants

This is the current model.
I would rephrase it to: "apps express the constraints for a media input data it wants, not a media input device". Most apps do not care about a particular device as long as it provides audio or video the user actually wants to use.

Some apps might want to select the same device as last time. Current API allows that. In general, I think the user agent is best suited to do that job.

  3. User agents pick device within those constraints
  4. A (source) device is shared as one or more tracks.
  5. "Once selected, the source of the MediaStreamTrack MUST NOT change."

As long as the user gives consent to use the new device, I do not see any real issue here.
Can you be more specific about what will break here for the user? Or the web page, given the web page asked to change the source?

On the phone, several apps have a simple button to switch from the user facing camera to the environment facing camera. From the user point of view, the feed remains the same and is expected to go wherever it goes, only the source is changing.

  6. A track's label "MUST return the label of the object's corresponding source"

Sounds fine. Aren't we somehow trying to deprecate label though?

  7. Tracks can be cloned

Similar to 4 somehow, I do not see any difference between two cloned tracks and two getUserMedia tracks using the same underlying device.

  8. (Cloned) tracks may manipulate (output from) the source through applyConstraints (but not change source)

'but not change source' seems like an artificial limitation.
As long as a web page is granted access to both devices and they have the same media type, I do not see any real issue in changing the underlying source, at least when web page is aware of the change.

For a phone, a user agent can decide to expose one camera device, supporting both environment and user or two camera devices. Why shouldn't the user agent be allowed to have some UI allowing the user to switch cameras on the fly outside of the web page control?

Also, if the page is capturing with both devices, there is no difference between applyConstraints(use_the_other_source) and clone-the-other-source-track-then-applyConstraints-then-replace-track-wherever-needed.

  9. Tracks can end (which may terminate a permission envelope)

OK.

What you describe sounds like an entirely different model. That's fine, but I think I'm going to need a convincing problem we don't solve today, to justify spending time considering a new model at this point.

Maybe I am missing how this model differs from today's model. Can you be more explicit?

This change seems to me like an incremental change, which targets the issue of 'removing that offending label' or 'removing that device picker'.
In particular, this does not require existing web sites to change anything about their current flow to enter a call, grant the prompt... The only adoption needed is in the 'device picker' pane, which might be less crucial.

The other problem that it could solve is the fact that, apparently, user agents are not allowed to change the source of a capture track. I question this limitation.
A web page might actually want to opt-in to a behavior where the audio input source matches the audio output so that plugging in a headset would automatically start using the headset microphone if audio output goes to the headset speakers.

This change also has a potential good story for migrating web sites. First implement it, then sanitize labels for not granted devices to things like microphone1, camera2...
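The label sanitizing step could look something like this. It is a sketch of what a user agent might do internally, written over plain objects shaped like MediaDeviceInfo; `sanitizeLabels` and the `speaker` prefix are assumptions, while `microphone1` and `camera2` follow the naming suggested above.

```javascript
// Hypothetical sketch: keep real labels only for devices the page has been
// granted, and replace the rest with generic per-kind names.
function sanitizeLabels(devices, grantedIds) {
  const counters = { audioinput: 0, videoinput: 0, audiooutput: 0 };
  const names = {
    audioinput: "microphone",
    videoinput: "camera",
    audiooutput: "speaker", // assumed prefix, not from the discussion
  };
  return devices.map((d) => {
    counters[d.kind] += 1; // numbering counts every device of that kind
    const label = grantedIds.has(d.deviceId)
      ? d.label
      : `${names[d.kind]}${counters[d.kind]}`;
    return { deviceId: d.deviceId, kind: d.kind, label };
  });
}

const sanitized = sanitizeLabels(
  [
    { deviceId: "a", kind: "audioinput", label: "Built-in Mic" },
    { deviceId: "b", kind: "audioinput", label: "USB Headset" },
    { deviceId: "c", kind: "videoinput", label: "Front Camera" },
  ],
  new Set(["a"])
);
console.log(sanitized.map((d) => d.label));
// [ 'Built-in Mic', 'microphone2', 'camera1' ]
```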

@jan-ivar
Member

jan-ivar commented Mar 27, 2020

So we have some experience with in-browser camera & mic pickers in Firefox to draw on.

The problem we want to solve here is removing label info either entirely or for devices not used by the given web page.

I agree on the problem statement, but I see no reason to change our whole model over it. Conservatively if we look at how labels are used today, sites build rudimentary pickers. All they need to replace that effort is a tool to provoke an in-browser picker. This already almost works in Firefox by default, which is why I find it hard to comprehend what a leap this is for some:

await navigator.mediaDevices.getUserMedia({video: {deviceId: {exact: [...allOtherDeviceIds]}}});

Admittedly, the above has API problems and UX problems, but we need to separate them:

The API problems:

  1. Won't prompt once you check ☑ Remember this decision. It's still a permission prompt.
  2. For web compat, we can't prompt if site already has permission to one of the choices.

The UX problems (which are orthogonal i.e. already exist today on second-device requests!):

  1. Canceling a second-device request is overly harsh on the site. Bug 1609578
  2. Our UX heavily biases toward a default choice, too many clicks to change device.
  3. Our (lack of) preview is biased toward initial prompt (where it might freak people out)
  4. We don't do a good job of simplifying our UX when there's just one choice.

If you work on (or predominantly use) a browser without per-device permission (that doesn't tell you which device you're sharing), you'll be forgiven for thinking these problems as intrinsically linked. They are not.

We solve the API problems with:

await navigator.mediaDevices.getUserMedia({video: true, semantics: "user-chooses"});

This would be enough to force all user agents to show a picker of all devices.

This seems to solve the API problem with no change to the model, with near-parity with all existing in-content device selection I've seen.

That wins in my book. Leave UX to user agents.

Now there are interesting UX-related corner cases here we can discuss in the interest of sharing, but I want to leave the pie in the sky first.

allow a user agent to only expose granted devices as part of enumerateDevices.

We have a basket for that. Just like w3c/mediacapture-main#646 would prevent sites from optimizing out camera- and mic-launching buttons, removing info of other devices would prevent sites from optimizing out camera- and mic-changing buttons in their config panel(s).

The "interesting" UX-related corner cases I alluded to, include what to do e.g. when there's only one choice.

@jan-ivar
Member

@henbos Specifically on removing required constraints, note that Chrome today implements info.getCapabilities() which gives the site capability information about all devices after gUM.

That API exists to allow a site to enforce its constraints while building a picker, or choosing another device outright. Most sites enforce some constraints.

That API is also a trove of fingerprinting information.

Luckily, "user-chooses" provides feature-parity with this, without the massive information leak:

await getUserMedia({video: constraints, semantics: "user-chooses"});

So merging w3c/mediacapture-main#667 would let us retire info.getCapabilities() provided we leave constraints alone. 🎉

@alvestrand
Contributor

Exact constraints were added to the spec because participants thought that in some cases, for some apps, the app would prefer not to work at all rather than have to work beyond those requirements.
If the app wants the user to have the widest range of choice, the app should use ideal constraints.

My position at the moment is that we don't need to change this; removing required constraints now would only increase the uncertainty for developers, and have zero benefit for the users.

@jan-ivar jan-ivar transferred this issue from w3c/mediacapture-main Oct 1, 2020