Specify track.label #128

Closed
jan-ivar opened this issue Dec 13, 2019 · 18 comments
Labels
privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. TPAC 2020

Comments

@jan-ivar
Member

Right now, this spec says nothing about what the track.label should contain:

const stream = await navigator.mediaDevices.getDisplayMedia({video: true});
console.log(stream.getVideoTracks()[0].label); // ???

Implementations differ so we should specify this:

  • Firefox says: "Specify track.label · Issue #128 · w3c/mediacapture-screen-share" or "Primary Monitor"
  • Chrome says: "window:123:0" or "screen:2067749241:0"
  • Safari and Edge say: ""
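
A page that wants to show this label has to cope with all three behaviors. A minimal sketch, assuming only the values reported in the list above (the helper name and the regular expression are hypothetical, and no spec defines these formats):

```javascript
// Hypothetical helper: classify a display-capture track label.
// The patterns reflect only the values reported in this issue, not any spec.
function classifyDisplayLabel(label) {
  if (label === "") {
    // Safari and Edge behavior as reported above.
    return { kind: "empty", text: null };
  }
  // Chrome-style values reported above: "window:123:0", "screen:2067749241:0".
  if (/^(window|screen):\d+:\d+$/.test(label)) {
    return { kind: "opaque-id", text: null };
  }
  // Anything else is assumed to be human-readable (window title, monitor name).
  return { kind: "human-readable", text: label };
}
```

Only the "human-readable" case yields text that seems worth showing to a user; the other two would need a site-provided fallback.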

The Firefox behavior uses the window title or monitor name, which seems helpful for the visually impaired. OTOH it might be a privacy issue if a window title is longer than what is visibly shown in the actual shared video (e.g. a user carefully resized a window to conceal an account number in the title). Wrinkle: most desktop browser windows use a layout without title these days, so this might actually be new information and surprising.

How do we balance these benefits?

One solution may be to limit the label length to reveal only the part of the window title that is currently, or was previously, visible in the video, whichever is longer. This might be hard to calculate (fonts etc) on some OSes, and the length may grow over time.

Whatever we come up with, we should probably say something.

@youennf
Collaborator

youennf commented Dec 13, 2019

The Firefox behavior uses the window title or monitor name, which seems helpful for the visually impaired.

Do we know how websites are using the label field?
Would it not be better for the User Agent to easily list what is being shared by the user to a given page?

Related to getDisplayMedia, the spec allows dynamically changing the source of a display track.
I don't think implementations expect track labels to change.

@jan-ivar
Member Author

jan-ivar commented Dec 13, 2019

Would it not be better for the User Agent to easily list what is being shared by the user to a given page?

How do you mean? Like Firefox does with label above, or something else?

Related to getDisplayMedia, the spec allows dynamically changing the source of a display track.

Yeah @henbos wanted that, I think for active-tab capture? I actually worry that contradicts Mediacapture-main: "Once selected, the source of the MediaStreamTrack MUST NOT change."

I argued the right way to do active-tab capture was to create a virtual "active tab" device source that did just that, and expose that as a separate choice (which is the obvious UX anyway). That would preserve the model (the exposed device does not change to one of the other exposed ones). Much like the "Default - External Microphone (Built-in)" default device Chrome still exposes.

I don't think implementations expect track labels to change.

I don't see an immediate problem with that. The Chrome default device already does that when I yank my earbuds (on the slightly related MediaDeviceInfo.label anyway, not the ended track.label).

@youennf
Collaborator

youennf commented Dec 13, 2019

Would it not be better for the User Agent to easily list what is being shared by the user to a given page?

How do you mean? Like Firefox does with label above, or something else?

I am trying to think of the main uses for track.label in web pages.
One use case for labels is device picker, but this does not apply to getDisplayMedia, nor track.label.

The visually impaired user scenario seems best handled by the User Agent itself.
User Agents provide capture indicators that could more properly identify the sources than what web pages could do.

One use case is presenting multiple (say remote) tracks, where the website allows switching between them using the track labels. I guess most websites would use "Alice Camera" or "Alice Screen" instead of "Built-in camera" or "Primary Monitor".

There is the case of multi display sharing, in which case having "Alice Primary Monitor" or "Alice Presentation XX" might somehow help.

Any other scenario in mind?

@jan-ivar
Member Author

One use case for labels is device picker, but this does not apply to getDisplayMedia

Site may want to put what's currently being captured in a button:

[Sharing: Alice's Presentation XX]

Whether clicking on this button brings up an in-content selector or in-chrome one (or other choices like "stop sharing") seems irrelevant. This also seems true regardless of how many sources are being shared concurrently.
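
A rough sketch of how such a button caption might be built, assuming a track-like object with a string `label` (the function name and the "screen" fallback wording are hypothetical):

```javascript
// Hypothetical helper: text for a "currently sharing" button.
// `track` only needs a string `label` property, like a MediaStreamTrack.
function sharingButtonText(userName, track) {
  const label = (track.label || "").trim();
  // Fall back to a generic description when the user agent exposes no label.
  const what = label === "" ? "screen" : label;
  return `[Sharing: ${userName}'s ${what}]`;
}
```

In a page, the result would be assigned with `button.textContent` rather than `innerHTML`, since a window title is not trusted markup.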

@youennf
Collaborator

youennf commented Dec 13, 2019

Site may want to put what's currently being captured in a button:

For camera/microphone sources, I do not see track labels as really useful to show to users.
Do we know websites using it this way?

I would be interested in understanding what native applications are doing and what would be needed to offer the same feature set. Might be interesting to know what web app developers are requesting in that area as well.

[Sharing: Alice's Presentation XX]

Sure, so track.label would be 'XX' I guess.

Probably track.label would be used according to the display type.
And would make most sense for "window", "application" and "browser".
In some cases, the label for "window" might be useful but not always, which might be an issue for websites to adopt this.

In case of "window" and maybe "browser", the name can change (say you change the name of the presentation). If it becomes an important part of the UI, it would need to be kept in sync, hence some eventing mechanism.
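
Absent such an eventing mechanism, a page could only compare the label against the last value it saw. A minimal sketch, assuming the label can change at all (which implementations may not allow); `createLabelWatcher` and its callback signature are hypothetical:

```javascript
// Hypothetical helper: detect label changes by comparing against the last value seen.
// No label-change event is assumed to exist; the caller decides when to call check().
function createLabelWatcher(track, onChange) {
  let last = track.label;
  return {
    // Returns true (and notifies) when the label differs from the last one seen.
    check() {
      const current = track.label;
      if (current === last) {
        return false;
      }
      const previous = last;
      last = current;
      onChange(current, previous);
      return true;
    },
  };
}
```

A page might call `watcher.check()` from a timer, which is exactly the kind of polling a dedicated event would avoid.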

In terms of specification, we could define label values according to the display type.
Or we could beef-up MediaTrackSettings with new information since the web application might anyway need to query the settings to generate its own label.

I am not against the idea, at least as long as we are not leaking new information.
I am wondering how much websites will actually use it and how much effort we should put there.

This also seems true regardless of how many sources are being shared concurrently.

Not really, if there is only one source from Alice, [Alice] is probably the most meaningful information.

@henbos
Contributor

henbos commented Dec 16, 2019

(Me commenting without having followed the entire discussion, just in response to this one quote, forgive me if irrelevant...)

Once selected, the source of the MediaStreamTrack MUST NOT change.

This quote seems up for interpretation. Some cameras are able to zoom in. Is using the zoom changing the source? Probably not. With regards to changing tab, you could say this is changing source because it is showing something else. But you could also say that the source was always the "browser", and the tab is "zooming in" on a different part of the browser's many surfaces.

I feel like I probably sound like I'm trying to find a loop-hole in the text but I'm genuinely curious about the intent here or more importantly what we want it to say.

Allowing the browser to take responsibility for reconfiguring what is shared, whether that is "which camera?" or "which tab?", sounds like a good thing. In my (poorly informed) opinion, letting the application handle device selection logic was a mistake.

@jan-ivar
Member Author

jan-ivar commented Dec 16, 2019

@henbos Seems clear to me in context: "The provided media MUST include precisely one track of each media type in requestedMediaTypes. The devices chosen MUST be the ones determined by the user. Once selected, the source of a MediaStreamTrack MUST NOT change."

It says once a user has picked a source, the user agent cannot pick a different source.

E.g. if a user picks "PowerPoint", the user agent is not allowed to change it to "My Calendar" or "Entire Desktop" later. This may seem obvious, but specs need to spell out the obvious.

It doesn't matter what the choices are. This goes to user trust:

If, given tabs A, B, and a well-explained "Follow active tab" device C, a user chooses C, I see no conflict in the model, provided C is always C, even if it sometimes looks like B.

letting the application handle device selection logic was a mistake

We're in screen-capture where we didn't! But there are three parties involved:

  1. application
  2. user agent
  3. user

This spec is super clear only the user (3) chooses.

@jan-ivar
Member Author

jan-ivar commented Dec 17, 2019

For camera/microphone sources, I do not see track labels as really useful to show to users.
Do we know websites using it this way?

@youennf track.label has been there forever, and is merely short for

(await navigator.mediaDevices.enumerateDevices())
  .find(d => d.deviceId == track.getSettings().deviceId).label

...and is used all the time:

I'd argue, to my point, that when users click above they don't yet know whether they'll get a drop-down or a selector popup window. Of course, on the back-end, if it's the former, then using deviceInfo.label is obvious; if it's the latter, then track.label is obvious.

E.g. for symmetry (more than great design) one might imagine another line in the above UX:

Presentation
Little League Fundraiser - Powerpoint

Of course, an application may allow any number of concurrent streams (for camera or screenshare), that's not the point. The point is: showing the current selection you've made is an enforcer in any selection model, and serves as a reminder of what you're sharing (which may not be obvious otherwise, even with a tiny self-view).

I see no inherent difference here between specs, just because one button is backed by in-content selection https://github.com/w3c/mediacapture-main/issues/652, and the other by in-chrome selection. Labeling current choices seems like a good web principle to me, not just for the hearing and/or visually impaired.

@youennf
Collaborator

youennf commented Dec 19, 2019

@youennf track.label has been there forever, and is merely short for

(await navigator.mediaDevices.enumerateDevices())
  .find(d => d.deviceId == track.getSettings().deviceId).label

This algorithm is only true for capture tracks, but there are many different types of tracks.
A single getter with different meanings for each track type does not seem really useful to me if you have to know the track type to apply the correct processing on track.label.
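
To illustrate the point, a sketch of the device lookup with a fallback for tracks that have no matching device (remote tracks, for instance). `labelForTrack` is a hypothetical helper; `devices` stands for the array that `enumerateDevices()` resolves with:

```javascript
// Hypothetical helper: resolve a label for a track from an enumerateDevices()-style list.
function labelForTrack(track, devices) {
  const { deviceId } = track.getSettings();
  // Non-capture tracks (e.g. remote ones) may report no deviceId at all.
  if (deviceId === undefined) {
    return track.label;
  }
  // Match on kind too: an input and an output device can share an id such as "default".
  const wantedKind = `${track.kind}input`; // "audioinput" or "videoinput"
  const match = devices.find(d => d.kind === wantedKind && d.deviceId === deviceId);
  return match ? match.label : track.label;
}
```

The fallback branches are where the meaning of the returned string depends on the track type.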

track.label is also conveying a lot of information, all in one string, which has drawbacks.
For instance, as a website I might want to localise my web page. If my website user wants a 'French' version, I would like to provide it, even if the OS is English based. This would mean translating 'Main screen' to 'Ecran principal' for instance...
More focused values (for example, whether a track is associated with a default device) seem more appropriate in that case and could be easier to use for other representations (speech synthesis for instance).
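
As an illustration of localising from a focused value instead of the label, a sketch keyed on `track.getSettings().displaySurface`. The translation table, the locale keys and the helper names are all hypothetical:

```javascript
// Hypothetical translation table; the strings and locale keys are illustrative only.
const SURFACE_NAMES = {
  en: { monitor: "Screen", window: "Window", browser: "Browser tab", application: "Application" },
  fr: { monitor: "Écran", window: "Fenêtre", browser: "Onglet du navigateur", application: "Application" },
};

// True only for string keys defined directly on the object (ignores inherited ones).
function own(obj, key) {
  return typeof key === "string" && Object.prototype.hasOwnProperty.call(obj, key);
}

// Returns a site-localised name for the shared surface type.
function localizedSurfaceName(track, locale) {
  const names = own(SURFACE_NAMES, locale) ? SURFACE_NAMES[locale] : SURFACE_NAMES.en;
  const surface = track.getSettings().displaySurface;
  // Unknown or missing surface types fall back to the raw, unlocalised label.
  return own(names, surface) ? names[surface] : track.label;
}
```

This localises the surface type only; a window title would still be shown as the user agent reports it.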

One case where we could think track.label is useful is remote tracks.
In practice, remote track labels do not seem all that useful, especially in cases where there is an SFU (content switching, pre-reserved tracks...).
Localisation might make things even more complex here.

@alvestrand
Contributor

Speaking personally, I think adding a label attribute to "track" was a dumb idea, and we should use it as little as possible.
The device label is (comparatively) well defined and has a well defined purpose.
As far as I remember, track.label doesn't survive transmission across a PeerConnection either.

@jan-ivar
Member Author

jan-ivar commented Dec 30, 2019

Neither does contentHint, but let's focus on screen-capture, which is where this issue is filed (if we want to question the whole model, I suggest we file a new issue on mediacapture-main).

For better or worse, Mediacapture-main does define track.label, which is useful for tracks backed by user selection, like here. Thus, this spec (screen-capture) needs to define how to handle it, returning "" at minimum.

I'd like to do better. I think it would be useful to return the (already visible part of the) name of a window being shared, if a window is being shared. I see no localization issue here, since the goal is to enforce what the user chose in the selector, where the same window titles were not localized.

If my website user wants a 'French' version, I would like to provide it, even if the OS is English based.

The user agent is already free to show a French picker with 'Ecran principal', so provided track.label matches that, I don't see a localization issue.

The primary goal is to show the user what they chose. If this seems redundant in the typical case, imagine an app juggling multiple shares at once. How is a user supposed to remember which is which? Previews of screen-capture often look alike unless they're big, and not all apps can afford big previews in their design.

Even with a single share, I think we're assuming too much about how apps work today. Imagine an app that holds on to my window share, using track.enabled = false, takes me through an unrelated task, and brings me back 5 minutes later. Will I remember what I was sharing (what I picked >5 minutes ago)? Is there anything the site could do to remind me, short of screen-scraping the title or showing me a giant preview?
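
A sketch of the reminder scenario, assuming the page keeps its display tracks in an array and that labels are human-readable where present. `heldShares` and the placeholder text are hypothetical:

```javascript
// Hypothetical helper: list shares that are still live but currently disabled,
// so the page can remind the user what it is holding on to.
function heldShares(tracks) {
  return tracks
    .filter(t => t.readyState === "live" && !t.enabled)
    .map(t => t.label || "(unnamed share)");
}
```

With an empty label, as some browsers return today, the page has nothing better than the placeholder to show.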

@youennf
Collaborator

youennf commented Dec 31, 2019

The primary goal is to show the user what they chose.

I see, so this is all about the user doing the capture, not about remote users.

imagine an app juggling multiple shares at once

I think it is pretty rare for a user to share more than two video streams at the same time.

Imagine an app that holds on to my window share, using track.enabled = false, takes me through an unrelated task, and brings me back 5 minutes later.

If screen capture is disabled by the website after 5 minutes, I am not sure we can expect users to even remember that screen capture is on. I wonder whether we should protect users from that.
We have that debate elsewhere, but the privacy indicator may remain on even if enabled is false, which might be a good enough defence.

Is there anything the site could do to remind me, short of screen-scraping the title or showing me a giant preview?

Since it is all about the local user, the OS can provide the information of which website is capturing what (or able to capture), for instance as part of the privacy indicator.

In any case, track.label does not seem like the best way to expose that information.
It currently mixes device type (camera/microphone) and device name for getUserMedia.
If we follow this principle, it would mix the display type with the display title.
It seems like a dedicated field with just the display title would be a better fit.

@jan-ivar
Member Author

jan-ivar commented Jan 1, 2020

It currently mixes device type (camera/microphone) and device name for getUserMedia.

@youennf I'm not following. What do you mean "mixes"? The device type is in track.kind, the device name is in track.label.

If we follow this principle, it would mix the display type with the display title.

Not really. The display surface type is in track.getSettings().displaySurface, the display surface name is in track.label.
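
Put together, the three pieces of information would be read from three different places. A minimal sketch, where `describeTrack` and the shape of its result are hypothetical:

```javascript
// Hypothetical helper: gather type, surface and name from their separate sources.
function describeTrack(track) {
  const settings = track.getSettings();
  return {
    type: track.kind,                          // "audio" or "video"
    surface: settings.displaySurface ?? null,  // e.g. "window"; null when not reported
    name: track.label,                         // may be "" in some browsers
  };
}
```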

@youennf
Collaborator

youennf commented Jan 2, 2020

@youennf I'm not following. What do you mean "mixes"? The device type is in track.kind, the device name is in track.label.

Capture track labels are often something like: "iMac Microphone" or "FaceTime Camera", which contain the kind information. It would seem consistent to have something like "iMac Screen" or "My super presentation Window".

The display surface type is in track.getSettings().displaySurface, the display surface name is in track.label.

As I said earlier, if we decide to expose this information, it seems more natural to me to use something like track.getSettings().displayName.

@jan-ivar
Member Author

jan-ivar commented Jan 2, 2020

Capture track labels are often something like: "iMac Microphone"

For built-in devices maybe, but that's a red herring I claim. "Logitech BRIO" is a counter example (its camera and microphone labels are identical). We ultimately don't control that namespace, so I don't think we can draw anything structural from lack of specificity in (some) names (types embedded in names is a classic problem/pattern in many domains).

@jan-ivar jan-ivar self-assigned this Oct 8, 2020
@aboba aboba added the TPAC 2020 label Oct 8, 2020
@dontcallmedom dontcallmedom added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Nov 27, 2020
@amiregelz

Is there any update on this? This feature could be super useful for multi-screen sharing.

@eladalon1983
Member

I'd like to do better. I think it would be useful to return the (already visible part of the) name of a window being shared, if a window is being shared.

Specifying the "already visible part of a window name" sounds difficult even if only thinking of a single platform, let alone different ones. I'm on a Mac atm. What is the visible part of my current window? The word "Chrome" is not part of the window; it's part of the macOS menu bar. And on Windows, what is the window name if various interesting customizations are employed? How does one read the "visible part" of this name?


I'm triaging the issues in this repo. I see this issue has been inactive for ~1.5 years now. Shall we close it?

@alvestrand
Contributor

There seems to be no consensus that making a normative statement on what the label should be is either necessary or very useful. Suggest closing.


8 participants