Update getusermedia.html #651

Closed

guest271314

Include a reference to "It also allows the manipulation of audio output devices (speakers and headphones)." https://w3c.github.io/mediacapture-main/#privacy-and-security-considerations and

MediaDeviceKind Enumeration description
audiooutput Represents an audio output device; for example a pair of headphones.

at "sources" definition https://w3c.github.io/mediacapture-main/#dfn-source.

Fixes #650.

@guest271314
Author

“This work is dedicated to the public domain”
/guest271314/

@jan-ivar
Member

While this spec does define enumeration of "audiooutput" output devices, they're not "sources", as defined in this terminology section. They're sinks, and defined in detail in https://w3c.github.io/mediacapture-output/

@guest271314
Author

@jan-ivar

It should be made unequivocally clear in the specification that it is possible to capture audio output - not exclusively microphone input - even if the terminology used is "input" and "sink" - with an example of the canonical means to do so using the APIs in the specification.

The suggestion to use getDisplayMedia() to capture system sound output lacks a concrete minimal example. The expected result has not been achievable in the tests tried so far.

Whether the term of art used is "sink" or "source", "input" or "output", when the code at #650 (comment) is run at Firefox 70 and Nightly 73 the observable result is that only audio output is captured and recorded, not microphone.

Is your position that capturing audio output using the above linked code is still considered to be capturing a "sink"?

@jan-ivar
Member

Whether the term of art used is "sink" or "source", "input" or "output", when the code at #650 (comment) is run at Firefox 70 and Nightly 73 the observable result is that only audio output is captured and recorded, not microphone.

Is your position that capturing audio output using the above linked code is still considered to be capturing a "sink"?

No. The model is "Browsers provide a media pipeline from sources to sinks". "A MediaStreamTrack object represents a media source". Looking at your code:

    }) => label === "Monitor of Built-in Audio Analog Stereo" // Firefox
           || kind === "audiooutput" && groupId !== "default" // Chromium

The device.kind of the "Monitor of Built-in Audio Analog Stereo" device here in Firefox is "audioinput"—a virtual audio input that Firefox (or the underlying Linux OS) has chosen to expose as a source.

The spec allows us to do this, saying "User Agents MAY allow users to use any media source, including pre-recorded media files."

Importantly, it is not an "audiooutput". This is true even if you enable media.setsinkid.enabled in Firefox (i.e. not some side-effect or oversight of Firefox not exposing "audiooutput" devices by default).
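
As a rough illustration of the point above, the sketch below treats the monitor device as an ordinary "audioinput" source and opens it with getUserMedia(). The helper names findMonitorInputs and captureMonitor and the label pattern are hypothetical, chosen only for this example. The label text is platform-specific (it comes from the Firefox-on-Linux case discussed here), and labels are generally empty until the page has been granted capture permission.

```javascript
// Hypothetical helper: keep only capture sources whose label marks them as a
// monitor. The default pattern is an assumption based on the PulseAudio-style
// label quoted in this thread; other platforms may expose nothing comparable.
function findMonitorInputs(devices, labelPattern = /^Monitor of /) {
  return devices.filter(
    ({ kind, label }) => kind === "audioinput" && labelPattern.test(label)
  );
}

// Browser-only usage, guarded so the snippet also loads outside a browser.
// Resolves to a MediaStream, or null when no monitor input is exposed.
async function captureMonitor() {
  if (typeof navigator === "undefined" || !navigator.mediaDevices) return null;
  const devices = await navigator.mediaDevices.enumerateDevices();
  const [monitor] = findMonitorInputs(devices);
  if (!monitor) return null;
  return navigator.mediaDevices.getUserMedia({
    audio: { deviceId: { exact: monitor.deviceId } },
  });
}
```

Note that the filter never looks at "audiooutput" entries: whatever the monitor records, the API only ever hands it over as a source.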

"audiooutput" devices are sinks for use with setSinkId:

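    // `element` is assumed to be an HTMLMediaElement (e.g. an <audio> element) already on the page.
    const devices = await navigator.mediaDevices.enumerateDevices();
    const speaker = devices.find(({kind}) => kind === "audiooutput");
    if (speaker) await element.setSinkId(speaker.deviceId); // skip when no output device is exposed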

A physical device that is both a source and a sink, like a mic'd headset, shows as two devices: one "audioinput" device and a separate "audiooutput". They share the same groupId.

This supports having devices that are only sinks, only sources, and both. Makes sense?
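
A minimal sketch of that model, assuming a list shaped like the result of enumerateDevices(): it groups entries by groupId and reports whether each physical device shows up as a source, a sink, or both. The function name classifyGroups and the returned strings are hypothetical, used only for illustration.

```javascript
// Hypothetical helper: classify each groupId by the kinds of devices it contains.
// "audioinput" and "videoinput" count as sources; "audiooutput" counts as a sink.
function classifyGroups(devices) {
  const groups = new Map();
  for (const { kind, groupId } of devices) {
    const entry = groups.get(groupId) ?? { source: false, sink: false };
    if (kind === "audioinput" || kind === "videoinput") entry.source = true;
    if (kind === "audiooutput") entry.sink = true;
    groups.set(groupId, entry);
  }
  const result = {};
  for (const [groupId, { source, sink }] of groups) {
    result[groupId] =
      source && sink ? "both" : source ? "source" : sink ? "sink" : "unknown";
  }
  return result;
}
```

In a browser the input could come from `await navigator.mediaDevices.enumerateDevices()`. A mic'd headset would then be reported as "both", while a monitor input with its own groupId would be reported as "source" only.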

@guest271314
Author

The device.kind of the "Monitor of Built-in Audio Analog Stereo" device here in Firefox is "audioinput"—a virtual audio input that Firefox (or the underlying Linux OS) has chosen to expose as a source.

The spec allows us to do this, saying "User Agents MAY allow users to use any media source, including pre-recorded media files."

This supports having devices that are only sinks, only sources, and both. Makes sense?

The source in that case is audio output to headphones or speakers, correct?

Your attempt to clarify the capabilities described in the specification does provide clarity to some extent.

The point of this PR and linked issue is to make it unambiguously clear that it is possible to capture audio output, as a "source", to speakers or headphones using the methods defined in this specification, that is, precisely under which cases a "sink" becomes a "source", with accompanying canonical code example to do so added to the specification - or, make it clear that is not possible per this specification, so that users who have that requirement can abandon all hope of achieving that requirement using enumerateDevices() and getUserMedia(). That requirement is possible given certain prerequisites, e.g., certain *nix OS's and Firefox. If clearly specified, should be possible by default at all implementations of the specification, so that implementers are not able to state "the specification does not provide for that functionality" or like statement when the use case and spec are referenced. Currently, there appears to be room for different interpretations of whether it is specified or not if and when a "sink" can become a source, per this specification.

@guest271314
Author

@jan-ivar

Per your previous comment

"User Agents MAY allow users to use any media source, including pre-recorded media files."

incorporates by reference any media source, and supersedes the restrictive language

Note that this document describes the use of microphone and camera type sources only

which is what this PR initially changes to recognize the case of

The device.kind of the "Monitor of Built-in Audio Analog Stereo" device here in Firefox is "audioinput"—a virtual audio input that Firefox (or the underlying Linux OS) has chosen to expose as a source.

where it cannot logically be the case that capture of "Monitor of Built-in Audio Analog Stereo" as a source is equivalent or exclusive to

microphone and camera type sources only

"Monitor of Built-in Audio Analog Stereo" is not capturing microphone input. Thus that line is not accurate.

@jan-ivar
Member

The source in that case is audio output to headphones or speakers, correct?

"That case" being the special "Monitor" audio input device in Firefox on Linux? Sure.

However, this is achieved at the OS/device driver level. Specs generally can't, nor should they, attempt to restrict what an OS or user agent can provide as audio input in the form of virtual devices.

Importantly, the existence of such a device in no way changes the fact that no general mechanism for capturing system audio output exists in this spec.

The point ... is to make it unambiguously clear that it is possible to capture audio output, as a "source", to speakers or headphones using the methods defined in this specification, that is, precisely under which cases a "sink" becomes a "source",

I think the spec is clear: "Sinks" never become "sources" in this model. They're entirely separate devices from the viewpoint of this API. Any real-world connection between browser-provided devices in this API exists outside of this spec. E.g. there's no constraint or property for this.¹

make it clear that is not possible per this specification, so that users who have that requirement can abandon all hope

This spec is written with extension specs in mind, so it rarely rules things out, only in. As such, if something isn't mentioned in this spec, it isn't covered by this spec.


1. The only relationship between exposed "devices" in this spec is groupId, but I would even hesitate to use that here. That's because its stated use case is the headset (with mic and speakers). I.e. the assumption is users might want to use these together. I don't think the same holds for a speaker and the monitor of said speaker, because using them together would cause feedback.

@guest271314
Author

@jan-ivar

"That case" being the special "Monitor" audio input device in Firefox on Linux? Sure.

Technically, that case is also possible at Chromium on Linux following the procedure described at guest271314/SpeechSynthesisRecorder#14 (comment) once.

Can use getDisplayMedia() and MediaRecorder() to record the input and output at Firefox and Chromium to demonstrate the functionality is already possible at each implementation, though underspecified at this specification nonetheless.

Your reply appears to indicate that users in the field should abandon all hope of this specification making it clear that capturing audio output is possible per the specification even though the functionality is already possible in the field, given adequate experimentation and testing with an end to achieve that goal.

Given the inclusion of the term "audiooutput" in the specification, it is reasonable to expect audio output to be able to be captured per this specification.

Am banned from W3C indefinitely and from WICG for 1,000 years so am not able to propose a specification under the auspices of those entities to unequivocally capture audio output.

This specification provides the infrastructure to do so. Yet, for the reasons you are relying on, you are essentially stating that users should abandon all hope of this specification being amended to reflect the state of the art in the field, or rather that this specification has implicit foundational restrictions forbidding such functionality (direct, unambiguous capture of audio output devices), which prevent extensibility with regard to capturing audio output. Can you kindly state the above or similar language specifically?

@guest271314
Author

@jan-ivar Since the inference is that implementers and users of the resulting API of this specification should abandon all hope of explicitly capturing audio output under the language of this specification - even though such requirement is already possible at Firefox, Nightly and Chrome, Chromium at *nix - how do you suggest to proceed to realize that goal (either per specification or not) in the field?

@jan-ivar
Member

jan-ivar commented Dec 16, 2019

I think step 1 is providing a compelling use case.

Frankly, capturing browser audio output at that late a stage only to bring it back into the browser, smells like a workaround for something that should be doable directly using components like MediaStreamTrack and maybe web audio.

For instance—did I see you mention web speech earlier?—I recently made some design recommendations in mozilla/standards-positions#170 (comment) to make SpeechRecognition play better with tracks. If your need here stems from an API shortcoming like that, then I'd try to fix that API instead first.

I'd hope we could do better for any use case short of media device stack testing (which is what we use the Monitor device for in Firefox). If you do find compelling use cases, and can get vendor interest in solving those use cases, then I guess an extension spec would be the way to go. However, IMHO specs don't always drive buy-in, buy-in drives specs.

@guest271314
Author

I think step 1 is providing a compelling use case.

Frankly, capturing browser audio output at that late a stage only to bring it back into the browser, smells like a workaround for something that should be doable directly using components like MediaStreamTrack and maybe web audio.

For instance—did I see you mention web speech earlier?—I recently made some design recommendations in mozilla/standards-positions#170 (comment) to make SpeechRecognition play better with tracks. If your need here stems from an API shortcoming like that, then I'd try to fix that API instead first.

I'd hope we could do better for any use case short of media device stack testing (which is what we use the Monitor device for in Firefox). If you do find compelling use cases, and can get vendor interest in solving those use cases, then I guess an extension spec would be the way to go. However, IMHO specs don't always drive buy-in, buy-in drives specs.

Can compile a list of issues and bugs relevant to Web Speech API and Web Audio API working in conjunction. Those requests have been ongoing for several years now. In brief,

It depends on how thorough and lengthy you want the list to be to satisfy the as-yet undefined term "compelling". Notice the local direct text to MediaStreamTrack conversion implication of the last list item.

MediaStreamTrack can certainly be useful in the domain of TTS/SST.

@guest271314
Author

Essentially, it is nearly impossible to improve SST/TTS technology, for example, screen readers, accessibility for persons that might not have all of their faculties functioning adequately, etc. (see https://lists.w3.org/Archives/Public/public-speech-api/2017Jul/0004.html for some use cases), locally, without the ability to capture, compare, and modify input and output.

Currently TTS is not specified to be captured anywhere. SST is not specified to be exclusive to local processing.

When last checked, Chrome and Chromium sent users' biometric data (their voice) to a remote service.

The web platform should provide a means for users to analyze input and output TTS/SST locally. The code at the linked repositories re capturing speak() output and parsing SSML are workarounds.

@jan-ivar
Member

jan-ivar commented Dec 19, 2019

I sympathize with the frustration of APIs that don't (yet) work well with tracks, but we seem to be in agreement this PR is not the way to fix it, so I'm closing this.

@jan-ivar jan-ivar closed this Dec 19, 2019
@guest271314
Author

@jan-ivar How do you suggest to proceed to specify that audio output can be captured? Can #629 be incorporated into https://github.com/w3c/mediacapture-main/issues/652, and #640? Or, is your closure of this PR effectively the closure of #629 and "Abandon all hope" of capturing audio output only under this specification?
