Clarify "audiooutput" does not mean capture of audio output to headphones or speakers #720

Closed
guest271314 opened this issue Sep 7, 2020 · 16 comments

@guest271314

Per Issue 1114422: enumerateDevices() listing device kind "audioouput" is incorrect and misleading https://bugs.chromium.org/p/chromium/issues/detail?id=1114422

"audiooutput" refers to audio playback via a media element. It does not refer to microphone input or audio capture of anything.

This specification states https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceKind.audiooutput

audiooutput | Represents an audio output device; for example a pair of headphones.

Audio Output Devices API https://w3c.github.io/mediacapture-output/ does not actually use or define the term "audiooutput".

There is a section of Audio Output Devices API that addresses getUserMedia() relations to that specification

4.2 Obtaining Consent
The user agent may explicitly obtain user consent to play audio out of non-default output devices using selectAudioOutput.

Implementations MUST also support implicit consent via the getUserMedia() permission prompt; when an audio input device is permitted and opened via getUserMedia(), this also permits access to any associated audio output devices (i.e., those with the same groupId). This conveniently handles the common case of wanting to route both input and output audio through a headset or speakerphone device.
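
To make the groupId relationship in the quoted text concrete, here is a minimal sketch. It assumes a plain array shaped like the result of enumerateDevices(); the helper name outputsSharingGroup and the sample data are hypothetical and do not come from either specification.

```javascript
// Hypothetical helper: given a device list shaped like the result of
// navigator.mediaDevices.enumerateDevices(), return the "audiooutput"
// entries that share a groupId with the given "audioinput" device.
function outputsSharingGroup(devices, inputDeviceId) {
  const input = devices.find(
    (d) => d.kind === "audioinput" && d.deviceId === inputDeviceId
  );
  if (!input) return [];
  return devices.filter(
    (d) => d.kind === "audiooutput" && d.groupId === input.groupId
  );
}

// Made-up sample data, for illustration only.
const sampleDevices = [
  { kind: "audioinput", deviceId: "mic-1", groupId: "headset" },
  { kind: "audiooutput", deviceId: "out-1", groupId: "headset" },
  { kind: "audiooutput", deviceId: "out-2", groupId: "monitor-speakers" },
];

console.log(outputsSharingGroup(sampleDevices, "mic-1").map((d) => d.deviceId));
```

The returned entries are playback targets only; nothing in this lookup captures what those devices play.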

the remainder of comment from the above-linked Chromium bug

selectAudioOutput() is to be used in combination with setSinkId() and it has nothing to do with audio capture or microphone inputs; only playback (on media elements). It's a way to get access to a deviceId of kind "audiooutput" without using enumerateDevices. This deviceId is useful only with setSinkId since it's of kind "audiooutput".
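
As a rough sketch of the playback-only flow described in that comment: the function name routePlayback is hypothetical, and the two arguments are assumed to be navigator.mediaDevices and an HTMLMediaElement. They are passed in so that the function can be exercised outside a browser. Support for selectAudioOutput() varies between browsers, hence the feature check.

```javascript
// Sketch: route playback of a media element to a user-chosen output device.
// `mediaDevices` is assumed to be navigator.mediaDevices and `element` an
// HTMLMediaElement. Nothing here captures audio.
async function routePlayback(mediaDevices, element) {
  if (
    typeof mediaDevices.selectAudioOutput !== "function" ||
    typeof element.setSinkId !== "function"
  ) {
    throw new Error("selectAudioOutput() or setSinkId() is not available");
  }
  // Prompts the user to pick an output device; resolves with a
  // MediaDeviceInfo whose kind is "audiooutput".
  const device = await mediaDevices.selectAudioOutput();
  // Changes where the element plays, not what is recorded.
  await element.setSinkId(device.deviceId);
  return device.deviceId;
}
```

In a page this would typically be called from a click handler, for example routePlayback(navigator.mediaDevices, document.querySelector("audio")).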

Wrt the getUserMedia() UI in Chromium, its purpose is to request permission and not to select device. The device is selected based on the constraints passed to getUserMedia(). This might change in the future, following some spec changes, but for now it is totally spec compliant. Even if the prompt allowed for device selection, it wouldn't support capturing from monitor devices since those devices are not supported by Chromium. It would only list the devices reported as audioinput by enumerateDevices.

Problems

The first problem is that the Chromium implementation of MediaStreamTrack sets the label of a microphone device to "audiooutput". Firefox labels monitor devices, which Chromium refuses to capture or list at enumerateDevices(), as "audioinput".

The second problem is that the device with the kind "audiooutput" is actually not headphones or speakers at all, and this specification does not explicitly define capture of audio output to headphones or speakers.

The third problem, which is the consequence of the first and second problems, is that it is reasonable for users in the field attempting to capture actual audio output (a reasonable interpretation of headphones or speakers) to expect a kind denoted as "audiooutput" to carry the plain meaning of that term, as alluded to in this specification. See "This is again recording from microphone, not from audiooutput device" #14

Since this was not working on latest chrome 71, I downgraded to chrome 60. I see that this program is recording from microphone instead from speechSynthesis.speak(). I feel the reason is because both audioinput and audiooutput have same deviceId="default". So how can I make it record from speak() ?

illustrating the potential for and reality of confusion where a device kind from enumerateDevices() is filtered for "audiooutput" with the expectation of capturing audio output to headphones or speakers https://github.com/guest271314/SpeechSynthesisRecorder/blob/master/SpeechSynthesisRecorder.js#L63

return navigator.mediaDevices.getUserMedia({
        audio: true
      })
      // set `getUserMedia()` constraints to "audiooutput", where available
      // see https://bugzilla.mozilla.org/show_bug.cgi?id=934425, https://stackoverflow.com/q/33761770
      .then(stream => navigator.mediaDevices.enumerateDevices()
        .then(devices => {
          const audiooutput = devices.find(device => device.kind == "audiooutput");
          stream.getTracks().forEach(track => track.stop())
          if (audiooutput) {
            const constraints = {
              deviceId: {
                exact: audiooutput.deviceId
              }
            };
            return navigator.mediaDevices.getUserMedia({
              audio: constraints
            });
          }
          return navigator.mediaDevices.getUserMedia({
            audio: true
          });
        }))

where, if nothing changes in the Chromium implementation, the device kind will be "audiooutput" yet headphone or speaker output will never be captured; only the microphone will ever be captured.
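
The snippet above passes a deviceId taken from an "audiooutput" entry to getUserMedia(). Constraints carry no kind, so when both kinds report the same deviceId (such as "default", as the quoted reports describe), the request resolves to the microphone. A minimal sketch of a guard follows, assuming entries shaped like MediaDeviceInfo; the helper name captureConstraintsFor and the sample list are hypothetical.

```javascript
// Hypothetical helper: build getUserMedia() constraints for a device entry,
// but only when that entry is an "audioinput". Returns null for
// "audiooutput" (or missing) entries, which getUserMedia() does not capture.
function captureConstraintsFor(device) {
  if (!device || device.kind !== "audioinput") return null;
  return { audio: { deviceId: { exact: device.deviceId } } };
}

// Made-up list mirroring the reports above: both kinds share "default".
const listed = [
  { kind: "audioinput", deviceId: "default", label: "Default microphone" },
  { kind: "audiooutput", deviceId: "default", label: "Default speakers" },
];

const outputEntry = listed.find((d) => d.kind === "audiooutput");
const inputEntry = listed.find((d) => d.kind === "audioinput");

console.log(captureConstraintsFor(outputEntry)); // null
console.log(JSON.stringify(captureConstraintsFor(inputEntry)));
```

Checking kind before building constraints makes the limitation explicit instead of silently recording the microphone.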

Thus, why have the kind "audiooutput" at all when both "audioinput" and "audiooutput" refer to the exact same device?

If the current language is clear to an author of this specification, kindly explain to the users above and below exactly why "audiooutput" really does not mean capture of audio output to speakers or headphones at all, and really just means the same as "audioinput", a microphone, an input device; to avoid any further confusion as to why the code that selects "audiooutput" device is working as intended - "audiooutput" and "audioinput" are intended to refer to the exact same device - and never to the headphones described in the specification: abandon all hope of capturing actual headphones or speakers per this specification.

Am relatively certain the confusion is not imagined and can be eliminated.

Comments on the initial proof-of-concept of capturing speechSynthesis.speak() output https://stackoverflow.com/a/45003549

Hi @guest271314, isn't this recording the user's mic - and not the actual synthesized speech? Is that what you intended? – Ronen Rabinovici Dec 1 '17 at 11:26

Thanks for this great example. I'm not sure if it is currently working in the latest Chrome (non beta). I have forked here to try it. I can see the audio player, but with no audio file in: jsfiddle.net/k1q07rsy – loretoparisi Dec 7 '17 at 9:36

@RonenRabinovici Yes, the original code at answer did record the device microphone. The original code is a workaround for the requirement to record speech synthesis by default at modern browsers. Updated code to set "audioouput" as device to record github.com/guest271314/SpeechSynthesisRecorder/commit/… – guest271314 Jan 10 '18 at 3:18

@loretoparisi See updated code which sets media device to record to "audiooutput" plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview – guest271314 Jan 10 '18 at 3:22

@guest271314, I used the code at plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview but it still recorded from my microphone. – Jeff Baker Aug 15 '18 at 22:54

This doesn't record speaker output. I tried capturing tab audio using chrome extension but still failed. It seems speechSynthesis is not using HTMLmediaElement for audio hence we shall not be able to capture at tab/browser level. The audiooutput mentioned above returns "default " for both mic and speaker since there is no way to set "kind" field while setting constraints in getUsermedia, it always captures "mic". Let me know in case more details required. – Gaurav Srivastava Mar 4 '19 at 1:13

Confirming that it records from microphone rather than speech synthesis - at least in Chrome 84. – joe Aug 13 at 11:15

precisely how "audioinput" and "audiooutput" are intended by this specification and derivatives to only refer to the same device, a microphone.

Such an explanation would require defying logic, given that there is absolutely no difference between the device with kind set to "audioinput" and the device with kind set to "audiooutput" at Chromium browser. That only serves to create and maintain confusion, which is completely avoidable by clearly stating that this specification does not capture actual audio output to headphones or speakers whatsoever; then users can know that fact for certain and not expect such behaviour at all from either this or derivative specifications.

Solutions

Implementers must not use the term "audiooutput" set at kind of MediaStreamTrack where the captured stream is actually microphone input. "audioinput" must be used as kind for microphone input devices.

Do not set "audiooutput" on devices at enumerateDevices().

Make it clear that this specification does not specify capture of audio being output at headphones or speakers. This necessarily means that "audiooutput" kind cannot be true and correct as the specification does not currently define that procedure at all (the original intent of this specification was evidently limited to microphone input capture, not headphones or speaker capture).

@guest271314
Author

The only viable fix appears to be to remove "audiooutput" from this specification entirely.

@youennf
Contributor

youennf commented Sep 10, 2020

WebIDL does not allow partial enums, hence why it is there.

@henbos
Contributor

henbos commented Sep 10, 2020

I think this is a limitation in WebIDL. For webrtc-extensions we worked around it kind of ugly:
https://w3c.github.io/webrtc-extensions/#rtcicecredentialtype-enum

@youennf
Contributor

youennf commented Sep 10, 2020

whatwg/webidl#184

@henbos
Contributor

henbos commented Sep 10, 2020

Blocked on?

@guest271314
Author

How is WebIDL related to this issue? Am stating that "audiooutput" should be removed from this specification entirely as this specification does not define capturing audio output to headphones or speakers.

@guidou
Contributor

guidou commented Sep 14, 2020

How is WebIDL related to this issue? Am stating that "audiooutput" should be removed from this specification entirely as this specification does not define capturing audio output to headphones or speakers.

audiooutput is not intended for capturing.
audiooutput is intended to identify devices that can output (as in play) audio. In this spec, they're useful if you want getUserMedia to prefer a microphone (type audioinput) associated with a certain output device by specifying the group ID of the audiooutput device using the groupId constrainable property.

I see no need to remove them from this spec.
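
The groupId usage described in this comment might look roughly like the following. This is a sketch under stated assumptions: the helper name micConstraintsNear and the sample device are hypothetical, and whether a matching microphone exists depends on the hardware.

```javascript
// Hypothetical helper: build getUserMedia() constraints that prefer the
// microphone in the same group as a given "audiooutput" device.
function micConstraintsNear(outputDevice) {
  if (!outputDevice || outputDevice.kind !== "audiooutput") {
    throw new TypeError('expected a device of kind "audiooutput"');
  }
  // A bare value is treated as "ideal": a preference, not a requirement.
  return { audio: { groupId: outputDevice.groupId } };
}

// Made-up device entry, for illustration only.
const headsetOutput = {
  kind: "audiooutput",
  deviceId: "out-1",
  groupId: "headset",
};

console.log(JSON.stringify(micConstraintsNear(headsetOutput)));
```

Passing the result to navigator.mediaDevices.getUserMedia() would still yield a microphone track; the output device only influences which microphone is preferred.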

@guest271314
Author

audiooutput is not intended for capturing.

That language needs to be in the specification to avoid confusion. More than one individual has interpreted that term literally when the only specified definition is

audiooutput | Represents an audio output device; for example a pair of headphones.

without the accompanying explanation.

audiooutput is intended to identify devices that can output (as in play) audio.

That is what have not ever understood about what setSinkId() is intended to do and does. Again, all of that language needs to be in the specification if the label is useful to some.

At https://stackoverflow.com/a/45003549 and guest271314/SpeechSynthesisRecorder#14 the term only sows confusion when users actually want to capture headphones and speakers. Kindly tell those users that "audiooutput" does not mean capture of headphones or speakers at all. No one else has said that in any specification.

In this spec, they're useful if you want getUserMedia to prefer a microphone (type audioinput) associated with a certain output device by specifying the group ID of the audiooutput device using the groupId constrainable property.

I still am having difficulty understanding exactly what setSinkId() does. What is meant by "associated with a certain output device"? The user does not need "audiooutput" at all to set MediaStream as srcObject of HTMLMediaElement or as mediaStream at MediaStreamAudioSourceNode.

In this specification "audiooutput" is useless.

As far as I can tell Chromium is the only browser that uses that label, which refers to the same device as "audioinput". There is no difference at Chromium - unless you can demonstrate that difference in code.

I see no need to remove them from this spec.

I do. To avoid confusion. There is not capture of audio output occurring. That requirement is still being requested and developed.

If that needs to be kept around for whatever purpose individuals use it for, then all of the language used outside of the specification in these issues needs to be in the actual specification - and Audio Output Devices API - so that no user ever again has the interpretation that either of these specifications intends to actually capture the audio output to headphones or speakers.

And rename the bit "audioreroute" or "microphone-to-headphones-or-speakers" or something like that, as the purpose actually has nothing to do with actual capture of headphones or speakers, rather, routing captured microphone input to headphones.

@youennf
Contributor

youennf commented Oct 1, 2020

Given this is a WebIDL limitation, we keep it like this and might reopen the issue when WebIDL allows us to do so.

@guest271314
Author

This is not a WebIDL issue or limitation. This issue is strictly for the purpose of printing in the specification that this specification does not define any means or algorithms for capturing audio output to headphones or speakers.

@youennf
Contributor

youennf commented Oct 2, 2020

The spec is saying for getUserMedia (just above https://w3c.github.io/mediacapture-main/#dom-mediadevices-getusermedia):
Prompts the user for permission to use their Web cam or other video or audio input.

This seems clear to me that "audio output" is not in scope of getUserMedia.

@guest271314
Author

The spec is saying for getUserMedia (just above https://w3c.github.io/mediacapture-main/#dom-mediadevices-getusermedia):
Prompts the user for permission to use their Web cam or other video or audio input.

This seems clear to me that "audio output" is not in scope of getUserMedia.

Then why is

audiooutput | Represents an audio output device; for example a pair of headphones.

in the specification given neither the term "audiooutput" nor the definition provided are applicable whatsoever?

Kindly remove that term and definition from this specification to avoid sowing further confusion.

@youennf
Contributor

youennf commented Oct 6, 2020

We need to define https://w3c.github.io/mediacapture-main/#dom-mediadevicekind in this spec.
We cannot move MediaDeviceKind.audiooutput in https://w3c.github.io/mediacapture-output/ due to current WebIDL limitation which does not allow to define partial enumerations.

Given getUserMedia scope is clear, I think the confusion is limited.
When/if WebIDL allows partial enumerations, we should move audiooutput to https://w3c.github.io/mediacapture-output/.
Closing issue based on this rationale.

@youennf youennf closed this as completed Oct 6, 2020
@guest271314
Author

Given getUserMedia scope is clear, I think the confusion is limited.
When/if WebIDL allows partial enumerations, we should move audiooutput to https://w3c.github.io/mediacapture-output/.
Closing issue based on this rationale.

Perhaps you do not understand what am stating.

Am stating directly that neither this specification nor https://w3c.github.io/mediacapture-output/ actually capture any audio output whatsoever. The term used needs to be "microphone-reroute-to-speakers" or something like that. Users in the field are expecting "audiooutput" to mean just that, though these specifications are massaging terms of art which result in confusion.

Therefore "audiooutput" must not exist in either specification. There is no "move" that should occur. That term needs to be repealed from these specifications until capture of audio output is actually specified, which it is not in either specification.

@guest271314
Author

The rationale for closing this issue is flawed based on the simple fact that capture of "audiooutput" actually does not occur at media capture audio output either, thus this issue w3c/mediacapture-output#111.

These specifications are using the term "audiooutput" as if audio output to speakers can be captured, which is not the case by default language in either specification.

However, users do want actual capture of audio output to headphones and speakers, thus if that is ever specified, clearly the "audiooutput" that is implemented now, that is, re-routing microphone input to speakers, must take precedence over the as-applied usage currently, which has nothing to do with actually capturing output to headphones or speakers. Thus, for disambiguation, this issue and the one of the same substance in audio output devices make it directly clear that the term "audiooutput" needs to be replaced with a term that describes what is actually specified and expected to occur - re-routing of microphone input to speakers - not capture of speakers.

Am not sure why this is proving difficult to convey or comprehend at the specification level?

@guest271314
Author

We need to define https://w3c.github.io/mediacapture-main/#dom-mediadevicekind in this spec.
We cannot move MediaDeviceKind.audiooutput in https://w3c.github.io/mediacapture-output/ due to current WebIDL limitation which does not allow to define partial enumerations.

Given getUserMedia scope is clear, I think the confusion is limited.
When/if WebIDL allows partial enumerations, we should move audiooutput to https://w3c.github.io/mediacapture-output/.
Closing issue based on this rationale.

@youennf Do you understand what am stating here?

This specification uses the term "audiooutput" implying or inferring that audio output capture is possible, where that feature is not possible.

The term "audiooutput" does not belong in either this specification or mediacapture-output. Again, at best what occurs at mediacapture-output is re-routing of microphone input through an HTMLMediaElement.

That is not capture of headphones or speakers.

However, no distinction is made in the actual specification between capture of "audioinput" devices and the fact that devices labelled "audiooutput" (as spuriously defined in any of these W3C specifications) do not actually capture the media to or from those devices - leading to the confusion illustrated in the comments at this answer https://stackoverflow.com/a/45003549.

And no authors of this specification nor mediacapture-output have notified the users there that their in-house definition of the term of art "audiooutput" does not mean capture of audio output, rather some specialized re-routing of audio input to an HTMLMediaElement.

@w3c w3c locked as resolved and limited conversation to collaborators Oct 8, 2020