Should mute and unmute events of MediaStreamTrack be allowed to fire based on user non-action? #141

Closed
guest271314 opened this issue Jun 25, 2020 · 29 comments · Fixed by #206

Comments

@guest271314
Contributor

The specification uses the term "user activation" exactly once:

When the getDisplayMedia() method is called, the User Agent MUST run the following steps:

If the method call is not triggered by user activation, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.

The terms "user action" and "user gesture" are not included in the language of the specification.

Chromium 85 appears to fire mute and unmute events on the MediaStreamTrack from getDisplayMedia() in direct response to user non-action (for example, not moving the cursor on the captured screen) or user action (moving the cursor on the captured screen).

      onclick = async e => {
        var input,
          recorder,
          audioTrack,
          videoTrack,
          stream,
          mediaStream,
          chunks = [];

        onclick = null;

        stream = await navigator.mediaDevices.getDisplayMedia({ video: true });

        [videoTrack] = stream.getVideoTracks();
        videoTrack.onended = videoTrack.onmute = e => console.log(e);
        await videoTrack.applyConstraints({
          resizeMode: 'none',
          cursor: 'never',
          width: window.innerWidth * 0.7,
          height: window.innerHeight,
        });
        const video = document.createElement('video');
        video.currentTime = 0;
        video.autoplay = true;
        videoTrack.onunmute = videoTrack.onmute = e => console.log(e);

        video.ontimeupdate = _ => {
          console.log(video.currentTime, 60 * 7 + 60 * 0.5);
          if (video.currentTime > 60 * 7 + 60 * 0.5) {
            recorder.stop();
            videoTrack.stop();
          }
        };
        video.onplay = _ => {
          recorder = new MediaRecorder(stream);
          recorder.start(0);
          recorder.onstop = async e => {
            console.log(
              URL.createObjectURL(new Blob(chunks, { type: 'video/webm' }))
            );
          };
          recorder.ondataavailable = e => {
            if (e.data.size > 0) chunks.push(e.data);
          };
        };
        video.srcObject = stream;
      };

The result is a series of unintended consequences for other APIs: the media file produced by MediaRecorder is affected, and the timeupdate event of HTMLMediaElement does not fire every 15 to 250ms as the HTML Standard describes. The downstream bugs observed so far are tracked at https://bugs.chromium.org/p/chromium/issues/detail?id=1099280.

Kindly include language in the specification which prohibits implementations from firing mute and unmute events based on user non-action or user action.

@henbos
Contributor

henbos commented Jun 25, 2020

Does this happen due to screen saver or lock screen, or does it happen even though the screen is on and unlocked?

@guest271314
Contributor Author

No screen locks.

A challenging bug to narrow down. Only after using a <video> element and its timeupdate event was I able to identify one possible reason why MediaRecorder was producing videos with a total duration up to 44 seconds shorter than intended.

@guest271314
Contributor Author

First attached a timeupdate handler and noticed the event was not being fired every 15 to 250ms, then attached mute and unmute handlers and observed both events being fired, directly corresponding to moving the cursor on the captured screen.
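The observation above can be checked mechanically. The following is a minimal sketch, not code from the original report: `findTimeupdateGaps` is a hypothetical helper that takes the `video.currentTime` values logged from a timeupdate handler and reports every pair of consecutive readings that are further apart than the 250 ms upper bound mentioned above. It assumes `currentTime` advances roughly in real time for a live `srcObject` stream, which may not hold exactly in every implementation.

```javascript
// Hypothetical helper: flag gaps between consecutive timeupdate readings
// (video.currentTime values, in seconds) that exceed a threshold.
function findTimeupdateGaps(times, maxGapSeconds = 0.25) {
  const gaps = [];
  for (let i = 1; i < times.length; i++) {
    const delta = times[i] - times[i - 1];
    if (delta > maxGapSeconds) {
      gaps.push({ from: times[i - 1], to: times[i], delta });
    }
  }
  return gaps;
}

// The first five currentTime readings from the console output in this thread.
const sample = [0.251276, 0.284608, 1.251236, 16.185637, 16.509598];
for (const gap of findTimeupdateGaps(sample)) {
  console.log(`gap of ${gap.delta.toFixed(3)}s between ${gap.from} and ${gap.to}`);
}
```

Three of the four intervals in that sample exceed 250 ms, including one of nearly 15 seconds, which is the kind of stall that lines up with the mute events in the log.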

@guest271314
Contributor Author

Arrived at this bug after attempting hundreds of times to capture 7:42 of video to merge in a MediaStream with previously captured audio (as an .opus, .wav, or .webm file). Tried mkvmerge with --enable-durations, ffmpeg, and converting WebM to MP4 to get the video to the complete duration of the audio; the result is that the audio is always clipped at the end of the video. Eventually got the video to play locally to 7:42; however, when uploading the video to YouTube or Google Drive the result is 6:56. Have not yet succeeded by any means in producing a video at Chromium that the two services decode or remux to a 7:42 duration https://bugs.chromium.org/p/chromium/issues/detail?id=1099003.

@guest271314
Contributor Author

@henbos Interestingly, cannot reproduce the bug where mute and unmute events were being fired based on mouse movement. Was running the code either at the console at MDN or at https://plnkr.co. Saved the output in case the bug could not be reproduced the next day:

0.251276 450
VM1128:28 0.284608 450
VM1128:28 1.251236 450
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 16.185637 450
VM1128:28 16.509598 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 16.767282 450
VM1128:28 16.834284 450
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 18.814115 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 256.726469 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 256.808175 450
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 259.73112 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 261.68968 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 261.959337 450
VM1128:28 262.207464 450
VM1128:28 262.434713 450
VM1128:28 262.717973 450
VM1128:28 262.964394 450
VM1128:28 263.204343 450
VM1128:28 263.469493 450
VM1128:28 263.705097 450
VM1128:28 263.95417 450
VM1128:28 264.204803 450
VM1128:28 264.369143 450
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 266.357293 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 401.9193 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 402.206537 450
VM1128:28 402.29217 450
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 429.950754 450
VM1128:28 430.084082 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 437.250462 450
VM1128:28 437.358342 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 439.287269 450
VM1128:25 Event {isTrusted: true, type: "unmute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:25 Event {isTrusted: true, type: "mute", target: MediaStreamTrack, currentTarget: MediaStreamTrack, eventPhase: 2, …}
VM1128:28 525.750049 450
VM1128:38 blob:https://path/to/site/4a0393ec-afed-44e0-ae1f-847afeff3f6f
VM1128:28 0 450

Chromium froze twice today while trying to reproduce the issue. Will re-open this issue if able to retrace the steps and reproduce it. MediaRecorder is still not creating video with the correct duration. Firefox does not have that issue.

@guest271314
Contributor Author

@henbos Reviewed the code that was tested and isolated the case where the MediaStreamTrack mute event is fired at Chromium: when "Chromium Tab" capture is selected.

Steps To Reproduce:

Open console, run the following code, click on the window, then move the mouse pointer back to the open console.

onclick = async _ => {
  onclick = null;
  // Declare `now` up front so the handlers below can never hit the
  // temporal dead zone if an event fires during one of the awaits.
  let recorder, videoTrack, stream, now;
  stream = await navigator.mediaDevices.getDisplayMedia({video: true});
  [videoTrack] = stream.getVideoTracks();
  // Log each event type with the milliseconds elapsed since recording started
  videoTrack.onmute = videoTrack.onunmute = videoTrack.onended = e => console.log(e.type, performance.now() - now);
  await videoTrack.applyConstraints({
    resizeMode: 'none',
    cursor: 'never',
    width: window.innerWidth * 0.7,
    height: window.innerHeight,
  });
  recorder = new MediaRecorder(stream);
  recorder.onstop = e => console.log(e);
  recorder.ondataavailable = e => {
    if (e.data.size) {
      console.log(URL.createObjectURL(e.data));
    }
  };
  now = performance.now();
  recorder.start();
}

Tested at two separate sites: the mute event of MediaStreamTrack is fired at 4171.33499996271 ms at the MDN website and at 4171.41999991145 ms at the Screen Capture website.

If the cursor is moved to the active document, the unmute event is fired, then the mute and unmute events toggle in succession:

mute 4171.33499996271
VM134:6 unmute 13353.739999933168
VM134:6 mute 16691.22499995865
VM134:6 unmute 38397.969999932684
VM134:6 mute 41735.51499994937
VM134:6 unmute 42569.71499999054
VM134:6 mute 43403.89499999583
VM134:6 unmute 45072.60499999393
VM134:6 mute 45907.00000000652
VM134:6 unmute 60099.010000005364
VM134:6 mute 60933.09499998577
VM134:6 unmute 70114.04999997467
VM134:6 mute 70947.97999993898
VM134:6 unmute 74286.1649999395
VM134:6 ended 75370.05499994848
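To put a number on how much of a capture is affected, a log like the one above can be reduced to muted intervals. This is a minimal sketch under stated assumptions: `mutedIntervals` and `totalMutedMs` are hypothetical helpers, the input is an array of `{ type, t }` records built by hand from the console output (with `t` in milliseconds), and an `ended` event is treated as closing any open muted interval.

```javascript
// Hypothetical helper: pair each mute with the next unmute (or ended) event.
function mutedIntervals(events) {
  const intervals = [];
  let mutedAt = null;
  for (const { type, t } of events) {
    if (type === 'mute' && mutedAt === null) {
      mutedAt = t;
    } else if ((type === 'unmute' || type === 'ended') && mutedAt !== null) {
      intervals.push({ start: mutedAt, end: t, duration: t - mutedAt });
      mutedAt = null;
    }
  }
  return intervals;
}

// Hypothetical helper: total time spent muted, in milliseconds.
function totalMutedMs(events) {
  return mutedIntervals(events).reduce((sum, i) => sum + i.duration, 0);
}

// First four entries of the log above, rounded to whole milliseconds.
const log = [
  { type: 'mute', t: 4171 },
  { type: 'unmute', t: 13354 },
  { type: 'mute', t: 16691 },
  { type: 'unmute', t: 38398 },
];
console.log(totalMutedMs(log)); // 30890
```

For those first four entries alone the track was muted for roughly 31 of the first 38 seconds, so a consumer such as MediaRecorder could plausibly receive few or no new frames for most of that period.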

@guest271314 guest271314 reopened this Jun 27, 2020
@guest271314
Contributor Author

To document the steps to reproduce, and to demonstrate the effect of user non-action and user action, used the same code at Firefox to capture the procedure at Chromium. In the first two videos the result is consistent across different domains. In the last video the cursor is moving on the active document, which delays the mute event until 10002.794999978505.

mute event should not be fired in any of the cases.

chromium_screen_capture_mute_unmute.tar.gz

@guest271314
Contributor Author

The issue occurs when capturing "Your Entire Screen", "Application Window" or "Chromium Tab", not just "Chromium Tab". The video track simply goes mute abruptly within 4 seconds at Chromium 85.0.4173.0; where the inactive event is still defined, that event is fired as well. The MediaStreamTrack appears to be tied to some user activation scheme or algorithm, though per the specification it should not be. Whether user action occurs or not, the mute event is eventually fired for no clear reason.

blob:https://run.plnkr.co/3371ff9b-c93e-4185-8ecc-714337f7da81
(index):31 blob:https://run.plnkr.co/b1904285-775d-412f-a76c-76305718d7f8
(index):32 Uncaught DOMException: Failed to execute 'stop' on 'MediaRecorder': The MediaRecorder's state is 'inactive'.
    at MediaRecorder.recorder.ondataavailable (https://run.plnkr.co/preview/ckbxq5ngg00072v6tkrwi1gcs/:32:22)
recorder.ondataavailable @ (index):32
(index):28 Event {isTrusted: true, type: "stop", target: MediaRecorder, currentTarget: MediaRecorder, eventPhase: 2, …}
(index):17 inactive

@guest271314
Contributor Author

Removing MediaRecorder from the equation, this code

onclick = async _ => { 
  onclick = null;
  var recorder, videoTrack, stream
  stream = await navigator.mediaDevices.getDisplayMedia({video: true});
  stream.oninactive = e => console.log(e.type);
  [videoTrack] = stream.getVideoTracks();
  let now = performance.now();
  videoTrack.onmute = videoTrack.onunmute = videoTrack.onended = e => console.log(e.type, performance.now() - now);
}

does not fire the mute event when "Your Entire Screen" is selected at the screen capture UI prompt; the mute event does fire when "Application Window" or "Chromium Tab" is selected for capture.

@guest271314
Contributor Author

"Internal" Chromium code is at least one of the causes of this bug. The Chromium "User Activation" project marked their side of the bug as WontFix https://bugs.chromium.org/p/chromium/issues/detail?id=1100053#c3. Therefore the Screen Capture specification must put a halt to the bug, given that the Chromium User Activation project is evidently convinced that User Activation works in all cases, even where that internal Chromium project is not specified to be implemented. In this case it breaks getDisplayMedia() and consistently produces unexpected results at MediaRecorder:

If you don't want a specific API to be removed from this intervention, file a bug for that specific API.

What appears to be meant is for Screen Capture implementers at Chromium to unequivocally remove the internal Chromium User Activation code from this API, by brute force if necessary.

@henbos
Contributor

henbos commented Oct 8, 2020

We could list examples where the user agent might want to mute or unmute without user interaction to clarify. For example a call is being received on a mobile device, screensaver is starting, etc.

@jan-ivar
Member

jan-ivar commented Oct 8, 2020

I propose we add clarifying language similar to mediacapture-main.

@henbos
Contributor

henbos commented Oct 8, 2020

However the proposal "should not be fired based on user non-action" is not accepted by the working group.

@henbos henbos changed the title from "mute and unmute events of MediaStreamTrack should not be fired based on user non-action" to "Should mute and unmute events of MediaStreamTrack be allowed to fire based on user non-action?" Oct 8, 2020
@henbos
Contributor

henbos commented Oct 8, 2020

Changed the title to reflect the discussion rather than the conclusion

@guest271314
Contributor Author

@henbos

However the proposal "should not be fired based on user non-action" is not accepted by the working group.

Do not file issues here for the purpose of being accepted.

Reject your conclusion that user action should be massaged into the specification based on a Chromium bug.

The issue is filed to precisely reach that conclusion. Chromium has implemented said conditions for "Tab" capture and "Application" capture which caused numerous downstream issues. Thus, changes have recently been made to at least try to correct the issue https://bugs.chromium.org/p/chromium/issues/detail?id=1100746#c32.

@guest271314
Contributor Author

Consider the concrete use case of capturing a primary source document, in this case a historic document, Denying Free Blacks the Right to Vote (1724, 1735), and adding an audio track that reads the document, to reproduce the primary resource without addition or subtraction. Use technologies shipped with the browser, getUserMedia(), getDisplayMedia(), speechSynthesis.speak(), to provide an audio reading of the text displayed on screen. Simple enough. The technology exists to achieve the requirement. Do the APIs exist, and do they have the capabilities to work together within a single browser to produce a clinical reproduction of the primary resource? Will leave it to the reader to explore such a project for themselves to determine the simplicity in the field.

After multiple attempts to record the document with audio track, found that Chromium was firing mute and unmute events at "Tab" and "Application" capture selections when no user action was taking place on the page. Recall that the requirement is to read the document as an audio track with the document being captured. That means to synchronize the video with the audio the document must remain displayed until that page of the document is read. That ultimately resulted in MediaRecorder not producing correct output.

For this use case, including animation in the document just to keep Chromium from muting the track when there is no activity on the renderer is spurious. The requirement is to reproduce primary source documents precisely, with audio and visual tracks as a single media, not to create activity on the page that is not remotely related to reading the document by means of sight and sound.

The track from getDisplayMedia() at "Entire screen" capture does not toggle mute and unmute events.

The user should be able to expect application and tab capture to follow the same algorithm steps as entire screen capture, at the same frame rate, until instructed to do otherwise.

Downstream, what happens when a MediaStreamTrack communicated via RTCPeerConnection mutes and unmutes every N seconds, even when the use case is to display the N frames necessary to complete reading a page of a document (scientific; historic; academic; etc.)?

Toggling mute and unmute of MediaStreamTrack at getDisplayMedia() eliminates the use case of audio books, narration. The developer would need to create some form of animation on the page, which is extra, unnecessary code.

@henbos
Contributor

henbos commented Oct 9, 2020

I wouldn't expect tracks to be muted and unmuted sporadically for no apparent reason, but I would expect there to be legitimate use cases where a track gets muted without user interaction, such as if the screen gets locked due to inactivity.

@guest271314
Contributor Author

@henbos The screen lock theory fails when the user specifically configures the OS to not suspend or lock the screen due to inactivity; see w3c/mediacapture-main#670 (comment), w3c/mediacapture-main#668 (comment). Even if the theory of a machine's default settings suspending or locking the screen without user activity were to be seriously explored, we would need to test all devices to determine what really occurs. That is not practical, and it excludes the very real possibility that the user has changed the default settings; this specification has no way of observing every possible OS and hardware configuration. I could have several cameras connected to a machine using motion or a different program to capture still life, perform experiments, or provide general security. For that I need no screen open or user action, perhaps for hours or days or weeks, yet I still expect my code that runs through the browser to capture what I tell it to, without absentee specification authors trying to insert their defensive coding criteria into my projects. If I want the MediaStreamTrack turned off I set enabled to false, or call stop(), or define my own mechanism to turn off the stream. Trying to state what should happen when a machine suspends or the screen is locked is beyond the scope of what the specification can actually evaluate or control, and I do not want a specification trying to control that anyway.
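The distinction drawn above, between what the application controls (`enabled`, `stop()`) and what the user agent controls (`muted`), can be made explicit in code. This is a minimal sketch, not part of any specification: `describeTrackState` and its four return labels are hypothetical, and the function only reads the standard `readyState`, `enabled` and `muted` attributes, so it works equally on a real MediaStreamTrack or on a plain object used for testing.

```javascript
// Hypothetical helper: report who (if anyone) is currently stopping media flow.
// Only reads the standard MediaStreamTrack attributes readyState, enabled, muted.
function describeTrackState(track) {
  if (track.readyState === 'ended') return 'ended';
  if (!track.enabled) return 'disabled-by-app';      // application set enabled = false
  if (track.muted) return 'muted-by-user-agent';     // read-only, set by the user agent
  return 'flowing';
}

// Plain objects standing in for tracks, for illustration only.
console.log(describeTrackState({ readyState: 'live', enabled: true, muted: false }));
console.log(describeTrackState({ readyState: 'live', enabled: true, muted: true }));
console.log(describeTrackState({ readyState: 'live', enabled: false, muted: false }));
```

In a page, calling this from `onmute` and `onunmute` handlers would make it easier to tell a user-agent-initiated mute apart from a state the application set itself.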

@guest271314
Contributor Author

@henbos

The concrete use case described is for a very brief screen capture of a screen displaying a primary resource, four (4) total pages of primary resource material.

Consider creating a video of Dred Scott v. Sandford, 60 U.S. (19 How.) 393 (1857), which is well over 100 pages. I should not need to concern myself with mute and unmute events (or with creating totally unrelated transparent animation in the document) at all when creating such a video (audio; visual) reproduction of primary source material, a process which can take a considerable amount of time, since the entire case is read aloud while the text is actually being read at the same time.

@henbos
Contributor

henbos commented Oct 9, 2020

Is Chrome muting the track due to inactivity even though there is no screen saver, lock screen or screen turning off, etc?
Or what exactly is the issue? I haven't had time to follow all your links

@guest271314
Contributor Author

Chromium is muting the track when there is no activity on the rendering thread. The way I see it, I had to move the mouse in order for the track to not mute. Since the cursor: 'never' constraint does not work, have to set cursor to none on the video. The muting occurred at 1 second. That is not specified, obviously. Of course that means that MediaRecorder can miss frames, or stop recording altogether. I just happened to find out what was happening by experimenting.

@guest271314
Contributor Author

The muting occurs only at "Tab" and "Application" capture, not "Entire screen" capture. Tab and application capture should behave the same as entire screen capture or getUserMedia(), capture at 30 or 60 FPS until instructed to do otherwise.

@henbos
Contributor

henbos commented Oct 9, 2020

Muting within a second definitely sounds like an implementation bug.

Does it work as long as you move the mouse and gets muted as soon as you stop moving the mouse, or is this an issue about it always muting when you try to record it?

@guest271314
Contributor Author

Have to head off to a gig. Will post results of experiments after the change later today or tomorrow.

@henbos
Contributor

henbos commented Oct 9, 2020

I don't need to read 100 pages of Dred Scott v. Sandford, 60 U.S. (19 How.) 393 (1857) to figure out that muting a track within a second is a bug. I think Chromium implementers can follow up on fixing this issue based on their priorities. That is not a spec issue.

Here on the spec side of things, it is enough to add a clarifying section similar to https://w3c.github.io/mediacapture-main/getusermedia.html#life-cycle-and-media-flow.

@henbos
Contributor

henbos commented Oct 9, 2020

This is editorial and ready for PR

@w3c w3c deleted a comment from guest271314 Oct 9, 2020
@w3c w3c locked and limited conversation to collaborators Oct 9, 2020
@eladalon1983
Member

I'm still reading through the thread and following all the links. I'm currently making my way through Dred Scott v. Sandford, 60 U.S. (19 How.) 393 (1857). I'll keep y'all posted.

@eladalon1983 eladalon1983 self-assigned this May 27, 2021
@eladalon1983
Member

I propose we add clarifying language similar to mediacapture-main.

This issue appears to have a non-trivial amount of unwritten history. @jan-ivar, could you please add the clarifying language which you suggest, or otherwise advise what was agreed upon?

@eladalon1983 eladalon1983 assigned jan-ivar and unassigned eladalon1983 Jun 3, 2021
@jan-ivar
Member

Here on the spec side of things, it is enough to add a clarifying section similar to https://w3c.github.io/mediacapture-main/getusermedia.html#life-cycle-and-media-flow.

We seem to already have such a section in https://w3c.github.io/mediacapture-screen-share/#hidden-display-surfaces. I'll try adding a note to it.

4 participants