It has come to my attention that some applications wish to capture multiple display surfaces at the same time. Some examples include:
- Streamers presenting multiple surfaces. [*]
- Managed devices recording for compliance/training/billing reasons.
Capturing multiple display surfaces is presently achievable using existing APIs - it is possible to call getDisplayMedia() multiple times. However, this is not very ergonomic, and creates serious friction for the user:
- The user has to interact with the browser's media-picker multiple times.
- The user has to interact with the application multiple times, signaling that they want to capture yet another surface, and providing a new transient activation each time.
- The user is liable to make mistakes when trying to remember which surfaces they've already started capturing, and which surfaces remain for them to capture.
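To make the friction concrete, here is a minimal sketch of today's workaround. The helper name `captureOneMore` and the injected `mediaDevices` parameter (normally `navigator.mediaDevices`) are illustrative assumptions, not part of any spec.

```javascript
// Sketch of the current workaround: one getDisplayMedia() call per surface.
// `mediaDevices` is injected (normally navigator.mediaDevices); the helper
// name and the `streams` bookkeeping array are illustrative only.
async function captureOneMore(mediaDevices, streams) {
  // Each call needs its own transient activation (e.g. a click) and opens
  // the browser's media-picker again.
  const stream = await mediaDevices.getDisplayMedia({ video: true });
  streams.push(stream);
  return streams;
}
```

An application would typically call this from a click handler, for example `button.addEventListener('click', () => captureOneMore(navigator.mediaDevices, streams))`, so capturing N surfaces costs the user N clicks and N trips through the picker.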
Ideally, a single transient activation could be used for a single API invocation, presenting the user with a media-picker that offers functionality akin to checkboxes (mentioned here by way of example; we don't need to mandate specific UX elements). The user would choose all of the display surfaces that they want to capture, then click OK once. It would then be clear from context that these are all of the surfaces the user was aiming to capture, and that no additional calls to getDisplayMedia() or the like are necessary.
As a straw-man proposal, imagine getDisplayMedia({video: true, ..., maxSurfaces: N}). The default value of maxSurfaces is 1, and would trigger the current behavior, returning a single MediaStream. A higher value would trigger the new behavior, and return an array, [MediaStream].

Finer points off the bat:
- The UA may impose a limit on how many streams may be captured concurrently and prevent the user from choosing more.
- If a maxSurfaces greater than 1 is specified, an array will be returned even if the user chooses one surface, to simplify things for the application.
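From the application's side, the straw-man might be consumed roughly as sketched below. Note that `maxSurfaces` is the hypothetical option from this proposal and is not implemented by any browser; `captureSurfaces` and the injected `mediaDevices` parameter are likewise illustrative names.

```javascript
// HYPOTHETICAL: `maxSurfaces` is the straw-man option proposed in this issue;
// no user agent implements it. `mediaDevices` is injected (it would normally
// be navigator.mediaDevices).
async function captureSurfaces(mediaDevices, maxSurfaces = 1) {
  const result = await mediaDevices.getDisplayMedia({ video: true, maxSurfaces });
  // Proposed shape: a single MediaStream when maxSurfaces is 1 (current
  // behavior), an array when it is greater than 1, even if the user picked
  // just one surface. Normalizing here lets callers always iterate.
  return Array.isArray(result) ? result : [result];
}
```

With this shape, one user gesture and one trip through the picker yields every stream the user intended to share.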
Interesting points to discuss:
- Should the UA be required (MUST/SHOULD/MAY) to limit the user to choosing only one type of display surface? (Without influencing which.) That is to say, maybe the user can choose any N tabs, any N windows, or any N screens, but not a combination of K tabs and N-K screens.
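Whichever way that question is settled, an application could inspect what it actually received using the existing `displaySurface` track setting. A minimal sketch follows; `surfaceTypes` is an illustrative helper name, and the sketch assumes each returned stream carries its video track(s) as today.

```javascript
// Collect the distinct display-surface types across a set of captured streams,
// using the existing `displaySurface` setting ("monitor", "window", "browser").
// `surfaceTypes` is an illustrative helper, not part of any spec.
function surfaceTypes(streams) {
  const types = new Set();
  for (const stream of streams) {
    for (const track of stream.getVideoTracks()) {
      const { displaySurface } = track.getSettings();
      // The setting may be absent if the UA does not expose it.
      if (displaySurface) types.add(displaySurface);
    }
  }
  return types;
}
```

A result with more than one entry would indicate that the user mixed surface types, e.g. tabs and screens.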
CC @shangl, whose use-case prompted this.
--
[*] Imagine an instructor streaming multiple tabs, and individual viewers independently choosing which one to focus on. I mention this so as to discourage solutions that involve stitching multiple surfaces together into one logical surface.