Merge pull request #9 from jan-ivar/doc

Initial version w/user-chooses semantics from w3c/mediacapture-main#667
w3c · Nov 12, 2020 · 83c2805
2 parents 31c8d9e + 55bff73
commit 83c2805
Showing 1 changed file with 351 additions and 0 deletions.
diff --git a/index.html b/index.html
@@ -0,0 +1,351 @@
+<!doctype html>
+<html lang="en-us">
+<head>
+  <title>Media Capture and Streams Extensions</title>
+  <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
+  <script src="https://www.w3.org/Tools/respec/respec-w3c" class="remove"></script>
+  <script class='remove'>
+  "use strict";
+  // See https://github.com/w3c/respec/wiki/ for how to configure ReSpec
+  var respecConfig = {
+    group: "webrtc",
+    xref: ["html", "infra", "permissions", "dom", "mediacapture-streams", "webidl"],
+    edDraftURI: "https://w3c.github.io/mediacapture-extensions/",
+    editors:  [
+      {name: "Jan-Ivar Bruaroey", company: "Mozilla Corporation", w3cid: 79152},
+    ],
+    shortName: "mediacapture-extensions",
+    specStatus: "UD",
+    subjectPrefix: "[mediacapture-extensions]",
+    github: "https://github.com/w3c/mediacapture-extensions",
+  };
+  </script>
+</head>
+
+<body>
+  <section id="abstract">
+    <p>This document refines behaviors of the {{MediaDevices}} functions
+    {{MediaDevices/enumerateDevices()}} and {{MediaDevices/getUserMedia()}} to
+    reduce their fingerprinting surface and enable user selection of camera and
+    microphone through device pickers implemented in the user agent.
+    </p>
+  </section>
+  <section id="sotd">
+    <p>This is an unofficial proposal.</p>
+  </section>
+  <section id="introduction">
+    <h2>Introduction</h2>
+    <p>This document contains proposed extensions and modifications to the
+    [[GETUSERMEDIA]] specification.</p>
+    <p>New features and modifications to existing features proposed here may be
+    considered for addition into the main specification post Recommendation.
+    Deciding factors will include maturity of the extension or modification,
+    consensus on adding it, and implementation experience.</p>
+    <p>A concrete long-term goal is reducing the fingerprinting surface of
+    {{MediaDevices/enumerateDevices()}} by deprecating exposure of the device
+    {{MediaDeviceInfo/label}} in its results. This requires relieving
+    applications of the burden of building user interfaces to select cameras and
+    microphones in-content, by offering this in user agents as part of
+    {{MediaDevices/getUserMedia()}} instead.</p>
+    <p>Miscellaneous other smaller features are under consideration as well,
+    such as constraints to control multi-channel audio beyond stereo.</p>
+  </section>
+  <section>
+    <h2>Terminology</h2>
+    <p>
+      This document uses the definitions {{MediaDevices}}, {{MediaStreamTrack}},
+      {{MediaStreamConstraints}} and {{ConstrainablePattern}}
+      from [[!GETUSERMEDIA]].
+    </p>
+    <p>The terms [=permission state=], [=request permission to use=], and
+    <a data-cite="permissions">prompt the user to choose</a> are defined in
+    [[!permissions]].</p>
+  </section>
+  <section id="conformance">
+  </section>
+  <section id="camera and microphone picker">
+    <h2>In-browser camera and microphone picker</h2>
+    <p>The existing {{MediaDevices/enumerateDevices()}} function exposes camera
+    and microphone {{MediaDeviceInfo/label}}s to let applications build
+    in-content user interfaces for camera and microphone selection. Applications
+    have had to do this because {{MediaDevices/getUserMedia()}} did not offer a
+    web compatible in-agent device picker. This specification aims to rectify
+    that.</p>
+    <p>Due to the significant fingerprinting vector caused by device
+    {{MediaDeviceInfo/label}}s, and the well-established nature of the existing
+    APIs, the scope of this particular effort is limited to removing
+    {{MediaDeviceInfo/label}}, leaving the overall constraints-based model
+    intact. Keeping the model intact offers applications a more viable
+    migration path than moving to a less-powerful API would.</p>
+    <p>For the same reason, this specification augments the existing
+    {{MediaDevices/getUserMedia()}} function instead of introducing a new,
+    less-powerful API to compete with it.</p>
+    <section id="new getusermedia semantics">
+      <h3>getUserMedia "user-chooses" semantics</h3>
+      <p>This specification introduces
+      slightly altered semantics to the {{MediaDevices/getUserMedia()}}
+      function called <code>"user-chooses"</code> that guarantee a picker will
+      be shown to the user in cases where the user agent would otherwise choose
+      for the user (that is: when application constraints do not narrow down
+      the choices to a single device). This is orthogonal to permission, and
+      offers a better and more consistent user experience across applications
+      and user agents.
+      </p>
+      <p>Unfortunately, since the <code>"user-chooses"</code> semantics may
+      produce user agent prompts at different times and in different situations
+      compared to the old semantics, they are somewhat incompatible with
+      expectations in some existing web applications that tend to call
+      {{MediaDevices/getUserMedia()}} repeatedly and lazily instead of using
+      e.g. <code>stream.clone()</code>.</p>
+    </section>
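The lazy, repeated-call pattern described above can be avoided by opening a stream once and handing out clones. The following is a minimal sketch under stated assumptions, not part of the proposal: `createStreamSource` and `openStream` are hypothetical names, and `openStream` stands for any function returning a promise for a `MediaStream`.

```javascript
// Hypothetical helper: opens a stream once and hands out clones afterwards,
// so a "user-chooses" picker is not provoked by every lazy call site.
function createStreamSource(openStream) {
  let pending = null; // cached promise for the original stream

  return function getStream() {
    if (pending === null) {
      pending = openStream().catch((err) => {
        pending = null; // allow a retry after a failure or a dismissed prompt
        throw err;
      });
    }
    // Each caller receives its own clone and can stop it independently.
    return pending.then((stream) => stream.clone());
  };
}

// In a page, usage might look like this (not executed here):
// const getCamera = createStreamSource(
//   () => navigator.mediaDevices.getUserMedia({video: true}));
// const streamA = await getCamera();
// const streamB = await getCamera(); // reuses the first result, no second prompt
```

Note that in this sketch the original stream stays live until its own tracks are stopped, so an application would still need to decide when to release it.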
+    <section id="web compatibility">
+      <h3>Web compatibility and migration</h3>
+      <p>User agents are encouraged to provide the new semantics as opt-in
+      initially for web compatibility. User agents MUST deprecate (remove)
+      {{MediaDeviceInfo/label}} from {{MediaDeviceInfo}} over time, though specific migration strategies
+      are left to user agents. User agents SHOULD migrate to offering the new
+      semantics by default (opt-out) over time.</p>
+      <p>Since the constraints-model remains intact, web compatibility problems
+      are expected to be limited to:</p>
+      <ul>
+        <li>
+          Sites that never migrated show e.g. "Camera 1", "Camera 2" etc.
+          instead of descriptive device labels
+        </li>
+        <li>
+          Sites with no device management strategy provoke a picker in the
+          user agent on every visit for users with more than one choice
+          of camera or microphone (a feature of sorts)
+        </li>
+      </ul>
+    </section>
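The first compatibility case above, generic names in place of descriptive labels, can be illustrated with a small sketch. It assumes a hypothetical helper `displayNames` and input objects shaped like `MediaDeviceInfo` (only `kind` and `label` are read). The wording of the generic names is an illustration, not something the proposal mandates.

```javascript
// Hypothetical helper: returns display names for devices, substituting
// generic names such as "Camera 1" when a label is empty or missing.
function displayNames(devices) {
  const prefixes = {
    videoinput: "Camera",
    audioinput: "Microphone",
    audiooutput: "Speaker",
  };
  const counters = {}; // running count per device kind

  return devices.map(({kind, label}) => {
    counters[kind] = (counters[kind] || 0) + 1;
    return label || `${prefixes[kind] || "Device"} ${counters[kind]}`;
  });
}

// In a page, the input could come from enumerateDevices() (not executed here):
// const names = displayNames(await navigator.mediaDevices.enumerateDevices());
```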
+  </section>
+  <section id="mediadevices-interface">
+    <h3>MediaDevices Interface Extensions</h3>
+    <div>
+      <pre class="idl"
+>partial interface MediaDevices {
+  readonly attribute GetUserMediaSemantics defaultSemantics;
+};</pre>
+    </div>
+    <section>
+      <h2>Attributes</h2>
+      <dl data-link-for="MediaDevices" data-dfn-for="MediaDevices"
+      class="attributes">
+        <dt id="def-mediadevices-defaultSemantics"><dfn><code>defaultSemantics</code></dfn>
+        of type <span class="idlAttrType"><a>GetUserMediaSemantics</a></span>, readonly</dt>
+        <dd>
+          <p>The default semantics of {{MediaDevices/getUserMedia()}} in this
+          user agent.</p>
+          <p>User agents SHOULD default to <code>"browser-chooses"</code>
+          for backwards compatibility, until a transition plan has been
+          enacted where a majority of user agents collectively switch their
+          defaults to <code>"user-chooses"</code> for improved user privacy,
+          and usage metrics suggest this transition is feasible without
+          major breakage.</p>
+        </dd>
+      </dl>
+    </section>
+  </section>
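An application could read the proposed attribute defensively, since it will be absent in user agents that do not implement this proposal. The sketch below assumes a hypothetical helper `detectDefaultSemantics`. Treating a missing or unrecognized value as `"browser-chooses"` is this sketch's assumption, chosen because it matches the legacy behavior.

```javascript
// Hypothetical helper: returns the user agent's default semantics, or
// "browser-chooses" when the proposed attribute is absent or unrecognized
// (assumed here to mean legacy behavior).
function detectDefaultSemantics(mediaDevices) {
  const value = mediaDevices ? mediaDevices.defaultSemantics : undefined;
  const known = value === "user-chooses" || value === "browser-chooses";
  return known ? value : "browser-chooses";
}

// In a page (not executed here):
// const semantics = detectDefaultSemantics(navigator.mediaDevices);
```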
+  <section id="mediastreamconstraints-dictionary-extensions">
+    <h3>MediaStreamConstraints dictionary extensions</h3>
+    <div>
+      <pre class="idl"
+>partial dictionary MediaStreamConstraints {
+  GetUserMediaSemantics semantics;
+};</pre>
+      <section>
+        <h2>Dictionary {{MediaStreamConstraints}} Members</h2>
+        <dl data-link-for="MediaStreamConstraints" data-dfn-for=
+        "MediaStreamConstraints" class="dictionary-members">
+          <dt><dfn><code>semantics</code></dfn> of type <span class=
+          "idlMemberType">{{GetUserMediaSemantics}}</span></dt>
+          <dd>
+            <p>In cases where the specified constraints do not narrow
+            multiple choices between devices down to one per kind, specifies
+            how the final determination of which devices to pick from the
+            remaining choices MUST be made. If not specified, then the
+            <a data-link-for="MediaDevices">defaultSemantics</a> are used.
+            </p>
+          </dd>
+        </dl>
+      </section>
+    </div>
+  </section>
+  <section id="getusermediasemantics-enum">
+    <h3>GetUserMediaSemantics enum</h3>
+    <div>
+      <pre class="idl"
+>enum GetUserMediaSemantics {
+  "browser-chooses",
+  "user-chooses"
+};</pre>
+      <table data-link-for="GetUserMediaSemantics" data-dfn-for=
+      "GetUserMediaSemantics" class="simple">
+        <tbody>
+          <tr>
+            <th colspan="2"><dfn>GetUserMediaSemantics</dfn> Enumeration
+            description</th>
+          </tr>
+          <tr>
+            <td><dfn><code id=
+            "idl-def-GetUserMediaSemantics.browser-chooses">browser-chooses</code></dfn></td>
+            <td>
+              <p>When application-specified constraints do not narrow multiple
+              choices between devices down to one per kind, the user agent is
+              allowed to make the final determination between the remaining
+              choices.
+              </p>
+            </td>
+          </tr>
+          <tr>
+            <td><dfn><code id=
+            "idl-def-GetUserMediaSemantics.user-chooses">user-chooses</code></dfn></td>
+            <td>
+              <p>When application-specified constraints do not narrow
+              multiple choices between devices down to one per kind, the user
+              agent MUST
+              <a>prompt the user to choose</a>
+              between the remaining choices, even if the application already
+              has permission to some or all of them.</p>
+            </td>
+          </tr>
+        </tbody>
+      </table>
+    </div>
+  </section>
+  <section>
+    <h2>Algorithms</h2>
+    <p>When the {{MediaDevices/getUserMedia()}} method is invoked, run the
+    following steps before invoking the {{MediaDevices/getUserMedia()}}
+    algorithm:</p>
+    <ol>
+        <li>
+          <p>Let <var>mediaDevices</var> be the object on which this method was
+          invoked.</p>
+        </li>
+        <li>
+          <p>Let <var>constraints</var> be the method's first argument.</p>
+        </li>
+        <li>
+          <p>Let <var>semanticsPresent</var> be <code>true</code> if
+          <var>constraints</var><code>.semantics</code> [=map/exists=],
+          otherwise <code>false</code>.</p>
+        </li>
+        <li>
+          <p>Let <var>semantics</var> be
+          <var>constraints</var><code>.semantics</code>
+          if <a href="https://heycam.github.io/webidl/#dfn-present">present</a>,
+          or the value of <var>mediaDevices</var><code>.<a data-link-for="MediaDevices">defaultSemantics</a></code>
+          otherwise.</p>
+        </li>
+        <li>
+          <p>Replace step 6.5.1. of the {{MediaDevices/getUserMedia()}}
+          algorithm in its entirety with the following two steps:</p>
+          <ol>
+            <li>
+              <p>Let <var>descriptor</var> be a {{PermissionDescriptor}}
+              with its {{PermissionDescriptor/name}} member set to the permission name
+              associated with <var>kind</var> (e.g. {{PermissionName/"camera"}} for
+              <code>"video"</code>, {{PermissionName/"microphone"}} for <code>"audio"</code>), and,
+              optionally, consider its {{DevicePermissionDescriptor/deviceId}} member set to any appropriate
+              device's <var>deviceId</var>.</p>
+            </li>
+            <li>
+              <p>If the number of unique devices sourcing tracks of
+              media type <var>kind</var> in <var>candidateSet</var>
+              is greater than <code>1</code> and
+              <var>semantics</var> is <code>"user-chooses"</code>,
+              then <a>prompt the user to choose</a> a device with
+              <var>descriptor</var>, resulting in provided media.
+              Otherwise, <a>request permission to use</a> a
+              device with <var>descriptor</var>, while considering
+              all devices being attached to a live and
+              <a>same-permission</a> {{MediaStreamTrack}} in the current
+              [=browsing context=] to mean having permission status
+              {{PermissionState/"granted"}}, resulting in provided media.</p>
+              <p><dfn>Same-permission</dfn> in this context means a
+              {{MediaStreamTrack}} that required the same level of
+              permission to obtain as what is being requested.</p>
+              <p>When asking the user’s permission, the user agent
+              MUST disclose whether permission will be granted only to
+              the device chosen, or to all devices of that
+              <var>kind</var>.</p>
+              <p>Let <var>track</var> be the provided media, which
+              MUST be precisely one track of type <var>kind</var> from
+              <var>finalSet</var>. If <var>semantics</var> is
+              <code>"browser-chooses"</code> then the decision of
+              which track to choose from <var>finalSet</var> is up
+              to the User Agent, which MAY use the value of the computed
+              "fitness distance" from the <a href=
+              "https://www.w3.org/TR/mediacapture-streams/#dfn-selectsettings">
+              SelectSettings</a>
+              algorithm, the value of <var>semanticsPresent</var>,
+              or any other internally-available information about
+              the devices, as inputs to its decision.
+              If <var>semantics</var> is <code>"user-chooses"</code>,
+              and the application has not narrowed down the choices
+              to one, then the user agent MUST ask the user to make
+              the final selection.</p>
+              <p>Once selected, the source of the
+              {{MediaStreamTrack}} MUST NOT change.</p>
+              <p>User Agents are encouraged to default to or present
+              a default choice based primarily on fitness distance,
+              and secondarily on the user's primary or system default
+              device for <var>kind</var> (when possible). User Agents
+              MAY allow users to use any media source, including
+              pre-recorded media files.</p>
+            </li>
+          </ol>
+        </li>
+    </ol>
+  </section>
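The decision logic in the steps above can be sketched as two plain functions. This is an illustration under stated assumptions, not the normative algorithm: `resolveSemantics` and `chooseAction` are hypothetical names, `candidateDeviceIds` stands in for the devices sourcing tracks of one kind in the candidate set, and the returned strings merely name the permission-spec procedure that would run.

```javascript
// Mirrors the steps that compute semanticsPresent and semantics.
function resolveSemantics(mediaDevices, constraints) {
  const semanticsPresent = constraints.semantics !== undefined;
  const semantics = semanticsPresent
    ? constraints.semantics
    : mediaDevices.defaultSemantics;
  return {semanticsPresent, semantics};
}

// Mirrors the branch between prompting for a choice and requesting permission.
function chooseAction(candidateDeviceIds, semantics) {
  const uniqueDevices = new Set(candidateDeviceIds).size;
  return uniqueDevices > 1 && semantics === "user-chooses"
    ? "prompt the user to choose"
    : "request permission to use";
}
```

With two distinct cameras and `"user-chooses"`, the sketch selects the picker. With a single camera, or with `"browser-chooses"`, it falls through to the ordinary permission request.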
+  <section>
+    <h2>Examples</h2>
+    <div>
+      <p>This example shows a setup with a start button and a camera selector
+      using the new semantics (microphone is not shown for brevity but is
+      equivalent).</p>
+      <pre class="example">
+&lt;button id="start"&gt;Start&lt;/button&gt;
+&lt;button id="chosenCamera" disabled&gt;Camera: none&lt;/button&gt;
+&lt;script&gt;
+
+let cameraTrack = null;
+
+start.onclick = async () => {
+  try {
+    const stream = await navigator.mediaDevices.getUserMedia({
+      video: {deviceId: localStorage.cameraId}
+    });
+    setCameraTrack(stream.getVideoTracks()[0]);
+  } catch (err) {
+    console.error(err);
+  }
+}
+
+chosenCamera.onclick = async () => {
+  try {
+    const stream = await navigator.mediaDevices.getUserMedia({
+      video: true,
+      semantics: "user-chooses"
+    });
+    setCameraTrack(stream.getVideoTracks()[0]);
+  } catch (err) {
+    console.error(err);
+  }
+}
+
+function setCameraTrack(track) {
+  cameraTrack = track;
+  // getSettings() exposes deviceId; the label is a property of the track.
+  const {deviceId} = track.getSettings();
+  localStorage.cameraId = deviceId;
+  chosenCamera.innerText = `Camera: ${track.label}`;
+  chosenCamera.disabled = false;
+}
+&lt;/script&gt;
+      </pre>
+    </div>
+  </section>
+</body>
+</html>