
<pre class='metadata'>
Title: Open Screen Protocol
Shortname: openscreenprotocol
Level: 1
Status: w3c/ED
ED: https://webscreens.github.io/openscreenprotocol/
Canonical URL: ED
Editor: Mark Foltz, Google, https://github.com/mfoltzgoogle, w3cid 68454
Repository: webscreens/openscreenprotocol
Abstract: The Open Screen Protocol is a suite of network protocols that allow
          user agents to implement the [[PRESENTATION-API|Presentation API]] and
          the [[REMOTE-PLAYBACK|Remote Playback API]] in an interoperable
          fashion.
Group: Second Screen Community Group
Mailing List: public-webscreens@w3c.org
Mailing List Archives: https://lists.w3.org/Archives/Public/public-webscreens/
Markup Shorthands: markdown yes, dfn yes, idl yes
</pre>

<p boilerplate="copyright">
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">
  Copyright</a> © [YEAR] the Contributors to the [TITLE] Specification,
  published by the <a href="https://www.w3.org/community/webscreens/">
  Second Screen Community Group</a> under the
  <a href="https://www.w3.org/community/about/agreements/cla/">
  W3C Community Contributor License Agreement (CLA)</a>.  A human-readable
  <a href="http://www.w3.org/community/about/agreements/cla-deed/">summary</a>
  is available.
</p>

Issue: Add short names to Presentation API spec, so that BS autolinking works as designed.


Issue: Can autolinks to HTML51 be automatically generated?


<pre class="anchors">
urlPrefix: https://w3c.github.io/presentation-api/#dfn-; type: dfn; spec: PRESENTATION-API
    text: available presentation display
    text: controller
    text: controlling user agent
    text: controlling browsing context
    text: presentation
    text: presentation display
    text: presentation display availability
    text: presentation id
    text: presentation request url
    text: receiver
    text: receiving browsing context
    text: receiving user agent
urlPrefix: https://w3c.github.io/presentation-api/; type: interface; spec: PRESENTATION-API
    text: PresentationConnection
urlPrefix: https://w3c.github.io/remote-playback/#dfn-; type: dfn; spec: REMOTE-PLAYBACK
    text: availability sources set
    text: compatible remote playback device
    text: initiate remote playback
    text: media element state
    text: media resources
    text: remote playback devices
    text: remote playback source
urlPrefix: https://www.w3.org/TR/html51/single-page.html; type: dfn; spec: HTML51
    text: media element
</pre>

<h2 class='no-num no-toc no-ref' id='status'>Status of this document</h2>

This specification was published by the [Second Screen Community
Group](https://www.w3.org/community/webscreens/). It is not a W3C Standard nor
is it on the W3C Standards Track. It should not be viewed as a stable
specification, and may change in substantial ways at any time. A future version
of this document will be published as a Community Group Report.

Please note that under the [W3C Community Contributor License Agreement
(CLA)](https://www.w3.org/community/about/agreements/cla/) there is a limited
opt-out and other conditions apply.

Learn more about [W3C Community and Business
Groups](http://www.w3.org/community/).

Introduction {#introduction}
============================

The Open Screen Protocol connects browsers to devices capable of rendering Web
content for a shared audience.  Typically, these are devices like
Internet-connected TVs, HDMI dongles, or "smart" speakers.

The protocol is a suite of subsidiary network protocols that enable two user
agents to implement the [[PRESENTATION-API|Presentation API]] and
[[REMOTE-PLAYBACK|Remote Playback API]] in an interoperable fashion.  This means
that a user can expect these APIs to work as intended when connecting two devices
from independent implementations of the Open Screen Protocol.

The Open Screen Protocol is a specific implementation of these two APIs, meaning
that it does not handle all possible ways that browsers and presentation
displays could support these APIs.  The Open Screen Protocol specifically
supports browsers and displays that are connected via the same local area
network, and that initiate presentation or remote playback by sending a URL
from the browser to the target display.

The Open Screen Protocol is intended to be extensible, so that additional
capabilities can be added over time.  This may include new implementations of
existing APIs, or new APIs.

Terminology {#terminology}
--------------------------

We use the term "agent" to mean any implementation of this protocol
(browser, device, or otherwise), acting as a controller or a receiver.

We borrow terminology from the [[PRESENTATION-API|Presentation API]]. We call
the agent that is used to discover and initiate presentation of Web content on
another device the [=controller=] (or [=controlling user agent=] when it is a
browser).  We call the agent on the device rendering the Web content the
[=receiver=] or [=presentation display=] (or [=receiving user agent=] when it is
a browser). [=presentation display availability=] refers to whether or not a
[=receiver=] is compatible with a [=presentation request URL=].  However, in the
[[PRESENTATION-API|Presentation API]], a "controller" refers to a specific
browsing context within the browser, whereas here the "controller" refers to the
browser itself, although it may be acting on behalf of a browsing context.

We borrow terminology from the [[REMOTE-PLAYBACK|Remote Playback API]].  The
agent responsible for rendering media of a remote playback is called the
[=remote playback device=]. In this document, we also refer to it as the
*receiver* because it is shorter and keeps terminology consistent between
presentations and remote playbacks. Similarly, we use the term "controller"
(referred to as the "user agent" in the [[REMOTE-PLAYBACK|Remote Playback API]])
to refer to the agent that starts, terminates, and controls the remote playback.

For media streaming, we refer to the agent sending media as the media
*sender* and the agent receiving the media as the media *receiver*.
Note that a media *receiver* may or may not be a *receiver* or
*controller* as defined by the Presentation API or Remote Playback
API.  Also note that an agent may be both a *sender* and *receiver*.

For additional terms and idioms specific to the [[PRESENTATION-API|Presentation
API]] or [[REMOTE-PLAYBACK|Remote Playback API]], please consult the respective
specifications.

Issue(144): Receiver/Controller/Agent terminology.

Requirements {#requirements}
============================

Presentation API Requirements {#requirements-presentation-api}
--------------------------------------------------------------

1.  A controlling user agent must be able to discover the presence of a
    presentation display connected to the same IPv4 or IPv6 subnet and reachable
    by IP multicast.

2.  A controlling user agent must be able to obtain the IPv4 or IPv6 address of
    the display, a friendly name for the display, and an IP port number for
    establishing a network transport to the display.

3.  A controlling user agent must be able to determine if the receiver is
    reasonably capable of rendering a specific [=presentation request URL=].

4.  A controlling user agent must be able to start a new presentation on a receiver given a
    [=presentation request URL=] and [=presentation ID=].

5.  A controlling user agent must be able to create a new
    {{PresentationConnection}} to an existing presentation on the
    receiver, given its [=presentation request URL=] and [=presentation ID=].

6.  It must be possible to close a {{PresentationConnection}} between a
    controller and a presentation, and signal both parties with the
    reason why the connection was closed.

7.  Multiple controllers must be able to connect to a single presentation
    simultaneously, possibly from one or more [=controlling user agents=].

8.  Messages sent by the controller must be delivered to the presentation (or
    vice versa) in a reliable and in-order fashion.

9.  If a message cannot be delivered, then the controlling user agent must be
    able to signal the receiver (or vice versa) that the connection should be
    closed with reason `error`.

10. The controller and presentation must be able to send and receive `DOMString`
    messages (represented as `string` type in ECMAScript).

11. The controller and presentation must be able to send and receive binary
    messages (represented as `Blob` objects in HTML5, or `ArrayBuffer` or
    `ArrayBufferView` types in ECMAScript).

12. The controlling user agent must be able to signal to the receiver to
    terminate a presentation, given its [=presentation request URL=] and [=presentation
    ID=].

13. The receiver must be able to signal all connected controlling user agents
    when a presentation is terminated.


Remote Playback API Requirements {#requirements-remote-playback}
----------------------------------------------------------------

1.  A [=controlling user agent=] must be able to find out whether there is at
    least one compatible [=remote playback device=] available for a given
    {{HTMLMediaElement}}, both instantaneously and continuously.

2.  A controlling user agent must be able to [=initiate remote playback=] of
    an {{HTMLMediaElement}} to a compatible remote playback device.

3.  The controlling user agent must be able to send media sources as URLs and text
    tracks from an {{HTMLMediaElement}} to a compatible remote playback device.

4.  During remote playback, the controlling user agent and the remote playback
    device must be able to synchronize the [=media element state=] of the
    {{HTMLMediaElement}}.

5.  During remote playback, either the controlling user agent or the
    remote playback device must be able to disconnect from the other party.

6.  The controlling user agent should be able to pass locale and text
    direction information to the remote playback device to assist in rendering
    text during remote playback.


Non-Functional Requirements {#requirements-non-functional}
----------------------------------------------------------

1.  It should be possible to implement an Open Screen presentation display on
    modest hardware, similar to what is found in a low-end smartphone, smart TV,
    or streaming device. See the [Device
    Specifications](device_specs.md) document for expected presentation display
    hardware specifications.

2.  It should be possible to implement an Open Screen controlling user agent on a
    low-end smartphone. See the [Device Specifications](device_specs.md) document
    for expected controlling user agent hardware specifications.

3.  The discovery and connection protocols should minimize power consumption,
    especially on the controlling user agent, which is likely to be
    battery-powered.

4.  The protocol should minimize the amount of information provided to a passive
    network observer about the identity of the user, activity on the controlling
    user agent and activity on the receiver.

5.  The protocol should prevent passive network eavesdroppers from learning
    presentation URLs, presentation IDs, or the content of presentation messages
    passed between controllers and presentations.

6.  The protocol should prevent active network attackers from impersonating a
    display and observing or altering data intended for the controller or
    presentation.

7.  The controlling user agent should be able to discover quickly when a
    presentation display becomes available or unavailable (i.e., when it connects
    to or disconnects from the network).

8.  The controlling user agent should present sensible information to the user
    when a protocol operation fails.  For example, if a controlling user agent is
    unable to start a presentation, it should be possible to report in the
    controlling user agent interface whether the failure was caused by a network
    error, an authentication error, or the presentation content failing to load.

9.  The controlling user agent should be able to remember authenticated
    presentation displays.  This means it is not required for the user to
    intervene and re-authenticate each time the controlling user agent connects
    to a pre-authenticated display.

10. Message latency between the controller and a presentation should be minimized
    to permit interactive use.  For example, it should be comfortable to type in
    a form in the controller and have the text appear in the presentation in real
    time.  Real-time latency for gaming or mouse use is ideal, but not a
    requirement.

11. The controlling user agent initiating a presentation should communicate its
    preferred locale to the receiver, so it can render the presentation content
    in that locale.

12. It should be possible to extend the control protocol (above the discovery and
    transport levels) with optional features not defined explicitly by the
    specification, to facilitate experimentation and enhancement of the base
    APIs.


Discovery with mDNS {#discovery}
===============================

Agents may discover one another using [[RFC6763|DNS-SD]] over [[RFC6762|mDNS]].
To do so, agents must use the service name "_openscreen._udp.local".

Issue(107): Define suspend and resume behavior for discovery protocol.

Advertising agents must use an instance name that is a prefix of the agent's
display name. If the instance name is not the complete display name (that is, if
it has been truncated), it must be terminated by a null character.  It is a
prefix so that the name displayed to the user before verification can be
verified later.  It is terminated by a null character in the case of truncation
so that the listening agent knows it has been truncated.  This complexity is
necessary to allow for display names that exceed the size allowed in an
instance name, and for such (possibly truncated) display names to be visible to
the user sooner (before a QUIC connection is made).  Listening agents must treat
instance names as unverified and must verify that the instance name is a prefix
of the verified display name before showing the user a verified display name.

Agents should show the complete display name to the user rather than a
truncated display name.
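
The truncation and verification rules above can be sketched in Python as follows. This is a non-normative illustration: it assumes the 63-octet DNS label limit applies to the UTF-8 encoded instance name, counts the terminating null character against that limit, and uses hypothetical function names.

```python
# Assumption: the instance name is limited to one 63-octet DNS label.
MAX_INSTANCE_NAME_OCTETS = 63
TRUNCATION_MARKER = "\0"  # null character marking a truncated name


def make_instance_name(display_name: str, limit: int = MAX_INSTANCE_NAME_OCTETS) -> str:
    """Return an instance name that is a prefix of display_name.

    If the full display name does not fit, it is cut at a character boundary
    and terminated with a null character so that listeners know it was truncated.
    """
    if len(display_name.encode("utf-8")) <= limit:
        return display_name
    budget = limit - len(TRUNCATION_MARKER.encode("utf-8"))
    prefix_chars = []
    used = 0
    for ch in display_name:
        size = len(ch.encode("utf-8"))
        if used + size > budget:
            break
        prefix_chars.append(ch)
        used += size
    return "".join(prefix_chars) + TRUNCATION_MARKER


def is_truncated(instance_name: str) -> bool:
    return instance_name.endswith(TRUNCATION_MARKER)


def matches_verified_name(instance_name: str, verified_display_name: str) -> bool:
    """Check an unverified instance name against the verified display name."""
    prefix = instance_name[:-1] if is_truncated(instance_name) else instance_name
    return verified_display_name.startswith(prefix)
```

A listening agent can show the unverified prefix immediately and switch to the verified display name once `matches_verified_name` succeeds after the QUIC connection is authenticated.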

Advertising agents must include DNS TXT records with the following
keys and values:

: fp
:: The certificate fingerprint of the advertising agent.
    The format of the fingerprint is defined by [RFC 8122 section
    5](https://tools.ietf.org/html/rfc8122#section-5), excluding the
    "fingerprint:" prefix and including the hash function, space, and hex-encoded
    fingerprint.  The fingerprint value also functions as an ID for the agent.
    All agents must support the following hash functions: "sha-256", "sha-512".
    Agents must not support the following hash functions: "md2", "md5".

Issue: Include cross references to the specs for these hash functions.
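
As a non-normative illustration of the fp format, the Python sketch below formats and parses a fingerprint with the standard `hashlib` module. It assumes `cert_der` holds the DER encoding of the agent's certificate (the form that RFC 8122 fingerprints are computed over) and implements only the two mandatory hash functions; the helper names are hypothetical.

```python
import hashlib

# Mandatory hash functions, mapped to their hashlib names.
_HASHLIB_NAMES = {"sha-256": "sha256", "sha-512": "sha512"}
_FORBIDDEN = {"md2", "md5"}


def format_fingerprint(cert_der: bytes, hash_func: str = "sha-256") -> str:
    """Return "<hash-func> <HEX:HEX:...>", without the "fingerprint:" prefix."""
    if hash_func in _FORBIDDEN or hash_func not in _HASHLIB_NAMES:
        raise ValueError(f"unsupported hash function: {hash_func}")
    digest = hashlib.new(_HASHLIB_NAMES[hash_func], cert_der).digest()
    # RFC 8122 uses uppercase hex byte pairs separated by colons.
    return hash_func + " " + ":".join(f"{b:02X}" for b in digest)


def parse_fingerprint(value: str) -> tuple[str, bytes]:
    """Split an fp value into (hash function, raw digest), validating its length."""
    hash_func, _, hex_part = value.partition(" ")
    if hash_func in _FORBIDDEN or hash_func not in _HASHLIB_NAMES:
        raise ValueError(f"unsupported hash function: {hash_func}")
    digest = bytes.fromhex(hex_part.replace(":", ""))
    if len(digest) != hashlib.new(_HASHLIB_NAMES[hash_func]).digest_size:
        raise ValueError("fingerprint has the wrong length for its hash function")
    return hash_func, digest
```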

: mv
:: An unsigned integer value that indicates that metadata has changed.  Whenever
    its metadata changes, the advertising agent must update this value to a
    greater one.  This signals to the listening agent that it should connect to
    the advertising agent to discover updated metadata.

Issue: Add examples of sample mDNS records.
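
Pending sample records, the sketch below shows one way the TXT keys above could be encoded and decoded, following the DNS-SD TXT format of [[RFC6763]] section 6 (a sequence of length-prefixed `key=value` strings). It is non-normative; representing `mv` as a decimal string and the helper names are assumptions.

```python
from typing import Dict, Optional


def encode_txt_rdata(entries: Dict[str, str]) -> bytes:
    """Encode key/value pairs as DNS-SD TXT record data."""
    out = bytearray()
    for key, value in entries.items():
        item = f"{key}={value}".encode("utf-8")
        if len(item) > 255:
            raise ValueError(f"TXT entry for {key!r} exceeds 255 octets")
        out.append(len(item))  # one length octet, then the string itself
        out += item
    return bytes(out)


def decode_txt_rdata(rdata: bytes) -> Dict[str, str]:
    """Decode TXT record data; only the first occurrence of a key is kept."""
    entries: Dict[str, str] = {}
    i = 0
    while i < len(rdata):
        length = rdata[i]
        item = rdata[i + 1:i + 1 + length]
        i += 1 + length
        key, _, value = item.decode("utf-8").partition("=")
        key = key.lower()  # DNS-SD keys are case-insensitive
        if key and key not in entries:
            entries[key] = value
    return entries


def metadata_changed(previous_mv: Optional[int], txt: Dict[str, str]) -> bool:
    """True if the listener should reconnect to fetch updated metadata."""
    current = int(txt["mv"])
    return previous_mv is None or current > previous_mv
```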


Future extensions to this QUIC-based protocol can use the same metadata
discovery process to indicate support for those extensions, through a
capabilities mechanism to be determined. If a future version of the Open Screen
Protocol uses mDNS but breaks compatibility with the metadata discovery process,
it should change the DNS-SD service name to a new value, indicating a new
mechanism for metadata discovery.


Transport and metadata discovery with QUIC {#transport}
=======================================================

If an agent wants to connect to or learn further metadata about another agent,
it initiates a [[QUIC]] connection to the IP and port from the SRV record.
Prior to authentication, messages may be exchanged (such as further metadata),
but any information they carry should be treated as unverified (for example, by
indicating to the user that the display name of an unauthenticated agent is
unverified).

The connection IDs used by both the [[QUIC]] client and server should be
zero-length.  With zero-length connection IDs, an agent cannot change its IP
address or port within an existing QUIC connection; clients and servers must
establish a new QUIC connection in order to do so.

To learn further metadata, an agent may send an agent-info-request
message (see [[#appendix-a]]) and receive back an agent-info-response message.
Any agent may send this request to learn about the capabilities of
another device.

The agent-info-response message contains the following properties:

: display-name (required)
:: The display name of the agent, intended to be displayed to a user by the
     requester. The requester should indicate through the UI if the responder
     is not authenticated or if the display name changes.

: model-name (optional)
:: If the agent is a hardware device, the model name of
    the device.  This is used mainly for debugging purposes, but may be
    displayed to the user of the requesting agent.

: capabilities (required)
:: The control protocols, roles, and media types the agent supports.
    Presence indicates a capability and absence indicates lack of a
    capability.  Capabilities should affect how an agent is
    presented to a user, such as drawing a different icon depending on
    the media types it supports.

The various capabilities have the following meanings:

: receive-audio
:: The agent can generally receive audio for the control protocols it
    supports.  Each control protocol can have more specific capability
    mechanisms, such as support for specific URLs in the presentation
    protocol.

: receive-video
:: The agent can generally receive video for the control protocols it
    supports.  Each control protocol can have more specific capability
    mechanisms, such as support for specific URLs in the presentation
    protocol.

: receive-presentation
:: The agent can receive presentations using the presentation protocol.

: control-presentation
:: The agent can control presentations using the presentation protocol.

: receive-remote-playback
:: The agent can receive remote playback using the remote playback
    protocol.

: control-remote-playback
:: The agent can control remote playback using the remote playback
    protocol.

: receive-streaming
:: The agent can receive streaming using the streaming protocol.

: send-streaming
:: The agent can send streaming using the streaming protocol.
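
The Python sketch below shows how a controller might turn the capabilities listed above into a set and use it in its UI. It is non-normative: it models capabilities by the names used in this section rather than by their wire encoding (see [[#appendix-a]]), it chooses to skip capabilities it does not recognize, and the icon names and helper functions are hypothetical.

```python
from enum import Enum
from typing import Iterable, Set


class Capability(Enum):
    RECEIVE_AUDIO = "receive-audio"
    RECEIVE_VIDEO = "receive-video"
    RECEIVE_PRESENTATION = "receive-presentation"
    CONTROL_PRESENTATION = "control-presentation"
    RECEIVE_REMOTE_PLAYBACK = "receive-remote-playback"
    CONTROL_REMOTE_PLAYBACK = "control-remote-playback"
    RECEIVE_STREAMING = "receive-streaming"
    SEND_STREAMING = "send-streaming"


def parse_capabilities(names: Iterable[str]) -> Set[Capability]:
    """Presence of a name means the capability is supported."""
    caps = set()
    for name in names:
        try:
            caps.add(Capability(name))
        except ValueError:
            pass  # assumption: unrecognized capabilities are skipped
    return caps


def icon_for(caps: Set[Capability]) -> str:
    """Hypothetical UI hint based on the media types the agent can receive."""
    if Capability.RECEIVE_VIDEO in caps:
        return "screen"
    if Capability.RECEIVE_AUDIO in caps:
        return "speaker"
    return "generic"


def can_receive_presentations(caps: Set[Capability]) -> bool:
    """Whether to offer this agent as a target for the presentation protocol."""
    return Capability.RECEIVE_PRESENTATION in caps
```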


Listening agents act as QUIC clients.  Advertising agents act as QUIC servers.

If a listening agent wishes to receive messages from an advertising agent, or an
advertising agent wishes to send messages to a listening agent, it may wish to
keep the QUIC connection alive.  Once neither side needs to keep the connection
alive for the purposes of sending or receiving messages, the connection should
be closed with an error code of 5139.  In order to keep a QUIC connection alive, an
agent may send an agent-status-request message, and any agent that receives an
agent-status-request message should send an agent-status-response message. Such
messages should be sent at intervals shorter than the QUIC idle_timeout transport
parameter (see section 18 of [[QUIC]]), and QUIC PING
frames should not be used.  An idle_timeout transport parameter of 25 seconds is
recommended.  The agent should behave as though a timer shorter than the
idle_timeout were reset every time a message is sent on a QUIC stream.  If the
timer expires, an agent-status-request message should be sent.
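
The keep-alive timer described above can be sketched as follows. This Python sketch is non-normative; the 20-second interval is an assumed value below the recommended 25-second idle_timeout, and the caller is assumed to pass monotonic timestamps (for example from `time.monotonic()`).

```python
IDLE_TIMEOUT_S = 25.0  # recommended QUIC idle_timeout
KEEPALIVE_S = 20.0     # assumption: any interval shorter than IDLE_TIMEOUT_S


class KeepAliveTimer:
    """Decides when to send an agent-status-request to keep a connection alive."""

    def __init__(self, now: float, interval: float = KEEPALIVE_S,
                 idle_timeout: float = IDLE_TIMEOUT_S) -> None:
        if not 0 < interval < idle_timeout:
            raise ValueError("keep-alive interval must be shorter than idle_timeout")
        self.interval = interval
        self.deadline = now + interval

    def on_message_sent(self, now: float) -> None:
        """Any message sent on any QUIC stream resets the timer."""
        self.deadline = now + self.interval

    def poll(self, now: float) -> bool:
        """Return True if an agent-status-request should be sent now."""
        if now < self.deadline:
            return False
        # Sending the agent-status-request is itself a message, so reset the timer.
        self.on_message_sent(now)
        return True
```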

If a client agent wishes to send messages to a server agent, the client
agent can connect to the server agent "on demand"; it does not need to
keep the connection alive.

Issue(108): Define suspend and resume behavior for connection protocol.

The agent-info-response and agent-status-response messages may be
extended to include additional information not defined in this spec.  If done
ad hoc by applications and not in future specs, keys should be chosen to avoid
collision, such as by choosing large integers or long strings.  Agents must
ignore keys in the agent-info-response message that they do not understand, so
that agents can easily extend this message.
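
A receiver of an agent-info-response might apply this rule as in the Python sketch below. It is non-normative and assumes the message has already been CBOR-decoded into a dictionary keyed by the property names used in this section; the actual key encoding is given by the CDDL in [[#appendix-a]]. The `AgentInfo` type and `parse_agent_info` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional

REQUIRED_KEYS = ("display-name", "capabilities")


@dataclass
class AgentInfo:
    display_name: str
    capabilities: List[Any]
    model_name: Optional[str] = None


def parse_agent_info(decoded: Mapping[Any, Any]) -> AgentInfo:
    """Build an AgentInfo from a decoded agent-info-response.

    Keys this agent does not understand (for example, vendor extensions
    using large integers or long strings) are ignored rather than rejected.
    """
    missing = [key for key in REQUIRED_KEYS if key not in decoded]
    if missing:
        raise ValueError(f"agent-info-response is missing {missing}")
    return AgentInfo(
        display_name=decoded["display-name"],
        capabilities=list(decoded["capabilities"]),
        model_name=decoded.get("model-name"),
    )
```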

Message delivery using CBOR and QUIC streams {#messages}
========================================================

Messages are serialized using [[RFC7049|CBOR]].  To
send a group of messages in order, that group of messages must be sent in one
QUIC stream.  Independent groups of messages (with no ordering dependency
across groups) should be sent in different QUIC streams.  In order to put
multiple CBOR-serialized messages into the same QUIC stream, the following
framing is used.

For each message, the sender must write to the QUIC stream the following:

1.  A type key representing the type of the message, encoded as a variable-length
    integer (see [[#appendix-a]] for type keys)

2.  The message encoded as CBOR.

If an agent receives a message for which it does not recognize a
type key, it must close the QUIC connection with an application error
code of 404 and should include the unknown type key in the reason phrase
(see [[QUIC#section-19.4|QUIC transport section 19.4]]).

Variable-length integers are encoded in the same format as defined by
[[QUIC#section-16| QUIC transport section 16]].
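
The framing above can be sketched in Python as follows. The varint helpers implement the QUIC variable-length integer encoding, in which the two most significant bits of the first byte select a 1-, 2-, 4-, or 8-byte length. The CBOR body is taken as already-encoded bytes. Because the framing carries no explicit length, a reader needs a CBOR decoder (not shown) to find where each message body ends. The function names and the `UnknownTypeKey` exception are hypothetical.

```python
MAX_VARINT = (1 << 62) - 1
UNKNOWN_TYPE_KEY_ERROR = 404  # application error code for unknown type keys


def encode_varint(value: int) -> bytes:
    """Encode value in the shortest QUIC variable-length integer form."""
    if value < 0 or value > MAX_VARINT:
        raise ValueError("value out of range for a QUIC varint")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, offset just past the varint)."""
    if offset >= len(buf):
        raise ValueError("truncated varint")
    first = buf[offset]
    length = 1 << (first >> 6)  # top two bits select 1, 2, 4 or 8 bytes
    end = offset + length
    if end > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F
    for byte in buf[offset + 1:end]:
        value = (value << 8) | byte
    return value, end


class UnknownTypeKey(Exception):
    """Signals that the QUIC connection should be closed with error 404."""

    def __init__(self, type_key: int) -> None:
        super().__init__(f"unknown type key {type_key}")
        self.type_key = type_key
        self.error_code = UNKNOWN_TYPE_KEY_ERROR


def frame_message(type_key: int, cbor_body: bytes) -> bytes:
    """Prefix an already CBOR-encoded message with its type key."""
    return encode_varint(type_key) + cbor_body


def read_type_key(stream: bytes, offset: int, known_type_keys) -> tuple[int, int]:
    """Read a type key; return (type key, offset where the CBOR body starts)."""
    type_key, body_start = decode_varint(stream, offset)
    if type_key not in known_type_keys:
        raise UnknownTypeKey(type_key)
    return type_key, body_start
```

For example, `frame_message(100, b"\xa0")` (a hypothetical type key 100 followed by an empty CBOR map) yields the bytes `40 64 a0`.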

Many messages are requests and responses, so a common format is defined for
those.  Each request and response includes a request ID, which is an unsigned
integer chosen by the requester.  Responses must include the request ID of the
request they are associated with.

Issue(139): Clarify scoping/uniqueness of request IDs.
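
A requester might allocate request IDs and match responses as in this Python sketch. It is non-normative; because the scoping of request IDs is still open (see the issue above), the sketch simply never reuses an ID for the lifetime of the tracker. `RequestTracker` is a hypothetical name.

```python
import itertools
from typing import Dict


class RequestTracker:
    """Allocates request IDs and matches responses to outstanding requests."""

    def __init__(self, first_id: int = 1) -> None:
        self._ids = itertools.count(first_id)
        self._pending: Dict[int, str] = {}  # request ID -> request message type

    def new_request(self, message_type: str) -> int:
        request_id = next(self._ids)  # never reused by this tracker
        self._pending[request_id] = message_type
        return request_id

    def on_response(self, request_id: int) -> str:
        """Return the type of the request that this response answers."""
        try:
            return self._pending.pop(request_id)
        except KeyError:
            raise ValueError(f"no outstanding request with ID {request_id}") from None

    @property
    def outstanding(self) -> int:
        return len(self._pending)
```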

Authentication {#authentication}
================================

Each supported authentication method is implemented via authentication messages
specific to that method.  The authentication method is explicitly specified by
the message itself.  The authentication status message is common for all authentication
methods.  Any new authentication method added must define new authentication messages.
The default authentication method is SPAKE2, described below.

Prior to authentication, agents exchange auth-capabilities messages specifying
pre-shared key (PSK) ease of input for the user and supported PSK input methods.
The agent with the lower PSK ease of input presents a PSK to the user when the agent
either sends or receives an authentication request.  If both agents have the same
PSK ease of input value, the server presents the PSK to the user.  The same pre-shared key
is used by both agents.  The agent presenting the PSK to the user is the PSK presenter;
the agent requiring the user to input the PSK is the PSK consumer.

PSK ease of input is an integer in the range from 0 to 100 inclusive, where 0 means
it is not possible for the user to input a PSK on this device and 100 means
that it is easy for the user to input a PSK on the device.  Supported PSK input methods
are numeric input and scanning a QR code.  Devices with non-zero PSK ease of input must
support the numeric PSK input method.
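
The role-selection rule above can be sketched in Python as follows. It is non-normative; the function names and the method labels `numeric` and `qr-code` are hypothetical stand-ins for the encodings defined in [[#appendix-a]].

```python
from typing import Set


def psk_role(my_ease: int, peer_ease: int, i_am_server: bool) -> str:
    """Return "presenter" or "consumer" for this agent."""
    for ease in (my_ease, peer_ease):
        if not 0 <= ease <= 100:
            raise ValueError("PSK ease of input must be between 0 and 100")
    if my_ease != peer_ease:
        # The agent on which input is harder shows the PSK; the other takes input.
        return "presenter" if my_ease < peer_ease else "consumer"
    # Tie: the QUIC server presents the PSK.
    return "presenter" if i_am_server else "consumer"


def check_input_methods(ease: int, methods: Set[str]) -> None:
    """Agents with non-zero ease of input must support numeric PSK input."""
    if ease > 0 and "numeric" not in methods:
        raise ValueError("numeric PSK input method is required")
```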

For all messages and objects defined in this section, see [[#appendix-a]] for
the full CDDL definitions.

The default authentication method is
\[SPAKE2](https://tools.ietf.org/html/draft-irtf-cfrg-spake2-08) with
the following cipher suite:

1. Elliptic curve is [Curve25519](https://tools.ietf.org/html/rfc7748#page-4).
2. Hash function is \[SHA-512](https://tools.ietf.org/html/rfc6234).
3. Key derivation function is \[HKDF](https://tools.ietf.org/html/rfc5869).
4. Message authentication code is \[HMAC](https://tools.ietf.org/html/rfc2104).
5. Password hash function is SHA-512.

The Open Screen Protocol does not use a memory-hard hash function to hash PSKs
for SPAKE2; it uses SHA-512 instead, because the PSK is single-use and is not
stored in any form.

SPAKE2 provides explicit mutual authentication.

This authentication method assumes the agents share a low-entropy secret,
such as a number or a short password that could be entered by a user on a
phone, a keyboard or a TV remote control.

SPAKE2 is not symmetric and has two roles, Alice (A) and Bob (B).
The client acts as Alice, the server acts as Bob.

The messages used in this authentication method are: auth-spake2-need-psk,
auth-spake2-message, auth-spake2-confirmation and auth-status.
The SPAKE2 specification describes in detail how auth-spake2-message and
auth-spake2-confirmation are computed.

If the PSK presenter wants to perform authentication, the PSK presenter
starts the authentication process by presenting the PSK to the user and sending
an auth-spake2-message message. When the PSK consumer receives
the auth-spake2-message message, the PSK consumer prompts the user for the PSK
input if it has not done so yet.

If the PSK consumer wants to perform authentication, the PSK consumer
sends an auth-spake2-need-psk message to the PSK presenter to start the authentication
process and prompts the user to input the PSK. If the PSK presenter receives
an auth-spake2-need-psk message after it has already started authentication,
the PSK presenter ignores the auth-spake2-need-psk message.

After the user inputs the PSK into the PSK consumer, the PSK consumer computes
and sends an auth-spake2-message.

When either agent both knows the PSK and has received an auth-spake2-message
message, the agent computes and sends an auth-spake2-confirmation message.

When either agent has received both auth-spake2-message and
auth-spake2-confirmation messages, the agent validates the confirmation message
and sends an auth-status message with the authenticated status.
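
The ordering rules in the preceding paragraphs can be modeled as a small state machine. This Python sketch is non-normative: it only tracks which messages have been sent or received and performs no SPAKE2 computation. It assumes the PSK presenter generates (and therefore knows) the PSK from the outset, and the class, method names, and action strings are hypothetical.

```python
from typing import List


class Spake2Flow:
    """Tracks one agent's side of the SPAKE2 message exchange."""

    def __init__(self, is_presenter: bool) -> None:
        self.is_presenter = is_presenter
        self.knows_psk = is_presenter  # assumption: the presenter generates the PSK
        self.started = False
        self.prompted = False
        self.sent_message = False
        self.received_message = False
        self.sent_confirmation = False
        self.received_confirmation = False
        self.finished = False

    def start(self) -> List[str]:
        """This agent wants to authenticate (also used for auth-spake2-need-psk)."""
        if self.started:
            return []
        self.started = True
        if self.is_presenter:
            return ["show-psk"] + self._send_message()
        return ["send:auth-spake2-need-psk"] + self._prompt()

    def on_need_psk(self) -> List[str]:
        """PSK presenter only; ignored if authentication already started."""
        return self.start()

    def on_user_entered_psk(self) -> List[str]:
        """PSK consumer only: the user has input the PSK."""
        self.knows_psk = True
        return self._send_message() + self._maybe_confirm() + self._maybe_finish()

    def on_message(self) -> List[str]:
        self.received_message = True
        out = [] if self.is_presenter else self._prompt()
        return out + self._maybe_confirm() + self._maybe_finish()

    def on_confirmation(self) -> List[str]:
        self.received_confirmation = True
        return self._maybe_finish()

    def _prompt(self) -> List[str]:
        if self.prompted or self.knows_psk:
            return []
        self.prompted = True
        return ["prompt-user"]

    def _send_message(self) -> List[str]:
        if self.sent_message or not self.knows_psk:
            return []
        self.sent_message = True
        return ["send:auth-spake2-message"]

    def _maybe_confirm(self) -> List[str]:
        if self.sent_confirmation or not (self.knows_psk and self.received_message):
            return []
        self.sent_confirmation = True
        # The confirmation depends on both SPAKE2 shares, so this agent's own
        # auth-spake2-message must have been sent first.
        return self._send_message() + ["send:auth-spake2-confirmation"]

    def _maybe_finish(self) -> List[str]:
        if self.finished or not (self.received_message and self.received_confirmation):
            return []
        self.finished = True
        return ["validate-confirmation", "send:auth-status(authenticated)"]
```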

Control Protocols {#control-protocols}
============================

Presentation Protocol {#presentation-protocol}
---------------------------------------------

This section defines the use of the Open Screen Protocol for starting,
terminating, and controlling presentations as defined by
[[PRESENTATION-API|Presentation API]]. [[#presentation-api]]
defines how APIs in [[PRESENTATION-API|Presentation API]] map to the
protocol messages defined in this section.

For all messages defined in this section, see [[#appendix-a]] for the full
CDDL definitions.

Issue(123): Add a capability that indicates support for the presentation protocol.


Issue(160): Refinements to Presentation API protocol.

To learn which receivers are [=available presentation displays=] for a
particular URL or set of URLs, the controller may send a
presentation-url-availability-request message with the following values:

: urls
:: A list of presentation URLs.  Must not be empty.

: watch-duration
:: The period of time that the controller is interested in receiving updates
    about those URLs, should their availability change.

: watch-id
:: An identifier the receiver must use when sending updates about URL
    availability so that the controller knows which URLs the receiver is referring
    to.  The controller must choose a value that is unique across all
    presentation URL availability watches to the same receiver.

Issue(145): Watch ID Uniqueness.

In response, the receiver should send one presentation-url-availability-response
message with the following values:

: url-availabilities
:: A list of URL availability states.  Each state must correspond to the matching URL
    from the request by list index.


While the watch is valid (the watch-duration has not expired), the receiver
should send presentation-url-availability-event messages when URL availabilities change.
Such events contain the following values:

: watch-id
:: The watch-id given in the presentation-url-availability-request,
    used to refer to the presentation URLs whose availability has changed.

: url-availabilities
:: A list of URL availability states.  Each state must correspond to the URLs from the
    request referred to by the watch-id.

Note that these messages are not broadcast to all controllers. They are sent
individually to controllers that have requested availability for the URLs that
have changed in availability state within the watch duration of the original
availability request.

To save power, the controller may disconnect the QUIC connection and
later reconnect to send availability requests and receive availability
responses and updates.  The QUIC connection ID may or may not be the same
when reconnecting.  Note that the lifetime of a watch-id is not limited
to one QUIC connection.  The receiver must continue sending updates for watches
even if the QUIC connection changes, and thus the controller need not send
new URL availability requests if the QUIC connection changes.
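
The following Python sketch shows one way a receiver might keep track of availability watches. It is non-normative. It keys watches by the controller's identity and watch-id rather than by QUIC connection, so that a watch survives reconnection; using the controller's certificate fingerprint as that identity, the availability state names, and all class and method names are assumptions. Events are produced only for unexpired watches that include the changed URL, and each event lists the states of all of the watch's URLs in request order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed state name for URLs with no reported availability yet.
DEFAULT_STATE = "not-compatible"


@dataclass
class Watch:
    urls: List[str]
    expires_at: float


class AvailabilityWatches:
    """Receiver-side table of presentation URL availability watches."""

    def __init__(self) -> None:
        # (controller identity, watch-id) -> Watch; not tied to a QUIC connection.
        self._watches: Dict[Tuple[str, int], Watch] = {}
        self._states: Dict[str, str] = {}

    def add(self, controller: str, watch_id: int, urls: List[str],
            watch_duration: float, now: float) -> None:
        if not urls:
            raise ValueError("urls must not be empty")
        self._watches[(controller, watch_id)] = Watch(list(urls), now + watch_duration)

    def on_change(self, url: str, state: str,
                  now: float) -> List[Tuple[str, int, List[str]]]:
        """Record a new availability state and return the events to send."""
        if self._states.get(url, DEFAULT_STATE) == state:
            return []  # availability did not change
        self._states[url] = state
        events = []
        for key, watch in list(self._watches.items()):
            if now >= watch.expires_at:
                del self._watches[key]  # watch-duration elapsed
                continue
            if url in watch.urls:
                controller, watch_id = key
                states = [self._states.get(u, DEFAULT_STATE) for u in watch.urls]
                events.append((controller, watch_id, states))
        return events
```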


To start a presentation, the controller may send a
presentation-start-request message to the receiver with the following
values:

: presentation-id
:: The presentation identifier

: url
:: The selected presentation URL

: headers
:: The headers that the receiver should use to fetch the
    presentation URL.  For example,
    [[PRESENTATION-API#establishing-a-presentation-connection|section 6.6.1]] of
    the Presentation API says that the Accept-Language header should be
    provided.

The presentation ID must follow the restrictions defined by
[[PRESENTATION-API#common-idioms|section 6.1]] of the Presentation API, in that
it must consist of at least 16 ASCII characters.
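
A controller might generate and check presentation IDs as in this Python sketch, which uses the standard `secrets` module. It is non-normative: producing 32 hexadecimal characters is an assumed format that satisfies the 16-ASCII-character minimum stated above, and the function names are hypothetical.

```python
import secrets

MIN_PRESENTATION_ID_LENGTH = 16


def new_presentation_id() -> str:
    """Return a random ID of 32 hexadecimal (ASCII) characters."""
    return secrets.token_hex(16)


def is_valid_presentation_id(presentation_id: str) -> bool:
    """Check the minimum-length and ASCII-only rules stated above."""
    return (len(presentation_id) >= MIN_PRESENTATION_ID_LENGTH
            and presentation_id.isascii())
```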


When the receiver receives the presentation-start-request, it should send back a
presentation-start-response message after either the presentation URL has been
fetched and loaded, or the receiver has failed to do so. If it has failed, it
must respond with the appropriate result (such as invalid-url or timeout).  If
it has succeeded, it must reply with a success result.  Additionally, the
response must include the following:

: connection-id
:: An ID that both agents can use to send connection messages
    to each other.  It is chosen by the receiver for ease of implementation: if
    the message receiver chooses the connection-id, it may keep the ID unique
    across connections, thus making message demuxing/routing easier.

To send a presentation message, the controller or receiver may send a
presentation-connection-message with the following values:

: connection-id
:: The ID from the presentation-start-response or
    presentation-connection-open-response messages.

: message
:: The presentation message data.
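
Because the receiver chooses connection-id values, it can keep them unique across all controllers and presentations, so routing a presentation-connection-message becomes a single lookup. The Python sketch below is a non-normative illustration of that design; `ConnectionRouter` and its methods are hypothetical, and identifying controllers by an opaque string is an assumption.

```python
import itertools
from typing import Dict, Tuple


class ConnectionRouter:
    """Receiver-side allocation of connection-id values and message routing."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        # connection-id -> (controller, presentation-id)
        self._connections: Dict[int, Tuple[str, str]] = {}

    def open(self, controller: str, presentation_id: str) -> int:
        """Allocate a connection-id for a start or connection-open response."""
        connection_id = next(self._ids)
        self._connections[connection_id] = (controller, presentation_id)
        return connection_id

    def route(self, connection_id: int) -> Tuple[str, str]:
        """Find where a presentation-connection-message belongs."""
        if connection_id not in self._connections:
            raise KeyError(f"unknown connection-id {connection_id}")
        return self._connections[connection_id]

    def close(self, connection_id: int) -> None:
        self._connections.pop(connection_id, None)
```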


To terminate a presentation, the controller may send a
presentation-termination-request message with the following values:

: presentation-id
:: The ID of the presentation to terminate.

: reason
:: The reason the presentation is being terminated.


When a [=receiver=] receives a presentation-termination-request, it should
send back a presentation-termination-response message to the requesting
controller.  It should also notify other controllers about the termination by sending
a presentation-termination-event message.  It may also send this event if
it terminates a presentation without a request from a controller to do so. The
presentation-termination-event message contains the following values:

: presentation-id
:: The ID of the presentation that was terminated.

: reason
:: The reason the presentation was terminated.

To accept incoming connection requests from controllers, a receiver
must receive and process the presentation-connection-open-request
message which contains the following values:

: presentation-id
:: The ID of the presentation to connect to.

: url
:: The URL of the presentation to connect to.

The receiver should, upon receipt of a
presentation-connection-open-request message, send back a
presentation-connection-open-response message which contains the
following values:

: result
:: A code indicating success or failure, and the reason for any failure.

: connection-id
:: An ID that both agents can use to send connection messages
    to each other.  It is chosen by the receiver for ease of implementation:
    because the receiver chooses the connection-id, it can keep the ID unique
    across all of its connections, which makes demultiplexing and routing
    messages easier.
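To illustrate why receiver-chosen connection IDs simplify routing, here is a minimal sketch of a receiver-side table.  The `ConnectionRouter` class and its methods are hypothetical; it assumes a single counter per receiver so that IDs are never reused, which is one possible policy rather than a requirement of this specification.

```python
import itertools


class ConnectionRouter:
    """Hypothetical receiver-side table mapping connection IDs to message handlers."""

    def __init__(self):
        # A single counter per receiver: IDs are unique across all connections
        # and are never reused, even after a connection is closed.
        self._ids = itertools.count(1)
        self._handlers = {}

    def open(self, handler):
        """Allocate the connection-id returned in presentation-connection-open-response."""
        connection_id = next(self._ids)
        self._handlers[connection_id] = handler
        return connection_id

    def close(self, connection_id):
        """Forget a connection, e.g. after presentation-connection-close-request."""
        self._handlers.pop(connection_id, None)

    def route(self, connection_id, message):
        """Deliver a presentation-connection-message; False if the ID is unknown."""
        handler = self._handlers.get(connection_id)
        if handler is None:
            return False
        handler(message)
        return True
```

Because the receiver alone allocates IDs, an incoming presentation-connection-message can be dispatched with a single lookup, without also keying on the controller or the QUIC connection.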


A controller may terminate a connection without terminating the presentation by
sending a presentation-connection-close-request message with the following
values:

: connection-id
:: The ID of the connection to close.

Issue(124): Is a Presentation close/terminate from a controller a request/response or event?

The receiver should, upon receipt of a presentation-connection-close-request,
send back a presentation-connection-close-response message with the following
values:

: result
:: Whether the close succeeded or failed and, if it failed, why.

Issue(138): Remove presentation-connection-close-response message.

The receiver may also close a connection without a request from the controller
to do so and without terminating a presentation.  If it does so, it should send
a presentation-connection-close-event to the controller with the following
values:

: connection-id
:: The ID of the connection that was closed.

: reason
:: The reason the connection was closed.

: error-message
:: A debug message, suitable for logging or for presenting to the user, that
    explains in more detail why the connection was closed.


Presentation API {#presentation-api}
---------------------------------------------

This section defines how the [[PRESENTATION-API|Presentation API]] uses the
[[#presentation-protocol]].

When [[PRESENTATION-API#the-list-of-available-presentation-displays|section
6.4.2]] says "This list of presentation displays ... is populated based on an
implementation specific discovery mechanism", the [=controlling user agent=] may
use the mDNS, QUIC, agent-info-request, and
presentation-url-availability-request messages defined previously in this spec
to discover receivers.

When [[PRESENTATION-API#the-list-of-available-presentation-displays|section
6.4.2]] says "To further save power, ... implementation specific discovery of
presentation displays can be resumed or suspended.", the [=controlling user
agent=] may use the power saving mechanism defined in the previous section.

When [[PRESENTATION-API#starting-a-presentation-connection|section 6.3.4]] says
"Using an implementation specific mechanism, tell U to create a receiving
browsing context with D, presentationUrl, and I as parameters.", U (the
[=controlling user agent=]) may send a presentation-start-request message to D
(the receiver), with I for the presentation identifier and presentationUrl for
the selected presentation URL.

Issue: Once the Presentation API has text about reconnecting via an
implementation specific mechanism, quote that here and map it to a message.

When [[PRESENTATION-API#sending-a-message-through-presentationconnection|section
6.5.2]] says "Using an implementation specific mechanism, transmit the contents
of messageOrData as the presentation message data and messageType as the
presentation message type to the destination browsing context", the
[=controlling user agent=] may send a presentation-connection-message with
messageOrData for the presentation message data.  Note that the messageType is
embedded in the encoded CBOR type and does not need an additional value in the
message.
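The following sketch shows how the CBOR type alone can carry messageType: a text message is encoded as a CBOR text string (major type 3) and binary data as a CBOR byte string (major type 2), following the CBOR data model in RFC 8949.  It assumes the `message` value is typed as text or bytes in Appendix A and encodes only that value, not the full presentation-connection-message; the function names are hypothetical.

```python
def cbor_string_header(major_type, length):
    """Encode a CBOR initial byte plus length argument (RFC 8949, section 3)."""
    if length < 24:
        return bytes([(major_type << 5) | length])
    # Additional info 24/25/26/27 means a 1/2/4/8-byte big-endian length follows.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if length < (1 << (8 * size)):
            return bytes([(major_type << 5) | info]) + length.to_bytes(size, "big")
    raise ValueError("length too large for CBOR")


def encode_message_payload(message_or_data):
    """Encode the message value so that its CBOR type carries messageType.

    str   -> major type 3 (text string), i.e. messageType "text"
    bytes -> major type 2 (byte string), i.e. messageType "binary"
    """
    if isinstance(message_or_data, str):
        payload = message_or_data.encode("utf-8")
        return cbor_string_header(3, len(payload)) + payload
    if isinstance(message_or_data, (bytes, bytearray)):
        return cbor_string_header(2, len(message_or_data)) + bytes(message_or_data)
    raise TypeError("expected str or bytes")


def decoded_message_type(encoded):
    """Recover messageType from the major type in the first byte."""
    return {3: "text", 2: "binary"}[encoded[0] >> 5]
```

For example, the text `"hi"` encodes as the bytes `62 68 69`, while the two bytes `01 02` encode as `42 01 02`; the receiver needs only the top three bits of the first byte to tell them apart.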

When
[[PRESENTATION-API#terminating-a-presentation-in-a-controlling-browsing-context|section
6.5.6]] says "Send a termination request for the presentation to its receiving
user agent using an implementation specific mechanism", the [=controlling user
agent=] may send a presentation-termination-request message.

When [[PRESENTATION-API#monitoring-incoming-presentation-connections|section
6.7.1]] says "it MUST listen to and accept incoming connection requests from a
controlling browsing context using an implementation specific
mechanism", the [=receiving user agent=] must receive and process the
presentation-connection-open-request message.

When [[PRESENTATION-API#monitoring-incoming-presentation-connections|section
6.7.1]] says "Establish the connection between the controlling and receiving
browsing contexts using an implementation specific mechanism.", the [=receiving
user agent=] must send a presentation-connection-open-response message.


Remote Playback Protocol {#remote-playback-protocol}
----------------------------------------------------

This section defines the use of the Open Screen Protocol for starting, terminating,
and controlling remote playback of media as defined by the
[[REMOTE-PLAYBACK|Remote Playback API]].  [[#remote-playback-api]] defines how
APIs in [[REMOTE-PLAYBACK|Remote Playback API]] map to the protocol messages
defined in this section.

For all messages defined in this section, see Appendix A for the full
CDDL definitions.

Issue(123): Add a capability that indicates support for the remote playback protocol.


Issue(148): Make a required/default remote playback state table.


Issue(159): Refinements to Remote Playback protocol.

To learn which receivers are [=compatible remote playback device=]s (also called
available [=remote playback devices=]) for a particular URL or set of URLs, the
controller may send a remote-playback-availability-request message with the
following values:

: urls
:: A list of [=media resources=].  Must not be empty.

Issue(146): Remote Playback HTTP headers.

: headers
:: Headers that the receiver should use when fetching the
    urls.  For example,
    [[REMOTE-PLAYBACK#establishing-a-connection-with-a-remote-playback-device|section 6.2.4 of
    the Remote Playback API]] says that the Accept-Language header should be
    provided.

: watch-duration
:: The period of time during which the controller is interested in receiving
    updates about the availability of the URLs, should it change.

: watch-id
:: An identifier the receiver must use when sending updates about URL
    availability so that the controller knows which URLs the receiver is referring
    to. The controller must choose a value that is unique across all
    remote playback availability watches to the same receiver.

In response, the receiver should send a remote-playback-availability-response
message with the following values:

: url-availabilities
:: A list of URL availability states.  Each state must correspond to the matching URL
    from the request by list index.


Afterwards, until the request's watch-duration has elapsed, the receiver should
send remote-playback-availability-event messages if URL availabilities change.
Such events contain the following values:

: watch-id
:: The watch-id given in the remote-playback-availability-request,
    used to refer to the remote playback URLs whose availability has changed.

: url-availabilities
:: A list of URL availability states.  Each state must correspond to the URLs from the
    request referred to by the watch-id.

Note that these messages are not broadcast to all controllers. They are sent
individually to controllers that have requested availability for the URLs that
have changed in availability state within the watch duration of the original
availability request.

To save power, the controller may disconnect the QUIC connection and
later reconnect to send availability requests and receive availability
responses and updates. The QUIC connection ID may or may not be the same
when reconnecting.  Note that the lifetime of a watch-id is not limited
to one QUIC connection.  The receiver must continue sending updates for watches
even if the QUIC connection changes, and thus the controller need not send
new URL availability requests if the QUIC connection changes.
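The watch rules above (index-matched responses, events only to interested controllers within the watch duration, watches that outlive a QUIC connection) can be sketched as a receiver-side table.  This is a minimal sketch under stated assumptions: the class names are hypothetical, times are plain seconds, availability values such as `"compatible"` are placeholders for the states defined in Appendix A, and controllers are identified by some stable identity other than the QUIC connection ID.

```python
from dataclasses import dataclass, field


@dataclass
class Watch:
    urls: list          # URLs from the request, in request order
    expires_at: float   # request time + watch-duration (seconds, assumed unit)


@dataclass
class AvailabilityWatches:
    """Hypothetical receiver-side table of availability watches.

    Keyed by (controller identity, watch-id) rather than by QUIC connection,
    so a watch survives the controller reconnecting with a new connection ID.
    """
    watches: dict = field(default_factory=dict)

    def add(self, controller, watch_id, urls, now, watch_duration, availability):
        """Record a watch and return url-availabilities for the response."""
        self.watches[(controller, watch_id)] = Watch(list(urls), now + watch_duration)
        # Each availability corresponds to the request URL at the same index.
        return [availability(u) for u in urls]

    def on_change(self, changed_urls, now, availability):
        """Return (controller, watch-id, url-availabilities) for each event to send."""
        events = []
        for key, watch in list(self.watches.items()):
            if now > watch.expires_at:
                del self.watches[key]  # watch-duration has elapsed; stop updating
                continue
            if any(u in changed_urls for u in watch.urls):
                controller, watch_id = key
                events.append((controller, watch_id,
                               [availability(u) for u in watch.urls]))
        return events
```

Events are produced only for watches that include a changed URL, which matches the note that availability events are not broadcast to every controller.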


To start remote playback, the controller may send a
remote-playback-start-request message to the receiver with the following
values:

: remote-playback-id
:: An identifier that uniquely identifies the remote playback from the
    controller to the receiver.  It does not need to be unique across all remote
    playbacks from that controller to all receivers, nor across all remote
    playbacks from all controllers to that receiver.

: urls
:: The [=media resources=] that the controller has selected for playback on the
    receiver.

: text-track-urls
:: URLs of text tracks associated with the [=media resources=].

: controls
:: Initial controls for modifying the initial state of the remote playback, as
    defined in [[#remote-playback-state-and-controls]].  The controller may send
    controls that are optional for the receiver to support before it knows the
    receiver supports them.  If the receiver does not support them, it will
    ignore them and the controller will learn that it does not support them from
    the remote-playback-start-response message.

Issue(147): Remote playback ID uniqueness.

When the receiver receives a remote-playback-start-request message, it should
send back a remote-playback-start-response message.  It should do so quickly,
usually before the [=media resource=] has been loaded, and then report loading
progress with remote-playback-state-event messages.  The exception is when the
receiver decides not to attempt to load the resource at all; in that case, it
must respond with the appropriate failure result (such as timeout or
invalid-url).  Additionally, the response must include the following:

: state
:: The initial state of the remote playback, as defined in
    [[#remote-playback-state-and-controls]].

If the controller wishes to modify the state of the remote playback (for
example, to pause, resume, or skip), it may send a
remote-playback-modify-request message with the following values:

: remote-playback-id
:: The ID of the remote playback to be modified.

: controls
:: Updated controls as defined in [[#remote-playback-state-and-controls]].

When a receiver receives a remote-playback-modify-request it should send a
remote-playback-modify-response message in reply with the following values:

: state
:: The updated state of the remote playback as defined in
    [[#remote-playback-state-and-controls]].

When the state of remote playback changes without a request for modification
from the controller (such as when the media skips or pauses due to user
interaction on the receiver), the receiver may send a remote-playback-state-event
to the controller with the following values:

: remote-playback-id
:: The ID of the remote playback whose state has changed.

: state
:: The updated state of the remote playback, as defined in
    [[#remote-playback-state-and-controls]].


To terminate the remote playback, the controller may send a
remote-playback-termination-request message with the following values:

: remote-playback-id
:: The ID of the remote playback to terminate.

: reason
:: The reason the remote playback is being terminated.

When a receiver receives a remote-playback-termination-request, it should
send back a remote-playback-termination-response message to the controller.

If a receiver terminates a remote playback without a request from the controller
to do so, it must send a remote-playback-termination-event message to the
controller with the following values:

: remote-playback-id
:: The ID of the remote playback that was terminated.

: reason
:: The reason the remote playback was terminated.

As mentioned in
[[REMOTE-PLAYBACK#disconnecting-from-a-remote-playback-device|Remote Playback
API section 6.2.7]], terminating the remote playback means the controller is no
longer controlling the remote playback and does not necessarily stop media from
rendering on the receiver.  Whether or not the receiver stops rendering media depends
upon the implementation of the receiver.

Remote Playback State and Controls {#remote-playback-state-and-controls}
------------------------------------------------------------------------

In order for the controller and receiver to stay in sync with regards to the
state of the remote playback, the controller may send controls to modify the state
(for example, via the remote-playback-modify-request message) and the receiver
may send updates about state changes (for example, via the
remote-playback-state-event message).

The controls sent by the controller include the following individual control
values, each of which is optional.  This allows the controller to change one
control value or many control values at once without having to specify all
control values every time.  A non-present control value indicates no change.  A
present control value indicates the change defined below. These controls
intentionally mirror settable attributes and methods of the
[HtmlMediaElement](https://html.spec.whatwg.org/multipage/media.html#htmlmediaelement).

: source
:: Change the [=media resource=] URL. See
    [HtmlMediaElement.src](https://html.spec.whatwg.org/multipage/media.html#dom-media-src)
    for more details. Must not be used in the initial controls of the
    remote-playback-start-request message (which already contains a list of URLs).

: preload
:: Set how aggressively to preload media. See
    [HtmlMediaElement.preload](https://html.spec.whatwg.org/multipage/media.html#dom-media-preload)
    for more details. Should only be used in the initial controls of the
    remote-playback-start-request message or when the source is changed.  If not
    set in the initial controls, it is left to the receiver to decide.  This is
    optional for the receiver to support and if not supported, the receiver will
    behave as though it were never set.

: loop
:: Set whether or not to loop media. See
    [HtmlMediaElement.loop](https://html.spec.whatwg.org/multipage/media.html#dom-media-loop)
    for more details. Should only be used in the initial controls of the
    remote-playback-start-request message.  If not set in the initial controls, it is
    assumed to be false.

: paused
:: If true, pause; if false, resume. See
    [HtmlMediaElement.pause()](https://html.spec.whatwg.org/multipage/media.html#dom-media-pause)
    and
    [HtmlMediaElement.play()](https://html.spec.whatwg.org/multipage/media.html#dom-media-play)
    for more details.  If not set in the initial controls, it is left to the
    receiver to decide.

: muted
:: If true, mute; if false, unmute. See
    [HtmlMediaElement.muted](https://html.spec.whatwg.org/multipage/media.html#dom-media-muted)
    for more details.  If not set in the initial controls, it is left to the
    receiver to decide.

: volume
:: Set the audio volume in the range from 0.0 to 1.0 inclusive. See
    [HtmlMediaElement.volume](https://html.spec.whatwg.org/multipage/media.html#dom-media-volume)
    for more details.  If not set in the initial controls, it is left to the
    receiver to decide.

: seek
:: Seek to a precise time. See
    [HtmlMediaElement.currentTime](https://html.spec.whatwg.org/multipage/media.html#dom-media-currenttime)
    for more details.

: fast-seek
:: Seek to an approximate time as fast as possible. See
    [HtmlMediaElement.fastSeek()](https://html.spec.whatwg.org/multipage/media.html#dom-media-fastseek)
    for more details.

: playback-rate
:: Set the rate at which the media plays. See
    [HtmlMediaElement.playbackRate](https://html.spec.whatwg.org/multipage/media.html#dom-media-playbackrate)
    for more details.  If not set in the initial controls, it is left to the
    receiver to decide.  This is optional for the receiver to support and if not
    supported, the receiver will behave as though it were never set.

: poster
:: Set the URL of an image to show when video data is not available. See
    [HtmlMediaElement.poster](https://html.spec.whatwg.org/multipage/media.html#dom-media-poster)
    for more details. If not set in the initial controls, no poster is used and
    the receiver can choose what to render when video data is unavailable.  This
    is optional for the receiver to support and if not supported, the receiver
    will behave as though it were never set.

: enabled-audio-track-ids
:: Enable included audio tracks by ID and disable all other audio tracks. See
    [HtmlMediaElement.audioTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-audiotracks)
    for more details.

: select-video-track-id
:: Select the given video track by ID and unselect all other video tracks. See
    [HtmlMediaElement.videoTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-videotracks)
    for more details.

: added-text-tracks
:: Add text tracks with the given kinds, labels, and languages. See
    [HtmlMediaElement.addTextTrack](https://html.spec.whatwg.org/multipage/media.html#dom-media-addtexttrack)
    for more details.  This is optional for the receiver to support and if not
    supported, the receiver will behave as though it were never set.

: changed-text-tracks
:: Change text tracks by ID; all other text tracks are left
    unchanged.  For each changed track, set the mode, add cues, and remove
    cues by ID. See
    [HtmlMediaElement.textTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-texttracks)
    for more details.  Note that future specifications or extensions to this
    specification are expected to add new properties to the text-track-cue
    (such as text size, alignment, and position).  Adding and removing
    cues is optional for the receiver to support; if not supported, the
    receiver will behave as though no cues were added or removed (support for
    both adding and removing is indicated by support for "added-cues").  As
    specified in
    [HtmlMediaElement.textTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-texttracks),
    if a cue ID is invalid (for example, removing an ID that was never added or
    adding an ID twice), the receiver may reject the text track change.
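Because every control is optional and an absent control means "no change", a controller can send only what differs, and a receiver can drop optional controls it does not support.  The sketch below illustrates both sides under stated assumptions: controls are modeled as a plain dictionary keyed by the names above, `OPTIONAL_CONTROLS` lists the controls this section marks optional for the receiver (the partial optionality of cue changes in changed-text-tracks is not modeled), and all function names are hypothetical.

```python
# Controls that this section marks as optional for the receiver to support.
OPTIONAL_CONTROLS = frozenset({"preload", "playback-rate", "poster", "added-text-tracks"})


def controls_for_change(current, desired):
    """Build the controls for a remote-playback-modify-request.

    Only values that differ from `current` are included; a missing control
    means "no change" to the receiver.
    """
    return {name: value for name, value in desired.items()
            if current.get(name) != value}


def effective_controls(controls, receiver_supports):
    """Drop optional controls the receiver does not list in its `supports` state.

    Dropped controls behave as though they were never set.
    """
    return {name: value for name, value in controls.items()
            if name not in OPTIONAL_CONTROLS or name in receiver_supports}


def initial_controls_valid(controls):
    """`source` must not appear in remote-playback-start-request controls."""
    return "source" not in controls
```

Sending a sparse diff keeps modify requests small and avoids overwriting receiver-side changes the controller has not yet observed.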

Issue: Add a table for whether it's required and what the default is.

The states sent by the receiver include the following individual state values,
each of which is optional.  This allows the receiver to update the controller
about more than one state value at once without having to specify all
state values every time.  A non-present state value indicates the state has not
changed.

: supports
:: The controls the receiver supports.  These may differ for different [=media
    resource=]s and should not change unless the [=media resource=] changes.
    The default is empty (support for nothing)
    for the initial state in the remote-playback-start-response message.

: source
:: The current [=media resource=] URL. See
    [HtmlMediaElement.currentSrc](https://html.spec.whatwg.org/multipage/media.html#dom-media-currentsrc).
    Must be present in the initial state in the remote-playback-start-response message.

: loading
:: The state of network activity for loading the [=media resource=]. See
    [HtmlMediaElement.networkState](https://html.spec.whatwg.org/multipage/media.html#dom-media-networkstate).
    The default is empty (NETWORK_EMPTY)
    for the initial state in the remote-playback-start-response message.

: loaded
:: The state of the loaded media (whether enough is loaded to play). See
    [HtmlMediaElement.readyState](https://html.spec.whatwg.org/multipage/media.html#dom-media-readystate).
    The default is nothing (HAVE_NOTHING)
    for the initial state in the remote-playback-start-response message.

: error
:: A major error occurred which prevents the remote playback from continuing. See
    [HtmlMediaElement.error](https://html.spec.whatwg.org/multipage/media.html#dom-media-error) and
    [HtmlMediaElement media error codes](https://html.spec.whatwg.org/multipage/media.html#concept-mediaerror-code).
    The default is no error
    for the initial state in the remote-playback-start-response message.

: epoch
:: The "zero time" of the media timeline. See
    [HtmlMediaElement's timeline offset](https://html.spec.whatwg.org/multipage/media.html#timeline-offset) and
    [HtmlMediaElement.getStartDate()](https://html.spec.whatwg.org/multipage/media.html#dom-media-getstartdate).
    The default is an unknown epoch
    for the initial state in the remote-playback-start-response message.

: duration
:: The duration of the media timeline. See
    [HtmlMediaElement.duration](https://html.spec.whatwg.org/multipage/media.html#dom-media-duration).
    The default is an unknown duration
    for the initial state in the remote-playback-start-response message.

: buffered-time-ranges
:: The time ranges for which media has been buffered. See
    [HtmlMediaElement.buffered](https://html.spec.whatwg.org/multipage/media.html#dom-media-buffered).

: played-time-ranges
:: The time ranges reached by the playback position during normal playback. See
    [HtmlMediaElement.played](https://html.spec.whatwg.org/multipage/media.html#dom-media-played).

: seekable-time-ranges
:: The time ranges for which media is seekable by the controller or the receiver. See
    [HtmlMediaElement.seekable](https://html.spec.whatwg.org/multipage/media.html#dom-media-seekable).

: position
:: The playback position. See
    [HtmlMediaElement's official playback
    position](https://html.spec.whatwg.org/multipage/media.html#official-playback-position)
    and
    [HtmlMediaElement.currentTime](https://html.spec.whatwg.org/multipage/media.html#dom-media-currenttime).
    The default is 0
    for the initial state in the remote-playback-start-response message.

: playbackRate
:: The current rate of playback on a scale where 1.0 is "normal speed". See
    [HtmlMediaElement.playbackRate](https://html.spec.whatwg.org/multipage/media.html#dom-media-playbackrate).
    The default is 1.0
    for the initial state in the remote-playback-start-response message.

: paused
:: Whether media is paused or not. See
    [HtmlMediaElement.paused](https://html.spec.whatwg.org/multipage/media.html#dom-media-paused).
    The default is false
    for the initial state in the remote-playback-start-response message.

: seeking
:: Whether the receiver is seeking or not. See
    [HtmlMediaElement.seeking](https://html.spec.whatwg.org/multipage/media.html#dom-media-seeking).
    The default is false
    for the initial state in the remote-playback-start-response message.

: stalled
:: If true, media is not playing because not enough media is loaded, and false otherwise. See
    [HtmlMediaElement.stalled](https://html.spec.whatwg.org/multipage/media.html#event-media-stalled).
    The default is false
    for the initial state in the remote-playback-start-response message.

: ended
:: Whether media has reached the end or not. See
    [HtmlMediaElement.ended](https://html.spec.whatwg.org/multipage/media.html#dom-media-ended).
    The default is false
    for the initial state in the remote-playback-start-response message.

: volume
:: The current volume of playback on a scale of 0.0 to 1.0. See
    [HtmlMediaElement.volume](https://html.spec.whatwg.org/multipage/media.html#dom-media-volume).

: muted
:: True if audio is muted (overriding the volume value) and false otherwise.
    See
    [HtmlMediaElement.muted](https://html.spec.whatwg.org/multipage/media.html#dom-media-muted).

: resolution
:: The "intrinsic width" and "intrinsic height" of the video. See
    [HtmlMediaElement.videoWidth](https://html.spec.whatwg.org/multipage/media.html#dom-media-videowidth)
    and
    [HtmlMediaElement.videoHeight](https://html.spec.whatwg.org/multipage/media.html#dom-media-videoheight).

: audio-tracks
:: The available audio tracks, which can be individually enabled or disabled. See
    [HtmlMediaElement.audioTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-audiotracks).

: video-tracks
:: The available video tracks.  Only one may be selected. See
    [HtmlMediaElement.videoTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-videotracks).

: text-tracks
:: The available text tracks, which can be individually shown, hidden, or disabled. See
    [HtmlMediaElement.textTracks](https://html.spec.whatwg.org/multipage/media.html#dom-media-texttracks).
    The controller can also add cues to and remove cues from text tracks.
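A controller can keep a local mirror of the remote playback state by filling the initial state from the defaults listed above and then overlaying each remote-playback-state-event, since absent values mean "unchanged".  This is a minimal sketch: the dictionary keys follow the state names above, `None` stands in for "no error" and "unknown" (an assumption), string values such as `"NETWORK_EMPTY"` are placeholders for the wire encodings in Appendix A, and the function names are hypothetical.

```python
# Defaults for the initial state in remote-playback-start-response, as listed above.
INITIAL_STATE_DEFAULTS = {
    "supports": frozenset(),   # support for nothing
    "loading": "NETWORK_EMPTY",
    "loaded": "HAVE_NOTHING",
    "error": None,             # no error (None is an assumed representation)
    "epoch": None,             # unknown epoch
    "duration": None,          # unknown duration
    "position": 0,
    "playbackRate": 1.0,
    "paused": False,
    "seeking": False,
    "stalled": False,
    "ended": False,
}


def initial_state(start_response_state):
    """Fill defaults for values absent from the start response's initial state."""
    if "source" not in start_response_state:
        raise ValueError("source must be present in the initial state")
    return {**INITIAL_STATE_DEFAULTS, **start_response_state}


def apply_state_event(mirror, event_state):
    """Overlay present values from a state event; absent values are unchanged."""
    return {**mirror, **event_state}
```

Returning new dictionaries rather than mutating the mirror in place makes it easy to compare old and new state when deciding which local media element events to fire.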

All times, time ranges, and durations (such as position, duration, and
seekable-time-ranges) used above use a common media-time value (see Appendix A)
which includes a time scale.  This allows time values which work on different
time scales to be expressed without loss of precision.  The scale is represented
in hertz (ticks per second); for example, 90000 (90 kHz) is a common time scale
for video.
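As a worked example of why a per-value time scale avoids precision loss, the sketch below treats a media-time as an integer tick count plus a scale in hertz and uses exact rational arithmetic.  The `MediaTime` class and its field names are hypothetical stand-ins; Appendix A defines the actual media-time encoding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class MediaTime:
    """Hypothetical stand-in for media-time: `value` ticks at `scale` ticks per second."""
    value: int
    scale: int  # in hertz, e.g. 90000

    def seconds(self):
        """Exact time in seconds as a rational number (no floating-point rounding)."""
        return Fraction(self.value, self.scale)

    def rescale(self, new_scale):
        """Re-express this time in another scale, refusing to round."""
        ticks = self.seconds() * new_scale
        if ticks.denominator != 1:
            raise ValueError(f"{self} is not representable at {new_scale} Hz")
        return MediaTime(ticks.numerator, new_scale)
```

For example, one frame at 29.97 frames per second is 1001 ticks at 30000 Hz, which is exactly 3003 ticks at 90000 Hz; as a floating-point number of seconds it would only be approximate.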


Remote Playback API {#remote-playback-api}
------------------------------------------

This section defines how the [[REMOTE-PLAYBACK|Remote Playback API]] uses the
messages defined in [[#remote-playback-protocol]].

When [[REMOTE-PLAYBACK#the-list-of-available-remote-playback-devices|section
6.2.1.2]] says "This list contains remote playback devices and is populated
based on an implementation specific discovery mechanism" and
[[REMOTE-PLAYBACK#the-list-of-available-remote-playback-devices|section
6.2.1.4]] says "Retrieve available remote playback devices (using an
implementation specific mechanism)", the user agent may use the
mDNS, QUIC, agent-info-request, and remote-playback-availability messages
defined previously in this spec to discover [=remote playback devices=].  The
remote-playback-availability urls must contain the [=availability sources set=].

When
[[REMOTE-PLAYBACK#establishing-a-connection-with-a-remote-playback-device|section
6.2.4]] says "Request connection of remote to device. The implementation of this
step is specific to the user agent." and "Synchronize the current media element
state with the remote playback state", the user agent may send the
remote-playback-start-request message to start remote playback.  The
remote-playback-start-request urls must contain the [=remote playback source=].
The current [[REMOTE-PLAYBACK|Remote Playback API]] only allows a single source,
but the protocol allows for several and future versions of
[[REMOTE-PLAYBACK|Remote Playback API]] may allow for several.

When
[[REMOTE-PLAYBACK#establishing-a-connection-with-a-remote-playback-device|section
6.2.4]] says "The mechanism that is used to connect the user agent with the
remote playback device and play the remote playback source is an implementation
choice of the user agent. The connection will likely have to provide a two-way
messaging abstraction capable of carrying media commands to the remote playback
device and receiving media playback state in order to keep the media element
state and remote playback state in sync", the user agent may send
remote-playback-modify-request messages to change the remote playback state
based on changes to the local media element and receive
remote-playback-modify-response and remote-playback-state-event messages to
change the local media element based on changes to the remote playback state.

Issue(158): Algorithm for what messages to send when local/remote media element changes.

When
[[REMOTE-PLAYBACK#disconnecting-from-a-remote-playback-device|section
6.2.7]] says "Request disconnection of remote from the device. The
implementation of this step is specific to the user agent.", the controlling
user agent may send the remote-playback-termination-request message.


Streaming Protocol {#streaming-protocol}
========================================

This section defines the use of the Open Screen Protocol for streaming
media from a media sender to a media receiver.

Capabilities {#streaming-capabilities}
--------------------------------------------
If the advertiser is already authenticated, the requester may request
additional information by sending a streaming-capabilities-request
message and receiving back a streaming-capabilities-response message with the
following properties:

: receive-audio (required)
:: A list of capabilities for receiving audio. For an explanation of fields, see below.

: receive-video (required)
:: A list of capabilities for receiving video. For an explanation of fields, see below.


The format type is used as the basis for audio and video capabilities.
Formats are composed of the following properties:

<!-- TODO, specify where names are defined. -->

: name (required)
:: The name of the format. Expected values include "vp8", "h264", and "opus".

<!-- TODO, specify where codec-specific parameters are defined. -->

: parameters (required)
:: A list of (key, value) parameters that can be used to pass fields that are
    properties of a specific format, and not shared by other formats of that type
    (audio, video, etc.).

Audio capabilities are composed of the above format type, with the following
additional fields:

: max-audio-channels (optional)
:: An optional field indicating the maximum number of audio
    channels the receiver is capable of supporting. The default value is 2,
    meaning a stereo speaker setup.

: min-bit-rate (optional)
:: An optional field indicating the minimum audio bit rate that
    the receiver can handle, in kilobits per second. Default is no minimum.

Video capabilities are similarly composed of the above format type, with the
following additional fields:

: max-resolution (optional)
:: An optional field indicating the maximum video-resolution (width, height)
    that the receiver is capable of processing. Default is no maximum.

: max-frames-per-second (optional)
:: An optional field indicating the maximum frames-per-second the receiver is
    capable of processing. Default is no maximum.

: max-pixels-per-second (optional)
:: An optional field indicating the maximum pixels-per-second the receiver is
    capable of processing, in pixels per second. Default is no maximum.

: min-video-bit-rate (optional)
:: An optional field indicating the minimum video bit rate the receiver is
    capable of processing, in kilobits per second. Default is no minimum.

: aspect-ratio (optional)
:: An optional field indicating the receiver's ideal aspect ratio; for example,
    a 16:10 display could return 1.6 to indicate its preferred content
    scaling. Default is none.

: color-profiles (optional)
:: An optional field indicating which color profiles the receiver understands.
    The sender may use these values to determine how to encode
    video. Some examples include sRGBv4, Rec709, and DciP3. The default value is sRGBv4.

: native-resolutions (optional)
:: An optional field indicating what video-resolutions the receiver supports and
    considers to be "native," meaning that scaling is not required.
    The default value is none.

: supports-scaling (optional)
:: An optional boolean field indicating whether the receiver can scale content
    provided in a video-resolution not listed in the native-resolutions list
    (if provided) or of a different aspect ratio. The default value is true.
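A sender might use the video capabilities above to check a candidate encoding and to cap its frame rate, treating every absent limit as "no maximum".  This is a minimal sketch, not a required algorithm: `VideoCapability` is a hypothetical Python view of one receive-video entry covering only the name, max-resolution, max-frames-per-second, and max-pixels-per-second fields, and the selection policy (largest frame rate within both limits) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VideoCapability:
    """Hypothetical view of one receive-video entry; None means "no maximum"."""
    name: str
    max_resolution: Optional[Tuple[int, int]] = None  # (width, height)
    max_frames_per_second: Optional[float] = None
    max_pixels_per_second: Optional[float] = None


def fits(cap, codec, width, height, fps):
    """True if an encoding with these parameters is within the receiver's limits."""
    if cap.name != codec:
        return False
    if cap.max_resolution is not None:
        max_w, max_h = cap.max_resolution
        if width > max_w or height > max_h:
            return False
    if cap.max_frames_per_second is not None and fps > cap.max_frames_per_second:
        return False
    if (cap.max_pixels_per_second is not None
            and width * height * fps > cap.max_pixels_per_second):
        return False
    return True


def choose_frame_rate(cap, width, height, wanted_fps):
    """Largest frame rate <= wanted_fps allowed by both frame and pixel-rate limits."""
    fps = wanted_fps
    if cap.max_frames_per_second is not None:
        fps = min(fps, cap.max_frames_per_second)
    if cap.max_pixels_per_second is not None:
        fps = min(fps, cap.max_pixels_per_second / (width * height))
    return fps
```

For instance, a receiver that advertises 60 frames per second but only 1920 × 1080 × 30 pixels per second can take 1080p at 30 frames per second, or 720p at the full 60.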

<!-- TODO: Add max-bit-rate, color profiles, HDR -->

Sessions {#streaming-sessions}
------------------------------------

To start a streaming session, a sender may send a
streaming-session-start-request message with the following properties:

: streaming-session-id
:: Identifies the streaming session.  Must be unique for the (sender,
    receiver) pair.  Can be used later to modify or terminate a
    streaming session.

: stats-interval
:: Indicates how often the receiver should send stats messages to
    the sender.

: stream-offers
:: Indicates the streams that the receiver can request from the sender.

Each stream offer contains the following properties:

: media-stream-id
:: Identifies the media stream being offered.  Must be unique within
    the streaming session.  Can be used by the receiver to request the
    media stream.

: friendly-name
:: An optional name intended to be shown to a user, such that the
    receiver may allow the user to choose which media streams to
    receive, or if they are received automatically by the receiver,
    give the user some information about what the media stream is.

: audio
:: A list of audio encodings offered.  An audio encoding is a series
    of encoded audio frames.  Encodings define properties needed by
    the receiver to know how to decode the encoding, such as codec and
    sample rate.  They can differ by codec and related properties,
    but should be different encodings of the same audio.

: video
:: A list of video encodings offered.  A video encoding is a series of
    encoded video frames.  Encodings define properties needed by the
    receiver to know how to decode the encoding, such as codec and
    default duration.  They can differ by codec and potentially other
    properties, but should be different encodings of the same video.


Each audio encoding offered defines the following properties:

: encoding-id
:: Identifies the audio encoding being offered.  Must be unique within
    the media stream.  Can be used by the receiver to request an encoding.

: codec
:: The name of the codec used by the encoding.

: time-scale
:: The time scale used by all audio frames.  This allows senders to
    make audio-frame messages smaller by not including the time scale
    in each one.

: default-duration
:: The default duration of an audio frame.  This allows senders to make
    audio-frame messages smaller by not including the duration for
    audio-frame messages that have the default duration.

Each video encoding offered defines the following properties:

: encoding-id
:: Identifies the video encoding being offered.  Must be unique within
    the media stream.  Can be used by the receiver to request an encoding.

: codec
:: The name of the codec used by the encoding.

: time-scale
:: The time scale used by all video frames.  This allows senders to
    make video-frame messages smaller by not including the time scale
    in each one.

: default-duration
:: The default duration of a video frame.  This allows senders to make
    video-frame messages smaller by not including the duration for
    video-frame messages that have the default duration.

: default-rotation
:: The default rotation of a video frame.  This allows senders to make
    video-frame messages smaller by not including the rotation for
    video-frame messages that have the default rotation.

After receiving a streaming-session-start-request message, a receiver
should send back a streaming-session-start-response message with the
following properties:

: stats-interval
:: Indicates the interval at which the sender should send stats messages to
    the receiver.

: stream-requests
:: Indicates which media streams the receiver would like to receive
    from the sender.

Each stream request contains the following properties:

: media-stream-id
:: The ID of the stream requested.

: audio (optional)
:: The requested audio encoding, by encoding ID.

: video (optional)
:: The requested video encoding, by encoding ID.  It may
    include a target resolution and maximum frame rate.  The sender
    should not exceed the maximum frame rate and should attempt to
    send at the target bitrate, possibly exceeding it by a small amount.
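A receiver has to map the offers it received onto stream requests.  The following is a minimal sketch of one possible policy, assuming the receiver simply picks the first offered audio and video encoding whose codec it supports and skips streams for which it supports neither.  The names `build_requests` and `supported_codecs` are hypothetical, and the optional target resolution and maximum frame rate are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class EncodingOffer:
    encoding_id: int
    codec: str


@dataclass
class StreamOffer:
    media_stream_id: int
    audio: Sequence[EncodingOffer] = ()
    video: Sequence[EncodingOffer] = ()


@dataclass
class StreamRequest:
    media_stream_id: int
    audio: Optional[int] = None  # requested audio encoding-id, if any
    video: Optional[int] = None  # requested video encoding-id, if any


def first_supported(encodings: Sequence[EncodingOffer],
                    supported_codecs: Set[str]) -> Optional[int]:
    """encoding-id of the first offer with a supported codec, else None."""
    return next((e.encoding_id for e in encodings
                 if e.codec in supported_codecs), None)


def build_requests(offers: Sequence[StreamOffer],
                   supported_codecs: Set[str]) -> List[StreamRequest]:
    requests = []
    for offer in offers:
        audio = first_supported(offer.audio, supported_codecs)
        video = first_supported(offer.video, supported_codecs)
        if audio is None and video is None:
            continue  # nothing decodable in this stream; do not request it
        requests.append(StreamRequest(offer.media_stream_id, audio, video))
    return requests
```

Preferring the first supported offer is only one choice; a receiver could equally rank encodings by codec preference or by how well they fit its native resolutions.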


During a streaming session, the receiver can modify the requests it
made for encodings by sending a streaming-session-modify-request
containing a modified list of stream-requests.  When the sender receives
a streaming-session-modify-request, it should send back a
streaming-session-modify-response.

Finally, the sender may terminate the streaming session by sending
a streaming-session-terminate-request message.  When the receiver
receives the streaming-session-terminate-request, it should send back
a streaming-session-terminate-response.  The receiver can terminate at
any point and notify the sender by sending a
streaming-session-terminate-event message.
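The session messages above imply a small lifecycle on the sender side.  The following is a minimal sketch of that lifecycle as a state machine, assuming hypothetical state names and an event string format of `"send:<message>"` / `"recv:<message>"`; neither is defined by this specification.

```python
from enum import Enum, auto


class SenderState(Enum):
    IDLE = auto()
    STARTING = auto()     # start request sent, awaiting response
    ACTIVE = auto()
    TERMINATING = auto()  # terminate request sent, awaiting response
    TERMINATED = auto()


# (current state, event) -> next state, from the sender's point of view.
TRANSITIONS = {
    (SenderState.IDLE, "send:streaming-session-start-request"): SenderState.STARTING,
    (SenderState.STARTING, "recv:streaming-session-start-response"): SenderState.ACTIVE,
    # The sender answers a modify request with a modify response and stays active.
    (SenderState.ACTIVE, "recv:streaming-session-modify-request"): SenderState.ACTIVE,
    (SenderState.ACTIVE, "send:streaming-session-terminate-request"): SenderState.TERMINATING,
    (SenderState.TERMINATING, "recv:streaming-session-terminate-response"): SenderState.TERMINATED,
}

TERMINATE_EVENT = "recv:streaming-session-terminate-event"


def next_state(state: SenderState, event: str) -> SenderState:
    # The receiver may terminate at any point once a session has been started.
    if event == TERMINATE_EVENT and state in (
            SenderState.STARTING, SenderState.ACTIVE, SenderState.TERMINATING):
        return SenderState.TERMINATED
    nxt = TRANSITIONS.get((state, event))
    if nxt is None:
        raise ValueError(f"unexpected {event!r} in state {state.name}")
    return nxt
```

Rejecting unexpected messages with an error is a design choice of this sketch; an implementation might instead ignore them or terminate the session.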

Audio {#streaming-audio}
------------------------------

Senders may send audio to receivers by sending audio-frame messages (see
[[#appendix-a]]) with the following keys and values.  An audio frame message
contains a set of encoded audio samples for a range of time. A series of
encoded audio frames that share a codec, codec parameters and a timeline form an
audio encoding.

Unlike most Open Screen Protocol messages, this one uses an
array-based grouping rather than a struct-based grouping.  For
required fields, this allows for a more efficient use of bytes on the
wire, which is important for streaming audio because the payload is
typically so small and every byte of overhead is relatively large.  In
order to accommodate optional values in the array-based grouping, one
optional field in the array is used to hold all optional values in a
struct-based grouping.  This will hopefully provide a good balance of
efficiency and flexibility.

To allow for audio frames to be sent out of order, they should be sent in
separate QUIC streams.

: encoding-id
:: Identifies the media encoding to which this audio frame belongs.  This can be
    used to reference properties of the encoding (from the
    audio-encoding-offer message) such as the codec, codec properties,
    time scale (aka clock rate or sample rate), and default duration.
    Referencing properties of the encoding through the encoding id
    helps to avoid sending duplicate information in every frame.

: start-time
:: Identifies the beginning of the time range of the audio frame.  The time
    scale is inferred from the properties of the encoding (from the
    audio-encoding-offer).  The end time can be inferred from the
    start time and duration.

: duration
:: If present, the duration of the audio frame.  The time
    scale is inferred from the properties of the encoding.  Likewise, if not
    present, the duration is inferred from the properties of the encoding.

: sync-time
:: If present, a time used to synchronize the start time of this audio frame (and
    thus, this encoding) with that of other media encodings on
    different timelines.  It may be wall clock time, but it need not
    be; it can be any clock chosen by the sender.

: payload
:: The encoded audio data.  The type of data is inferred from the properties of
    the encoding.
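Because time scale and default duration live in the encoding rather than in each frame, a receiver has to combine the two to find where a frame ends.  The following is a minimal sketch, assuming the time scale is expressed in ticks per second (the clock rate or sample rate mentioned above); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class AudioEncoding:
    """Subset of the audio encoding offer relevant to frame timing."""
    time_scale: int        # ticks per second (clock rate / sample rate)
    default_duration: int  # duration in ticks when a frame omits it


@dataclass
class AudioFrame:
    encoding_id: int
    start_time: int                 # in ticks of the encoding's time scale
    payload: bytes
    duration: Optional[int] = None  # absent: use the encoding's default


def frame_end_time(frame: AudioFrame, encoding: AudioEncoding) -> int:
    """End of the frame's time range, in ticks (start time + duration)."""
    duration = (frame.duration if frame.duration is not None
                else encoding.default_duration)
    return frame.start_time + duration


def ticks_to_seconds(ticks: int, encoding: AudioEncoding) -> Fraction:
    """Converts a tick count to exact seconds using the encoding's time scale."""
    return Fraction(ticks, encoding.time_scale)
```

For example, with a 48000 Hz time scale and a default duration of 960 ticks (20 ms), a frame starting at tick 48000 with no explicit duration ends at tick 48960, which is `Fraction(51, 50)` seconds (1.02 s).  Using `Fraction` avoids rounding drift when many frame durations are accumulated.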

Video {#streaming-video}
--------------------------------------------

Senders may send video to receivers by sending video-frame messages (see
[[#appendix-a]]) with the following keys and values.  A video frame message
contains an encoded video frame (an encoded image) at a specific point in time
or over a specific time range (if the duration is known).  A series of encoded
video frames that share a codec, codec parameters and a timeline form a video
encoding.

To allow for video frames to be sent out of order, they may be sent in
separate QUIC streams.  If the encoding is a long chain of encoded video frames,
each dependent on the previous one back to an independent frame, it may make
sense to send them in a single QUIC stream, starting at the independent frame and
ending at the last dependent frame.

: encoding-id
:: Identifies the media encoding to which this video frame belongs.  This can be
    used to reference properties of the encoding such as the codec, codec
    properties, time scale, and default rotation.  Referencing properties of the
    encoding through the encoding id helps to avoid sending duplicate
    information in every frame.

: sequence-number
:: Identifies the frame and its order in the encoding.
    Within an encoding, larger sequence numbers mean later start times.
    Within an encoding, gaps in sequence numbers mean frames are missing.

: depends-on
:: If present, the sequence numbers of the frames this frame depends on.
    If a sequence number is negative, it is treated as relative: the absolute
    sequence number is calculated by adding it to the sequence number of this frame.
    If empty, this is an independent frame (a key frame).
    If not present, the default value is [-1].

: start-time
:: Identifies the beginning of the time range of the video frame.  The time
    scale is inferred from the properties of the encoding (from the
    video-encoding-offer).  The end time can be inferred from the
    start time and duration.

: duration
:: If present, the duration of the video frame.  The time
    scale is inferred from the properties of the encoding.  If not
    present, that means duration is unknown.

: sync-time
:: If present, a time used to synchronize the start time of this frame (and
    thus, this encoding) with that of other media encodings on different
    timelines.

: rotation
:: If present, indicates how the frame should be rotated after decoding but
    before rendering.  Rotation is clockwise in increments of 90 degrees.
    The default is 0 (no rotation).

: payload
:: The encoded video frame (encoded image).  The codec and codec parameters are
    inferred from the properties of the encoding.
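The `depends-on` rules (absent means `[-1]`, empty means key frame, negative entries are relative) are easy to get wrong, so a decoder might normalize them first.  The following is a minimal sketch with hypothetical function names; it also shows one way a receiver could decide whether a frame can be decoded yet, assuming it tracks the set of already decoded sequence numbers.

```python
from typing import List, Optional, Sequence, Set

# When depends-on is absent, it defaults to [-1] (the previous frame).
DEFAULT_DEPENDS_ON = (-1,)


def resolve_dependencies(sequence_number: int,
                         depends_on: Optional[Sequence[int]] = None) -> List[int]:
    """Absolute sequence numbers of the frames this frame depends on."""
    if depends_on is None:
        depends_on = DEFAULT_DEPENDS_ON
    # Negative entries are relative to this frame's own sequence number.
    return [sequence_number + d if d < 0 else d for d in depends_on]


def is_key_frame(depends_on: Optional[Sequence[int]]) -> bool:
    """An explicitly empty depends-on marks an independent (key) frame."""
    return depends_on is not None and len(depends_on) == 0


def is_decodable(sequence_number: int,
                 depends_on: Optional[Sequence[int]],
                 decoded: Set[int]) -> bool:
    """True if every frame this one depends on has already been decoded."""
    return all(dep in decoded
               for dep in resolve_dependencies(sequence_number, depends_on))
```

For instance, frame 10 with `depends-on` of `[-2, 4]` depends on frames 8 and 4, while frame 10 with no `depends-on` field depends only on frame 9.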


Feedback {#streaming-feedback}
------------------------------------

The receiver can send feedback to the sender, such as key frame requests.

A video key frame is requested by sending a video-request message with
the following keys and values.

To allow video-request messages to be sent out of order, they may be sent in
separate QUIC streams.

: encoding-id
:: The encoding for which the sender should send a new key frame.

: sequence-number
:: Gives the order in the encoding.
    Within an encoding, larger sequence numbers invalidate previous ones.
    A sender may ignore smaller sequence numbers after a larger one has been processed.
    This is to prevent out-of-order requests from generating more key frames than necessary.

: highest-decoded-frame-sequence-number
:: An unsigned integer.  If set, the sender may generate a video frame dependent
    on the last decoded frame.  If not set, the sender must generate an
    independent (key) frame.
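On the sender side, the two rules above combine into a filter for stale requests and a choice of what the next frame may depend on.  The following is a minimal sketch with hypothetical names.  It assumes a request whose sequence number equals the highest one already processed is a duplicate and can also be ignored, and it expresses the response frame's dependency using the absolute form of `depends-on`.

```python
from typing import Dict, List, Optional


class KeyFrameRequestFilter:
    """Sender-side filter for video-request messages, tracked per encoding."""

    def __init__(self) -> None:
        # encoding-id -> highest sequence-number processed so far
        self._highest: Dict[int, int] = {}

    def should_process(self, encoding_id: int, sequence_number: int) -> bool:
        last = self._highest.get(encoding_id)
        # Smaller sequence numbers are stale and may be ignored. Assumption:
        # an equal sequence number is a duplicate and is ignored too.
        if last is not None and sequence_number <= last:
            return False
        self._highest[encoding_id] = sequence_number
        return True


def depends_on_for_response(highest_decoded: Optional[int]) -> List[int]:
    """Absolute depends-on list for the frame answering a request."""
    if highest_decoded is None:
        return []  # field not set: must send an independent (key) frame
    # Field set: the sender may reference the last decoded frame.
    return [highest_decoded]
```

Since the field says the sender *may* depend on the last decoded frame, a sender is still free to send a key frame in that case; this sketch simply takes the cheaper option.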

Stats {#streaming-stats}
------------------------------

During a streaming session, the sender should send stats with the
streaming-session-sender-stats-event at the interval the receiver
requested.  The streaming-session-sender-stats-event message contains
the following properties:

: streaming-session-id
:: The ID of the streaming session these stats apply to.

: system-time
:: The time when the stats were calculated, using a monotonic system
    clock.

: audio
:: Stats specific to audio.  Stats for multiple encodings can be sent
    at once, but encodings need not be included if the stats haven't
    changed.  See below.

: video
:: Stats specific to video.  Stats for multiple encodings can be sent
    at once, but encodings need not be included if the stats haven't
    changed.  See below.

Audio encoding sender stats include the following properties:

: encoding-id
:: The ID of the encoding for which the stats apply.

: cumulative-sent-frames
:: The total number of frames sent.

: cumulative-encode-delay
:: The sum of the time spent encoding frames sent.

Video encoding sender stats include the following properties:

: encoding-id
:: The ID of the encoding for which the stats apply.

: cumulative-sent-duration
:: The sum of all of the durations of all of the video frames sent.

: cumulative-encode-delay
:: The sum of the time spent encoding frames sent.

: cumulative-dropped-frames
:: The total number of frames that were not sent due to network, CPU,
    or other constraints.


During a streaming session, the receiver should send stats with the
streaming-session-receiver-stats-event at the interval the sender
requested.  The streaming-session-receiver-stats-event message contains
the following properties:

: streaming-session-id
:: The ID of the streaming session these stats apply to.

: system-time
:: The time when the stats were calculated, using a monotonic system
    clock.

: audio
:: Stats specific to audio.  Stats for multiple encodings can be sent
    at once, but encodings need not be included if the stats haven't
    changed.  See below.

: video
:: Stats specific to video.  Stats for multiple encodings can be sent
    at once, but encodings need not be included if the stats haven't
    changed.  See below.

Audio encoding receiver stats include the following properties:

: encoding-id
:: The ID of the encoding for which the stats apply.

: cumulative-decoded-frames
:: The total number of audio frames received and decoded.

: cumulative-received-duration
:: The sum of all of the durations of all of the audio frames received.

: cumulative-lost-duration
:: The sum of all of the durations of all of the audio frames detected as lost.

: cumulative-jitter-buffer-delay
:: The sum of the time frames spent in the jitter buffer.

: cumulative-decode-delay
:: The sum of the time spent decoding frames received.

Video encoding receiver stats include the following properties:

: encoding-id
:: The ID of the encoding for which the stats apply.

: cumulative-decoded-frames
:: The total number of video frames received and decoded.

: cumulative-lost-frames
:: The total number of video frames detected as lost.

: cumulative-jitter-buffer-delay
:: The sum of the time frames spent in the jitter buffer.

: cumulative-decode-delay
:: The sum of the time spent decoding frames received.
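Because every receiver stat is cumulative, per-interval figures come from the difference between two successive stats events for the same encoding.  The following is a minimal sketch with hypothetical names, assuming both delay sums use the same (unspecified) time unit and are accumulated over decoded frames only; neither assumption is stated in this section.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VideoReceiverStats:
    """One video encoding's entry in a streaming-session-receiver-stats-event."""
    encoding_id: int
    cumulative_decoded_frames: int
    cumulative_lost_frames: int
    cumulative_jitter_buffer_delay: float  # time unit assumed consistent
    cumulative_decode_delay: float


def interval_summary(prev: VideoReceiverStats,
                     curr: VideoReceiverStats) -> Dict[str, float]:
    """Derives per-interval figures from two successive cumulative snapshots."""
    if prev.encoding_id != curr.encoding_id:
        raise ValueError("snapshots belong to different encodings")
    decoded = curr.cumulative_decoded_frames - prev.cumulative_decoded_frames
    lost = curr.cumulative_lost_frames - prev.cumulative_lost_frames
    seen = decoded + lost
    jitter = curr.cumulative_jitter_buffer_delay - prev.cumulative_jitter_buffer_delay
    decode = curr.cumulative_decode_delay - prev.cumulative_decode_delay
    return {
        "loss_ratio": lost / seen if seen else 0.0,
        # Assumption: delays are summed over decoded frames only.
        "avg_jitter_buffer_delay": jitter / decoded if decoded else 0.0,
        "avg_decode_delay": decode / decoded if decoded else 0.0,
    }
```

Reporting cumulative values rather than per-interval averages means a lost stats message does not lose information: the next snapshot still covers the whole gap.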


Security and Privacy {#security-privacy}
====================

The Open Screen Protocol allows two networked agents to discover each other
and exchange user and application data.  As such, its security and privacy
considerations should be closely examined.  We first evaluate the protocol
itself using the W3C [[SECURITY-PRIVACY-QUESTIONNAIRE|Security and Privacy
Questionnaire]].  We then examine whether the security and privacy guidelines
recommended by the [[PRESENTATION-API|Presentation API]] and the
[[REMOTE-PLAYBACK|Remote Playback API]] are met.  Finally we discuss recommended
mitigations that agents can use to meet these security and privacy
requirements.

Threat Models {#threat-models}
--------------------------------

### Passive Network Attackers ### {#passive-network-attackers}

The Open Screen Protocol should assume that all parties that are connected to
the same LAN, either through a wired connection or through WiFi, are able to
observe all data flowing between Open Screen Protocol agents.

These parties will be able to collect any data exposed through unencrypted
messages, such as mDNS records and the QUIC handshakes.

These parties may attempt to learn cryptographic parameters by observing data
flows on the QUIC connection, or by observing cryptographic timing.

### Active Network Attackers ### {#active-network-attackers}

Active attackers, such as compromised routers, will be able to manipulate data
exchanged between agents.  They can inject traffic into existing QUIC
connections and attempt to initiate new QUIC connections.  These abilities can
be used to attempt the following:

*   Impersonate an agent or one already trusted by the user, in an attempt
    to convince the user to authenticate to it.
*   Connect to an agent and query its capabilities.
*   Connect to and control a presentation or remote playback, or extract data
    from the application state of the presentation or remote playback.

One particular attack of concern is misconfigured or compromised routers that
expose local network devices (such as Open Screen Protocol agents) to the
Internet.  This vector of attack has been used by malicious parties to take
control of printers and smart TVs by connecting to local network services that
would normally be inaccessible from the Internet.

### Denial of Service ### {#denial-of-service}

Parties connected to the LAN may attempt to deny access to Open Screen
Protocol agents.  For example, an attacker may attempt to open
a large number of QUIC connections to an agent in an attempt to block
legitimate connections or exhaust the agent's system resources.  They may
also multicast spurious DNS-SD records in an attempt to exhaust the cache
capacity for mDNS listeners, or to get listeners to open a large number of bogus
QUIC connections.

### Same-Origin Policy Violations ### {#same-origin-policy-violations}

The Presentation API allows cross-origin communication between controlling pages
and presentations with the consent of each origin (through their use of the
API).  This is similar to cross-origin communication via
{{Window/postMessage(message, targetOrigin, transfer)|postMessage()}} with a
`targetOrigin` of `*`.  However, the Presentation API does not convey source
origin information with each message.  Therefore, the Open Screen Protocol does
not convey origin information between its agents.

The [=presentation ID=] carries some protection against unrestricted
cross-origin access; but, rigorous authentication of the parties connected by a
{{PresentationConnection}} must be done at the application level.

Open Screen Protocol Security and Privacy Considerations {#security-privacy-questions}
-----------------------------------

### Personally Identifiable Information & High-Value Data ### {#personally-identifiable-information}

The following data exchanged by the protocol can be personally identifiable
and/or high value data:

1. Presentation URLs and availability results
1. Presentation IDs
1. Presentation connection IDs
1. Presentation connection messages
1. Remote playback URLs
1. Remote playback commands and status messages

Presentation IDs are considered high value data because they can be used in
conjunction with a Presentation URL to connect to a running presentation.

Presentation display friendly names, model names, and capabilities, while not
considered personally identifiable, are important to protect to prevent an
attacker from changing them or substituting other values during the discovery
and authentication process.

The following data cannot be reasonably made confidential and should be
considered public and untrusted data:

1. IP addresses and ports used by the Open Screen Protocol.
1. Data advertised through mDNS, including the display name prefix, the
    certificate fingerprint, and the metadata version.

### Cross Origin State Considerations ### {#cross-origin-state}

Access to origin state across browsing sessions is possible through the
Presentation API by reconnecting to a presentation that was started by a
previous session. This scenario is addressed in
[[PRESENTATION-API#cross-origin-access]].

Presentation display availability and remote playback device availability are
states that are available cross-origin depending on the user's network
context.  Exposure of this data to the Web is also discussed in
[[PRESENTATION-API#personally-identifiable-information]] and
[[REMOTE-PLAYBACK#personally-identifiable-information]].

### Origin Access to Other Devices ### {#origin-access-devices}

By design, the Open Screen Protocol allows access to presentation displays and
remote playback devices from the Web.  By implementing the protocol, these
devices are knowingly making themselves available to the Web and should be
designed accordingly.

Below, we discuss mitigation steps to prevent malicious use of these devices.

### Incognito Mode ### {#incognito-mode}

The Open Screen Protocol does not distinguish between the user agent's normal
browsing and incognito modes, and agents that follow the specification
behave identically regardless of which mode is in use.

It's recommended that user agents use separate authentication contexts and QUIC
connections for normal and incognito profiles from the same user agent instance.
This prevents Open Screen agents from correlating activity among profiles
belonging to the same user (both normal and incognito).

### Persistent State ### {#persistent-state}

An agent is likely to persist the identity of agents that have successfully
completed [[#authentication]].  This may include the public key fingerprints,
metadata versions, and metadata for those parties.

However, this data is not normally exposed to the Web; it is shown only through
the native UI of the user agent during the display selection or display
authentication process.  It can be an implementation choice whether the user
agent clears or retains this data when the user clears browsing data.

Issue(132): Fate of metadata / authentication history when clearing browsing data.

### Other Considerations ### {#other-considerations}

The Open Screen Protocol does not grant to the Web additional access to the
following:

* New script loading mechanisms
* Access to the user's location
* Access to device sensors
* Access to the user's local computing environment
* Control over the user agent's native UI
* Security characteristics of the user agent

Presentation API Considerations {#presentation-api-considerations}
-------------------------------

[[PRESENTATION-API#security-and-privacy-considerations]] place these
requirements on the Open Screen Protocol:

1.  Presentation URLs and presentation IDs should remain private among the
    parties that are allowed to connect to a presentation, per the
    cross-origin access guidelines.
1.  Controllers and receivers should be notified when connections representing
    multiple user agent profiles have been made to a presentation, per the user
    interface guidelines.
1.  Messaging between controllers and receivers should be authenticated and
    confidential, per the guidelines for messaging between presentation
    connections.

The Open Screen Protocol addresses these considerations by:

1. Requiring mutual authentication and a TLS-secured QUIC connection before
     presentation URLs, IDs, or messages are exchanged.
1. Adding explicit messages and connection IDs for individual
     {{PresentationConnection|PresentationConnections}} so that agents can track
     the number of active connections.

Issue(143): Notify endpoints when new connection is created.

Remote Playback API Considerations {#remote-playback-considerations}
----------------------------------

The [[REMOTE-PLAYBACK#security-and-privacy-considerations]] also state that
messaging between local and remote playback devices should be authenticated
and confidential.

This consideration is handled by requiring mutual authentication and a
TLS-secured QUIC connection before any remote playback related messages are
exchanged.

Mitigation Strategies {#security-mitigations}
--------------------------------------------

### Local passive network attackers ### {#local-passive-mitigations}

Local passive attackers may attempt to harvest data about user activities and
device capabilities using the Open Screen Protocol.  The main strategy to address
this is data minimization, by only exposing opaque public key fingerprints
before user-mediated authentication takes place.

Passive attackers may also attempt timing attacks to learn the
cryptographic parameters of the TLS 1.3 QUIC connection.

Issue(130): Review attack and mitigation considerations for TLS 1.3

### Local active network attackers ### {#local-active-mitigations}

Local active attackers may attempt to impersonate a presentation display the
user would normally trust.  The [[#authentication]] step of the Open Screen
Protocol prevents a man-in-the-middle from impersonating an agent, without
knowledge of a shared secret.  However, it is possible for an attacker to
impersonate an existing, trusted display or a newly discovered display that is
not yet authenticated and try to convince the user to authenticate it.

This can be addressed through a combination of techniques.  The first is
detecting and flagging attempts at impersonation; a few of the situations that
should be flagged include:

* Untrusted agents whose public key fingerprint collides with that from an
    already-trusted agent that is concurrently being advertised.
* Untrusted agents whose friendly name differs from the one previously
    advertised under a given public key fingerprint.
* Untrusted agents that fail the authentication challenge a certain number of times.
* Untrusted agents that advertise a friendly name that is similar to that from an
    already-trusted agent.
* Already-trusted agents whose metadata provided through the `agent-info`
    message has changed.
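Several of the checks in the list above can be expressed as simple predicates over discovery data.  The following is a minimal sketch covering the first four (the `agent-info` metadata check happens later, after connection).  It assumes hypothetical bookkeeping maps kept by the agent, and uses `difflib.SequenceMatcher` with an arbitrary 0.8 ratio as a stand-in for "similar" friendly names; the threshold and failure count are illustrative, not recommendations.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Set


@dataclass
class Advertisement:
    fingerprint: str    # public key fingerprint advertised via mDNS
    friendly_name: str


def impersonation_flags(ad: Advertisement,
                        trusted_names: Dict[str, str],
                        advertised_trusted: Set[str],
                        seen_names: Dict[str, str],
                        failed_challenges: Dict[str, int],
                        max_failures: int = 3,
                        similarity: float = 0.8) -> List[str]:
    """Reasons to flag an untrusted (not yet authenticated) agent.

    trusted_names: fingerprint -> friendly name of already-trusted agents.
    advertised_trusted: fingerprints of trusted agents advertising right now.
    seen_names: fingerprint -> friendly name last advertised under it.
    failed_challenges: fingerprint -> failed authentication attempts.
    """
    flags: List[str] = []
    if ad.fingerprint in advertised_trusted:
        flags.append("fingerprint collides with an advertised trusted agent")
    previous = seen_names.get(ad.fingerprint)
    if previous is not None and previous != ad.friendly_name:
        flags.append("friendly name changed")
    if failed_challenges.get(ad.fingerprint, 0) >= max_failures:
        flags.append("too many failed authentication challenges")
    for name in trusted_names.values():
        ratio = SequenceMatcher(None, ad.friendly_name.casefold(),
                                name.casefold()).ratio()
        if ratio >= similarity:
            flags.append(f"friendly name similar to trusted agent {name!r}")
            break
    return flags
```

A string-similarity ratio will not catch every look-alike name (for example, homoglyphs from other scripts); it is only meant to illustrate where such a check fits.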

Flagging means that the user is notified of the attempt at impersonation.  In
the last case, the user should be required to re-authenticate to the
already-trusted agent to verify its identity.

Issue(118): UI guidelines for pairing and trusted/untrusted data.

The second is through management of the low-entropy secret during mutual
authentication:

* Rotate the low-entropy secret to prevent brute force attacks.
* Use an increasing backoff to respond to authentication challenges, also to
    prevent brute force attacks.
* Use a cryptographically sound source of entropy to generate the shared secret.
* Require the end user to manually type the shared secret - shown only on the
    display - to prevent the user from blindly clicking through this step.
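The secret-management practices listed above can be sketched with Python's standard `secrets` module, which draws from the operating system's cryptographically secure random source.  This is an illustration under stated assumptions: the 6-digit numeric format, the doubling backoff with a 300-second cap, and rotation after every 3 failures are hypothetical parameters, and `PairingSecret` is not an API defined by this specification.

```python
import secrets


def generate_pairing_code(digits: int = 6) -> str:
    """Numeric code from the OS CSPRNG (secrets), zero-padded to `digits`."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def challenge_delay(failed_attempts: int, base: float = 1.0,
                    cap: float = 300.0) -> float:
    """Seconds to wait before answering the next challenge (doubles per failure)."""
    if failed_attempts <= 0:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - 1))


class PairingSecret:
    """Hypothetical holder for the low-entropy secret shown on the display."""

    def __init__(self, digits: int = 6, rotate_after: int = 3) -> None:
        self.digits = digits
        self.rotate_after = rotate_after
        self.failures = 0
        self.code = generate_pairing_code(digits)

    def verify(self, attempt: str) -> bool:
        # Constant-time comparison avoids leaking the code through timing.
        if secrets.compare_digest(attempt.encode(), self.code.encode()):
            self.failures = 0
            return True
        self.failures += 1
        if self.failures % self.rotate_after == 0:
            # Rotate so a brute-force search has to start over.
            self.code = generate_pairing_code(self.digits)
        return False

    def next_delay(self) -> float:
        return challenge_delay(self.failures)
```

The failure counter deliberately survives rotation, so rotating the code does not also reset the backoff.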

The active attacker may also attempt to disrupt data exchanged over the QUIC
connection by injecting or modifying traffic.  These attacks should be mitigated
by a correct implementation of TLS 1.3.

Issue(130): Review attack and mitigation considerations for TLS 1.3

### Remote active network attackers ### {#remote-active-mitigations}

Unfortunately, we cannot rely on network devices to fully protect Open Screen
Protocol agents, because a misconfigured firewall or NAT could expose a
LAN-connected agent to the broader Internet.  Open Screen Protocol agents
should be secure against attack from any Internet host.

Issue(131): Mitigations for remote network attackers.

### Denial of service ### {#denial-of-service-mitigations}

It will be difficult to completely prevent denial of service attacks that
originate on the user's local area network.  Open Screen Protocol agents can
refuse new connections, close connections that receive too many messages, or
limit the number of mDNS records cached from a specific responder in an attempt
to allow existing activities to continue in spite of such an attack.
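Limiting the mDNS records cached from a specific responder can be done with a small bounded structure.  The following is a minimal sketch, assuming responders are keyed by source address and records by DNS-SD instance name; the class name and the default limit of 8 are hypothetical.

```python
from collections import OrderedDict
from typing import Dict


class PerResponderRecordCache:
    """DNS-SD record cache that keeps at most `limit` records per responder."""

    def __init__(self, limit: int = 8) -> None:
        self.limit = limit
        self._by_responder: Dict[str, "OrderedDict[str, bytes]"] = {}

    def add(self, responder: str, name: str, record: bytes) -> None:
        records = self._by_responder.setdefault(responder, OrderedDict())
        records[name] = record
        records.move_to_end(name)  # newest (or refreshed) record goes last
        while len(records) > self.limit:
            records.popitem(last=False)  # evict this responder's oldest record

    def records(self, responder: str) -> Dict[str, bytes]:
        return dict(self._by_responder.get(responder, {}))
```

This only bounds records per responder; an attacker that spoofs many source addresses can still grow the cache, so an implementation may also want a global cap.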

### Malicious input ### {#malicious-input-mitigations}

Open Screen Protocol agents should be robust against malicious input that
attempts to compromise the target device by exploiting parsing vulnerabilities.

CBOR is intended to be less vulnerable to such attacks relative to alternatives
like JSON and XML.  Still, agents should be thoroughly tested using approaches
like [fuzz testing](https://en.wikipedia.org/wiki/Fuzzing).

Where possible, Open Screen Protocol agents (including the content rendering
components) should use defense-in-depth techniques like <a
href="https://en.wikipedia.org/wiki/Sandbox_(computer_security)">sandboxing</a>
to prevent vulnerabilities from gaining access to user data or leading to
persistent exploits.

Appendix A: Messages {#appendix-a}
====================

The following messages are defined with [[CDDL]]. When
integer keys are used, a comment is appended to the line to indicate
the name of the field. Object definitions in this specification have this
unusual syntax to reduce the number of bytes-on-the-wire, while maintaining a
human-readable name for each key. Integer keys are used instead of object arrays
to allow for easy indexing of optional fields.

Each root message (one that can be put into a
QUIC stream without being enclosed by another message) has a comment
indicating the message type key.

Smaller type keys should be reserved for messages that are sent frequently, are
very small, or both; larger type keys should be reserved for messages that are
sent infrequently, are large, or both.  This is because smaller type keys take
fewer bytes to encode on the wire.
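The size thresholds behind this guidance depend on how the type key is encoded.  The following sketch computes the encoded size under two plausible integer encodings: a CBOR unsigned integer ([[RFC8949]]) and a QUIC variable-length integer ([[RFC9000]], section 16).  Which of these applies to type keys is an assumption here; the point is that under either one, small keys cost a single byte.

```python
def cbor_uint_size(value: int) -> int:
    """Bytes used by a CBOR unsigned integer (major type 0)."""
    if value < 0:
        raise ValueError("negative values are not unsigned integers")
    if value < 24:
        return 1  # value packed into the initial byte
    if value < 2 ** 8:
        return 2
    if value < 2 ** 16:
        return 3
    if value < 2 ** 32:
        return 5
    return 9


def quic_varint_size(value: int) -> int:
    """Bytes used by a QUIC variable-length integer (2-bit length prefix)."""
    if value < 0:
        raise ValueError("QUIC varints are unsigned")
    if value < 2 ** 6:
        return 1
    if value < 2 ** 14:
        return 2
    if value < 2 ** 30:
        return 4
    if value < 2 ** 62:
        return 8
    raise ValueError("value too large for a QUIC varint")
```

With CBOR, type keys 0 through 23 fit in one byte; with QUIC varints, 0 through 63 do.  Frequent messages such as audio and video frames benefit most from keys in that range.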

<pre class=include>
path: messages_appendix.html
</pre>

<pre class=include>
path: code-style.html
</pre>