
<pre class='metadata'>
Title: Open Screen Protocol
Shortname: openscreenprotocol
Level: 1
Status: w3c/ED
ED: https://webscreens.github.io/openscreenprotocol/
Canonical URL: ED
Editor: Mark Foltz, Google, https://github.com/mfoltzgoogle, w3cid 68454
Repository: webscreens/openscreenprotocol
Abstract: The Open Screen Protocol is a suite of network protocols that allow user agents to implement the [[PRESENTATION-API|Presentation API]] and the [[REMOTE-PLAYBACK|Remote Playback API]] in an interoperable fashion.
Group: Second Screen Community Group
Mailing List: public-webscreens@w3.org
Mailing List Archives: https://lists.w3.org/Archives/Public/public-webscreens/
Markup Shorthands: markdown yes, dfn yes, idl yes
</pre>

<p boilerplate="copyright">
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © [YEAR] the Contributors to the [TITLE] Specification, published by the <a href="https://www.w3.org/community/webscreens/">Second Screen Community Group</a> under the <a href="https://www.w3.org/community/about/agreements/cla/">W3C Community Contributor License Agreement (CLA)</a>.
A human-readable <a href="http://www.w3.org/community/about/agreements/cla-deed/">summary</a> is available.
</p>

<!-- TODO: Add short names to Presentation API spec, so that BS autolinking works as designed. -->
<!-- TODO: Can autolinks to HTML51 be automatically generated? -->
<pre class="anchors">
urlPrefix: https://w3c.github.io/presentation-api/#dfn-; type: dfn; spec: PRESENTATION-API
    text: available presentation display
    text: controller
    text: controlling user agent
    text: controlling browsing context
    text: presentation
    text: presentation display
    text: presentation display availability
    text: presentation id
    text: presentation request url
    text: receiver
    text: receiving browsing context
    text: receiving user agent
urlPrefix: https://w3c.github.io/presentation-api/; type: interface; spec: PRESENTATION-API
    text: PresentationConnection
urlPrefix: https://w3c.github.io/remote-playback/#dfn-; type: dfn; spec: REMOTE-PLAYBACK
    text: remote playback device
urlPrefix: https://www.w3.org/TR/html51/single-page.html; type: dfn; spec: HTML51
    text: media element
</pre>

<h2 class='no-num no-toc no-ref' id='status'>Status of this document</h2>

This specification was published by the [Second Screen Community
Group](https://www.w3.org/community/webscreens/). It is not a W3C Standard nor
is it on the W3C Standards Track. It should not be viewed as a stable
specification, and may change in substantial ways at any time. A future version
of this document will be published as a Community Group Report.

Please note that under the [W3C Community Contributor License Agreement
(CLA)](https://www.w3.org/community/about/agreements/cla/) there is a limited
opt-out and other conditions apply.

Learn more about [W3C Community and Business
Groups](http://www.w3.org/community/).

Introduction {#introduction}
============================

The Open Screen Protocol connects browsers to devices capable of rendering Web
content for a shared audience.  Typically, these are devices like
Internet-connected TVs, HDMI dongles, or "smart" speakers.

The protocol is a suite of subsidiary network protocols that enable two user
agents to implement the [[PRESENTATION-API|Presentation API]] and
[[REMOTE-PLAYBACK|Remote Playback API]] in an interoperable fashion.  This means
that a user can expect these APIs to work as intended when connecting two devices
from independent implementations of the Open Screen Protocol.

The Open Screen Protocol is a specific implementation of these two APIs, meaning
that it does not handle all possible ways that browsers and presentation
displays could support these APIs.  The Open Screen Protocol specifically
supports browsers and displays that are connected via the same local area
network, and that initiate presentation or remote playback by sending a URL
from the browser to the target display.

The Open Screen Protocol is intended to be extensible, so that additional
capabilities can be added over time.  This may include new implementations of
existing APIs, or new APIs.

Terminology {#terminology}
--------------------------

We borrow terminology from the [[PRESENTATION-API|Presentation API]] and
[[REMOTE-PLAYBACK|Remote Playback API]] for terms used in this document.  These
terms are summarized here.

We call the browser that is used to discover and initiate presentation of Web
content on another device the [=controlling user agent=].  We call the
user agent on the device rendering the Web content the
[=receiving user agent=], or *receiver* for short.  We use
the term [=presentation display=] to refer to the entire platform
responsible for implementing the *receiver*, including browser, OS, networking,
audio and graphics.

For the [[PRESENTATION-API|Presentation API]], presentation of Web content is
initiated at the request of a [=controlling browsing context=] (or
*controller*), which creates a [=receiving browsing context=] (or
*presentation*) to load a [=presentation request URL=] and exchange messages
with the resulting document.

Before this can happen, the [=controlling user agent=] must determine which
[=receivers=], if any, are compatible with the [=presentation request URL=].  This
happens by determining the [=presentation display availability=] for the
presentation request URL.

For the [[REMOTE-PLAYBACK|Remote Playback API]], the device responsible for
rendering the content of a [=media element=] when remote playback is connected
is called the [=remote playback device=].

For additional terms and idioms specific to the [[PRESENTATION-API|Presentation API]] or
Remote Playback API, please consult the respective specifications.

We also use the term "agent" to mean any implementation of this protocol,
browser, device, or otherwise, acting as a controller or a receiver.

Requirements {#requirements}
============================

Presentation API Requirements {#requirements-presentation-api}
--------------------------------------------------------------

1.  A controlling user agent must be able to discover the presence of a
    presentation display connected to the same IPv4 or IPv6 subnet and reachable
    by IP multicast.

2.  A controlling user agent must be able to obtain the IPv4 or IPv6 address of
    the display, a friendly name for the display, and an IP port number for
    establishing a network transport to the display.

3.  A controlling user agent must be able to determine if the receiver is
    reasonably capable of rendering a specific [=presentation request URL=].

4.  A controlling user agent must be able to start a new presentation on a receiver given a
    [=presentation request URL=] and [=presentation ID=].

5.  A controlling user agent must be able to create a new
    {{PresentationConnection}} to an existing presentation on the
    receiver, given its [=presentation request URL=] and [=presentation ID=].

6.  It must be possible to close a {{PresentationConnection}} between a
    controller and a presentation, and signal both parties with the
    reason why the connection was closed.

7.  Multiple controllers must be able to connect to a single presentation
    simultaneously, possibly from one or more [=controlling user agents=].

8.  Messages sent by the controller must be delivered to the presentation (or
    vice versa) in a reliable and in-order fashion.

9.  If a message cannot be delivered, then the controlling user agent must be
    able to signal the receiver (or vice versa) that the connection should be
    closed with reason `error`.

10. The controller and presentation must be able to send and receive `DOMString`
    messages (represented as `string` type in ECMAScript).

11. The controller and presentation must be able to send and receive binary
    messages (represented as `Blob` objects in HTML5, or `ArrayBuffer` or
    `ArrayBufferView` types in ECMAScript).

12. The controlling user agent must be able to signal to the receiver to
    terminate a presentation, given its [=presentation request URL=] and [=presentation
    ID=].

13. The receiver must be able to signal all connected controlling user agents
    when a presentation is terminated.


Remote Playback API Requirements {#requirements-remote-playback}
----------------------------------------------------------------

Issue(3): Requirements for Remote Playback API

Non-Functional Requirements {#requirements-non-functional}
----------------------------------------------------------

1.  It should be possible to implement an Open Screen presentation display using
    modest hardware requirements, similar to what is found in a low end
    smartphone, smart TV or streaming device. See the [Device
    Specifications](device_specs.md) document for expected presentation display
    hardware specifications.

2.  It should be possible to implement an Open Screen controlling user agent on a
    low-end smartphone. See the [Device Specifications](device_specs.md) document
    for expected controlling user agent hardware specifications.

3.  The discovery and connection protocols should minimize power consumption,
    especially on the controlling user agent which is likely to be battery
    powered.

4.  The protocol should minimize the amount of information provided to a passive
    network observer about the identity of the user, activity on the controlling
    user agent and activity on the receiver.

5.  The protocol should prevent passive network eavesdroppers from learning
    presentation URLs, presentation IDs, or the content of presentation messages
    passed between controllers and presentations.

6.  The protocol should prevent active network attackers from impersonating a
    display and observing or altering data intended for the controller or
    presentation.

7.  The controlling user agent should be able to discover quickly when a
    presentation display becomes available or unavailable (i.e., when it connects
    or disconnects from the network).

8.  The controlling user agent should present sensible information to the user
    when a protocol operation fails.  For example, if a controlling user agent is
    unable to start a presentation, it should be possible to report in the
    controlling user agent interface if it was a network error, authentication
    error, or the presentation content failed to load.

9.  The controlling user agent should be able to remember authenticated
    presentation displays.  This means it is not required for the user to
    intervene and re-authenticate each time the controlling user agent connects
    to a pre-authenticated display.

10.  Message latency between the controller and a presentation should be minimized
    to permit interactive use.  For example, it should be comfortable to type in
    a form in the controller and have the text appear in the presentation in real
    time.  Real-time latency for gaming or mouse use is ideal, but not a
    requirement.

11. The controlling user agent initiating a presentation should communicate its
    preferred locale to the receiver, so it can render the presentation content
    in that locale.

12. It should be possible to extend the control protocol (above the discovery and
    transport levels) with optional features not defined explicitly by the
    specification, to facilitate experimentation and enhancement of the base
    APIs.


Discovery with mDNS {#discovery}
===============================

Agents may discover one another using [[RFC6763|DNS-SD]] over [[RFC6762|mDNS]].
To do so, agents must use the service name "_openscreen._udp.local".

Advertising agents must use an instance name that is a prefix of the agent's
display name. If the instance name is not the complete display name (if it has
been truncated), it must be terminated by a null character.  It is a prefix so
that the name displayed to the user pre-verification can be verified later.  It
is terminated by a null character in the case of truncation so that the
listening agent knows it has been truncated.  This complexity is necessary to
allow for display names that exceed the size allowed in an instance name and for
such (possibly truncated) display names to be visible to the user sooner
(before a QUIC connection is made).  Listening agents must treat instance names
as unverified and must verify that the instance name is a prefix of the verified
display name before showing the user a verified display name.
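The truncation and verification rules above can be sketched as follows. This is an illustrative sketch, not normative text: the function names are hypothetical, and the 63-byte limit is the DNS-SD instance name limit from [[RFC6763|DNS-SD]], which this document does not state explicitly.

```python
# Assumed limit for a DNS-SD instance name, in UTF-8 bytes (from RFC 6763).
MAX_INSTANCE_NAME_BYTES = 63


def make_instance_name(display_name: str, limit: int = MAX_INSTANCE_NAME_BYTES) -> str:
    """Return the display name, or a null-terminated prefix if it is too long."""
    encoded = display_name.encode("utf-8")
    if len(encoded) <= limit:
        return display_name
    # Reserve one byte for the terminating null character and drop any
    # partial UTF-8 sequence left at the cut point.
    prefix = encoded[: limit - 1].decode("utf-8", errors="ignore")
    return prefix + "\x00"


def instance_name_matches(instance_name: str, verified_display_name: str) -> bool:
    """Check an unverified instance name against the verified display name."""
    if instance_name.endswith("\x00"):
        # Truncated: the part before the null must be a prefix.
        return verified_display_name.startswith(instance_name[:-1])
    # Not truncated: it must be the complete display name.
    return instance_name == verified_display_name
```

A listening agent would call something like `instance_name_matches` only after authentication has produced a verified display name.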

Advertising agents must include DNS TXT records with the following
keys and values:

-   key "fp" with value of the certificate fingerprint of the advertising agent.
    The format of the fingerprint is defined by [RFC 8122 section
    5](https://tools.ietf.org/html/rfc8122#section-5), excluding the
    "fingerprint:" prefix and including the hash function, space, and hex-encoded
    fingerprint.  The fingerprint value also functions as an ID for the agent.
    All agents must support the following hash functions: "sha-256", "sha-512".
    Agents must not support the following hash functions: "md2", "md5".

  <!-- TODO: include cross references to the specs for these hash functions. -->

-   key "mv" with an unsigned integer value that indicates the version of the
    agent's metadata.  Whenever its metadata changes, the advertising agent must
    update it to a greater value.  This signals to the listening agent that it
    should connect to the advertising agent to discover updated metadata.
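As a rough illustration of the two TXT record values, the sketch below formats a fingerprint in the RFC 8122 style (hash function, space, colon-separated uppercase hex) and pairs it with a metadata version. The function names are hypothetical, and encoding "mv" as a decimal string is an assumption that this document does not specify.

```python
import hashlib

# Hash functions every agent must support, per the list above.
REQUIRED_HASHES = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}


def fingerprint_value(cert_der: bytes, hash_name: str = "sha-256") -> str:
    """Format the "fp" value: hash function, space, hex-encoded fingerprint."""
    if hash_name not in REQUIRED_HASHES:
        # This also rejects "md2" and "md5", which agents must not support.
        raise ValueError("unsupported hash function: " + hash_name)
    digest = REQUIRED_HASHES[hash_name](cert_der).digest()
    return hash_name + " " + ":".join(f"{byte:02X}" for byte in digest)


def txt_records(cert_der: bytes, metadata_version: int) -> dict:
    """Build the TXT key/value pairs; "mv" as a decimal string is an assumption."""
    if metadata_version < 0:
        raise ValueError("mv must be an unsigned integer")
    return {"fp": fingerprint_value(cert_der), "mv": str(metadata_version)}
```

When its metadata changes, an advertising agent would republish the records with a larger `metadata_version`.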

  <!-- TODO: Add examples of sample mDNS records. -->


Future extensions to this QUIC-based protocol can use the same metadata
discovery process to indicate support for those extensions, through a
capabilities mechanism to be determined. If a future version of the Open Screen
Protocol uses mDNS but breaks compatibility with the metadata discovery process,
it should change the DNS-SD service name to a new value, indicating a new
mechanism for metadata discovery.


Transport and metadata discovery with QUIC {#transport}
=======================================================

If a listening agent wants to connect to an advertising agent, or to
learn further metadata about it, it initiates a [[!QUIC]] connection to
the IP and port from the SRV record.  Prior to authentication, messages may be
exchanged (for example, to learn further metadata), but any information received
should be treated as unverified (for example, by indicating to the user that the
display name of an unauthenticated agent is unverified).

To learn further metadata, an agent may send an agent-info-request
message (see [[#appendix-a]]) and receive back an agent-info-response message.  The
messages may contain the following information with the following meaning:

-   display-name: (required) The display name of the responding agent intended
    to be displayed to a user by the requesting agent.  If the responding agent
    is not yet authenticated, the requesting agent should make UI affordance for
    indicating to the user that the display name is not yet verified.  If the
    responding agent changes its display name, the requesting agent should
    make UI affordance for indicating to the user that the display name has
    changed.

-   model-name: (optional) If the agent is a hardware device, the model name of
    the device.  This is used mainly for debugging purposes, but may be
    displayed to the user of the requesting agent.

<!-- TODO: Add device type and/or capabilities -->

Listening agents act as QUIC clients.  Advertising agents act as QUIC servers.

If a listening agent wishes to receive messages from an advertising agent or an
advertising agent wishes to send messages to a listening agent, it may wish to
keep the QUIC connection alive.  Once neither side needs to keep the connection
alive for the purposes of sending or receiving messages, the connection should
be closed with an error code of 5139.  In order to keep a QUIC connection alive, an
agent may send an agent-status-request message, and any agent that receives an
agent-status-request message should send an agent-status-response message. Such
messages should be sent more frequently than the QUIC idle_timeout transport
parameter (see section 18 of [[!QUIC]]) and QUIC PING
frames should not be used.  An idle_timeout transport parameter of 25 seconds is
recommended.  The agent should behave as though a timer less than the
idle_timeout were reset every time a message is sent on a QUIC stream.  If the
timer expires, an agent-status-request message should be sent.
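One possible way to model the keep-alive timer is sketched below. The class name and the 5-second safety margin are assumptions made for illustration; the text above only requires that the timer be shorter than the idle_timeout. Time is passed in explicitly so the logic does not depend on a particular clock.

```python
class KeepAliveTimer:
    """Tracks when an agent-status-request is due on one QUIC connection."""

    def __init__(self, now: float, idle_timeout: float = 25.0, margin: float = 5.0) -> None:
        if not 0 < margin < idle_timeout:
            raise ValueError("margin must be positive and less than idle_timeout")
        # The timer must fire before the QUIC idle_timeout would expire.
        self.interval = idle_timeout - margin
        self.deadline = now + self.interval

    def on_message_sent(self, now: float) -> None:
        """Reset the timer whenever any message is sent on a QUIC stream."""
        self.deadline = now + self.interval

    def should_send_status_request(self, now: float) -> bool:
        """True once the timer has expired and a status request is due."""
        return now >= self.deadline
```

An agent would call `on_message_sent` after every outgoing message, including the agent-status-request itself.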


If a client agent wishes to send messages to a server agent, the client
agent can connect to the server agent "on demand"; it does not need to
keep the connection alive.

The agent-info-response message and agent-status-response
messages may be extended to include additional information not defined
in this spec.  If done ad-hoc by applications and not in future specs,
keys should be chosen to avoid collision, such as by choosing large
integers or long strings.  Agents must ignore keys in the
agent-info-response message that they do not understand, to allow agents
to easily extend this message.

Message delivery using CBOR and QUIC streams {#control}
========================================================

Messages are serialized using [[!RFC7049|CBOR]].  To
send a group of messages in order, that group of messages must be sent in one
QUIC stream.  Independent groups of messages (with no ordering dependency
across groups) should be sent in different QUIC streams.  In order to put
multiple CBOR-serialized messages into the same QUIC stream, the following
framing is used.

For each message, the sender must write to the QUIC stream the following:

1.  A type key representing the type of the message, encoded as a variable-length
    integer (see [[#appendix-a]] for type keys)

2.  The message length encoded as a variable-length integer

3.  The message encoded as CBOR (whose length must match the value in step 2)

If an agent receives a message for which it does not recognize a
type key, it must close the QUIC connection with an application error
code of 404 and should include the unknown type key in the reason phrase
(see [[!QUIC]] section 19.4).

Variable-length integers are encoded in the same format as defined by [QUIC
transport section
16](https://tools.ietf.org/html/draft-ietf-quic-transport-16#section-16).
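The framing above can be sketched as follows, using the QUIC variable-length integer format (the two most significant bits of the first byte select a 1, 2, 4, or 8 byte encoding). The function names are hypothetical, the type key value used in the usage note is a placeholder rather than one taken from Appendix A, and the CBOR body is treated as opaque bytes so that the sketch needs no CBOR library.

```python
from typing import Set, Tuple


class UnknownTypeKey(Exception):
    """Raised for an unrecognized type key; the caller should then close
    the QUIC connection with application error code 404."""


def encode_varint(value: int) -> bytes:
    """Encode an integer as a QUIC variable-length integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for prefix, size in ((0b00, 1), (0b01, 2), (0b10, 4), (0b11, 8)):
        usable_bits = 8 * size - 2
        if value < (1 << usable_bits):
            return (value | (prefix << usable_bits)).to_bytes(size, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at offset; return (value, offset after the varint)."""
    size = 1 << (data[offset] >> 6)
    if offset + size > len(data):
        raise ValueError("truncated variable-length integer")
    raw = int.from_bytes(data[offset:offset + size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), offset + size


def frame_message(type_key: int, cbor_body: bytes) -> bytes:
    """Write type key, message length, then the CBOR-encoded message."""
    return encode_varint(type_key) + encode_varint(len(cbor_body)) + cbor_body


def parse_message(data: bytes, known_type_keys: Set[int], offset: int = 0) -> Tuple[int, bytes, int]:
    """Read one framed message; return (type key, CBOR body, next offset)."""
    type_key, offset = decode_varint(data, offset)
    if type_key not in known_type_keys:
        raise UnknownTypeKey(type_key)
    length, offset = decode_varint(data, offset)
    end = offset + length
    if end > len(data):
        raise ValueError("truncated message body")
    return type_key, data[offset:end], end
```

For example, `frame_message(10, b"\xa0")` frames an empty CBOR map under the placeholder type key 10, and `parse_message` can be called repeatedly with the returned offset to read successive messages from one stream buffer.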

Many messages are requests and responses, so a common format is defined for
those.  Each request and each response includes a request ID, which is an
unsigned integer chosen by the requester.  Responses must include the request ID
of the request they are associated with.
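A requester might track outstanding requests along these lines. This is a minimal sketch: the class and method names are hypothetical, and allocating IDs from a simple counter is one choice among many, since the text above only requires an unsigned integer chosen by the requester.

```python
import itertools
from typing import Any, Callable, Dict


class RequestTracker:
    """Matches incoming responses to pending requests by request ID."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._pending: Dict[int, Callable[[Any], None]] = {}

    def start(self, on_response: Callable[[Any], None]) -> int:
        """Allocate a request ID and remember who is waiting for the response."""
        request_id = next(self._next_id)
        self._pending[request_id] = on_response
        return request_id

    def complete(self, request_id: int, response: Any) -> bool:
        """Dispatch a response; return False if no such request is pending."""
        callback = self._pending.pop(request_id, None)
        if callback is None:
            return False
        callback(response)
        return True
```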

Authentication {#authentication}
================================

In order for one agent (the challenger) to authenticate another (the responder),
the challenger may send an authentication-request message and expect an
authentication-response message to be sent back from the responder.  To
mutually authenticate, this mechanism is used twice, once by each side acting as
the challenger.  This mechanism assumes the agents share a low-entropy secret,
such as a number or a short password that could be entered by a user on a
keyboard or TV remote control.

For all messages and objects defined in this section, see Appendix A for the full
CDDL definitions.

The challenger sends an authentication-request message with the following values:

-   mechanism: The authentication mechanism being used.  This standard only
    defines the mechanism hkdf-of-scrypt-of-psk but this field gives a place for
    other mechanisms to be specified.

-   salt: 32 random bytes.  This salt is used in HKDF, so see
    https://tools.ietf.org/html/rfc5869#section-3.1 for more details on how this
    value should be generated.

-   cost: log base 2 of the cost parameter (N) for scrypt defined in [RFC
    7914 section 2](https://tools.ietf.org/html/rfc7914#section-2).  It must be
    greater than or equal to 14 (to avoid being too weak) and less than or equal
    to 128 (the limit defined by scrypt).  A value of 15 is recommended (an
    scrypt N of 2^15 or 32768).

The responder replies with an authentication-response message with the following values:

-   result: Whether the responder was able to calculate proof of possession of
    the shared secret and, if it failed, why it failed.

-   proof: The result of running the authentication mechanism.  The steps for
    hkdf-of-scrypt-of-psk are described below.

The challenger verifies the proof and sends the responder an
authentication-result message with the following values:

-   result: Whether the challenger was able to authenticate the responder and,
    if not, why not.

The challenger must limit the time the responder has to send a response to 60
seconds (to avoid the possibility of brute-force attacks).


For hkdf-of-scrypt-of-psk, the proof is calculated using the following steps:

1. Let secret be the pre-shared secret.

2. Let N be 2 to the power of the cost from the authentication-request
     message.

3. Let r be 8.

4. Let p be 1.

5. Let keyLength be 32.

6. Let scryptResult be the result of running
     [scrypt](https://tools.ietf.org/html/rfc7914) on secret with cost parameter N,
     block size r, parallelization parameter p, and derived key length of
     keyLength.

7. Let hashFunction be sha-256.

8. Let salt be the salt from the authentication-request message.

9. Let info be a CBOR-serialized certificate-fingerprint-pair object (CDDL
     defined in Appendix A) with the following values:

    -   challenger-fingerprint: The result of running sha-256 on the
        Distinguished Encoding Rules (DER) form (see
        https://tools.ietf.org/html/rfc8122#section-5) of the certificate used by
        the challenger in the QUIC crypto handshake during connection establishment.

    -   responder-fingerprint: The result of running sha-256 on the
        Distinguished Encoding Rules (DER) form (see
        https://tools.ietf.org/html/rfc8122#section-5) of the certificate used by
        the responder in the QUIC crypto handshake during connection establishment.

10. Let proof be the result of running
     [\HKDF](https://tools.ietf.org/html/rfc5869) on scryptResult with
     both the extract and expand steps, hash function hashFunction, salt salt,
     application-specific info info, and output key length keyLength.

To verify that the responder's proof is correct, the challenger makes the same
calculation of the proof and compares the result. If the results are the same,
the challenger considers the responder authenticated, and considers it
unauthenticated otherwise.  
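The proof calculation and its verification can be sketched with the Python standard library as shown below. Several points are assumptions rather than statements of this specification: the function names are hypothetical, the scrypt salt is taken to be empty because the steps above assign the salt from the authentication-request to HKDF and name no salt for scrypt, and `info` is passed in as already-serialized bytes so that no CBOR library is needed. `hashlib.scrypt` also caps usable memory, so very large cost values will fail in practice.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: the extract step, then the expand step."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def compute_proof(secret: bytes, cost: int, salt: bytes, info: bytes) -> bytes:
    """Compute the hkdf-of-scrypt-of-psk proof for a pre-shared secret."""
    if not 14 <= cost <= 128:
        raise ValueError("cost must be between 14 and 128")
    n, r, p, key_length = 1 << cost, 8, 1, 32
    scrypt_result = hashlib.scrypt(
        secret,
        salt=b"",  # ASSUMPTION: no scrypt salt is named in the steps above
        n=n,
        r=r,
        p=p,
        maxmem=min(2 * 128 * r * n, 2**31 - 1),  # allow headroom above 128*r*N bytes
        dklen=key_length,
    )
    return hkdf_sha256(scrypt_result, salt, info, key_length)


def verify_proof(secret: bytes, cost: int, salt: bytes, info: bytes, proof: bytes) -> bool:
    """Challenger side: repeat the calculation and compare in constant time."""
    return hmac.compare_digest(compute_proof(secret, cost, salt, info), proof)
```

The constant-time comparison is a design choice of this sketch, not a requirement stated above; it avoids leaking how many leading bytes of a guessed proof were correct.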

Note: the values of 32 above (for salt length, keyLength) are based on the
output size of sha-256.  If a different hash mechanism is used in the future,
these values should be updated as well.


Control Protocols {#control-protocols}
============================

Presentation Protocol {#presentation-protocol}
---------------------------------------------

This section defines the use of the Open Screen Protocol for starting,
stopping, and controlling presentations as defined by
[[PRESENTATION-API|Presentation API]].  A subsequent section will
define how APIs in [[PRESENTATION-API|Presentation API]] map to the
protocol messages defined in this section.

For all messages defined in this section, see [[#appendix-a]] for the full
CDDL definitions.

<!-- TODO: Add a capability that indicates support for the
presentation protocol.
See https://github.com/webscreens/openscreenprotocol/issues/123 -->

To learn which receivers are [=available presentation displays=] for a
particular URL or set of URLs, the controller may send a
presentation-url-availability-request message with the following values:

-   urls: A list of presentation URLs.  Must not be empty.

-   watch-duration: The period of time that the controller is interested in
    receiving updates about the URLs, should the availability change.

-   watch-id: An identifier the receiver may use when sending updates about URL
    availability so that the controller knows which URLs the receiver is
    referring to.

In response, the receiver should send one presentation-url-availability-response
message with the following values:

-   url-availabilities: A list of URL availability states (available,
    unavailable, or invalid).  Each state must correspond to the matching URL
    from the request by list index.


The receiver should later (up to the current time plus request
watch-duration) send presentation-url-availability-event messages if
URL availabilities change.  Such events contain the following values:

-   watch-id: The watch-id given in the presentation-url-availability-request,
    used to refer to the presentation URLs whose availability has changed.

-   url-availabilities: A list of URL availability states (available,
    unavailable, or invalid).  Each state must correspond to the URLs from the
    request referred to by the watch-id.

Note that these messages are not broadcast to all controllers. They are sent
individually to controllers that have requested availability for the URLs that
have changed in availability state within the watch duration of the original
availability request.
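The request, response, and event flow above could be tracked on the receiver roughly as follows. This sketch covers one controller only; the class and method names are hypothetical, the watch-duration is assumed to be in seconds, and each event is assumed to carry the states for every URL in the watch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Watch:
    urls: List[str]
    expires_at: float
    states: List[str]


class AvailabilityWatcher:
    """Receiver-side bookkeeping for one controller's availability watches."""

    def __init__(self, check_url: Callable[[str], str]) -> None:
        # check_url returns "available", "unavailable", or "invalid".
        self._check_url = check_url
        self._watches: Dict[int, Watch] = {}

    def handle_request(self, urls: List[str], watch_duration: float,
                       watch_id: int, now: float) -> List[str]:
        """Return url-availabilities for the response, in request order."""
        if not urls:
            raise ValueError("urls must not be empty")
        states = [self._check_url(url) for url in urls]
        self._watches[watch_id] = Watch(list(urls), now + watch_duration, states)
        return states

    def poll_events(self, now: float) -> Dict[int, List[str]]:
        """Return {watch-id: url-availabilities} for watches that changed."""
        events: Dict[int, List[str]] = {}
        for watch_id, watch in list(self._watches.items()):
            if now > watch.expires_at:
                del self._watches[watch_id]  # the watch duration has elapsed
                continue
            states = [self._check_url(url) for url in watch.urls]
            if states != watch.states:
                watch.states = states
                events[watch_id] = states
        return events
```

Because watches are kept per controller, events reach only the controllers that asked about the affected URLs.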


To save power, the controller may disconnect the QUIC connection and
later reconnect to send availability requests and receive availability
responses and updates.


To start a presentation, the controller may send a
presentation-start-request message to the receiver with the following
values:

-   presentation-id: the presentation identifier

-   url: the selected presentation URL

-   headers: headers that the receiver should use to fetch the
    presentationUrl.  For example, section 6.6.1 of
    [[PRESENTATION-API|Presentation API]] says that the Accept-Language
    header should be provided.

The presentation ID must follow the restrictions defined by
[[PRESENTATION-API|Presentation API]] section 6.1, in that it must
consist of at least 16 ASCII characters.
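A check for the restriction stated above, together with one way to generate a conforming identifier, might look like this. The function names are hypothetical, and generating the ID from random hex digits is an illustrative choice rather than a requirement.

```python
import secrets

MIN_PRESENTATION_ID_LENGTH = 16


def is_valid_presentation_id(presentation_id: str) -> bool:
    """True if the ID consists of at least 16 ASCII characters."""
    return (len(presentation_id) >= MIN_PRESENTATION_ID_LENGTH
            and presentation_id.isascii())


def new_presentation_id() -> str:
    """Generate a random ID of 32 ASCII hex characters."""
    return secrets.token_hex(16)
```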


When the receiver receives the presentation-start-request, it should send back a
presentation-start-response message after either the presentation URL has been
fetched and loaded, or the receiver has failed to do so. If it has failed, it
must respond with the appropriate result (such as invalid-url or timeout).  If
it has succeeded, it must reply with a success result.  Additionally, the
response must include the following:

-   connection-id: An ID that both agents can use to send connection messages
    to each other.  It is chosen by the receiver for ease of implementation: if
    the message receiver chooses the connection-id, it may keep the ID unique
    across connections, thus making message demuxing/routing easier.

<!-- TODO: Add optional HTTP response code to the response? -->

To send a presentation message, the controller or receiver may send a
presentation-connection-message with the following values:

-   connection-id: The ID from the presentation-start-response or
    presentation-connection-open-response messages.

-   message: the presentation message data.
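The rationale for letting the receiver choose the connection-id can be illustrated with a small routing table. This is a sketch under assumptions: the class and method names are hypothetical, and IDs come from a single counter so that one value identifies one connection across all presentations.

```python
import itertools
from typing import Callable, Dict, Union

Message = Union[str, bytes]


class ConnectionRouter:
    """Receiver-side table that routes messages by connection-id."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._handlers: Dict[int, Callable[[Message], None]] = {}

    def open(self, handler: Callable[[Message], None]) -> int:
        """Allocate a connection-id that is unique across all connections."""
        connection_id = next(self._ids)
        self._handlers[connection_id] = handler
        return connection_id

    def deliver(self, connection_id: int, message: Message) -> bool:
        """Route a presentation-connection-message; False if the ID is unknown."""
        handler = self._handlers.get(connection_id)
        if handler is None:
            return False
        handler(message)
        return True

    def close(self, connection_id: int) -> None:
        """Forget a connection once it has been closed."""
        self._handlers.pop(connection_id, None)
```

Because every ID is unique across connections, `deliver` needs no lookup key other than the connection-id itself.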


To terminate a presentation, the controller may send a
presentation-termination-request message with the following values:

-   presentation-id: The ID of the presentation to terminate.

-   reason: The reason the presentation is being terminated.


When a receiving agent receives a presentation-termination-request, it should
send back a presentation-termination-response message to the requesting
agent.  It should also notify other controllers about the termination by sending
a presentation-termination-event message.  It may also send the same message if
it terminates a presentation without a request from a controller to do so. This
message contains the following values:

-   presentation-id: The ID of the presentation that was terminated.

-   reason: The reason the presentation was terminated.

<!-- TODO: Split up reason into reason and whether it was triggered by the user
or not? -->


To accept incoming connection requests from a controller, a receiver
must receive and process the presentation-connection-open-request
message, which contains the following values:

-   presentation-id: The ID of the presentation to connect to.

-   url: The URL of the presentation to connect to. 

The receiver should, upon receipt of a
presentation-connection-open-request message, send back a
presentation-connection-open-response message which contains the
following values:

-   result: a code indicating success or failure, and the reason for the failure

-   connection-id: An ID that both agents can use to send connection messages
    to each other.  It is chosen by the receiver for ease of implementation (if
    the message receiver chooses the connection-id, it may keep the ID unique
    across connections, thus making message demuxing/routing easier).


A controller may terminate a connection without terminating the presentation by
sending a presentation-connection-close-request message with the following
values:

-   connection-id: The ID of the connection to close.


The receiver should, upon receipt of a presentation-connection-close-request,
send back a presentation-connection-close-response message with the following
values:

-   result: Whether the close succeeded or failed and, if it failed, why it failed.

The receiver may also close a connection without a request from the controller
to do so and without terminating a presentation.  If it does so, it should send
a presentation-connection-close-event to the controller with the following
values:

-   connection-id: The ID of the connection that was closed

-   reason: The reason the connection was closed

-   error-message: A debug message suitable for a log or perhaps presented to
    the user with more explanation as to why it was closed.

<!-- TODO: Why does the Presentation API spec not mention the use of the close
message? -->

<!-- TODO: Specify message ordering groups. -->


Presentation API {#presentation-api}
---------------------------------------------

This section defines how [[PRESENTATION-API|Presentation API]] uses
the messages defined in the previous sections.
Non-browser agents can also send and receive the same messages defined here.
If so, a non-browser agent must follow the same restrictions for the
presentation-id as a browser does, as defined by [[PRESENTATION-API|Presentation
API]] section 6.1 (at least 16 ASCII characters).

When [[PRESENTATION-API|Presentation API]] [section
6.4.2](https://www.w3.org/TR/presentation-api/#the-list-of-available-presentation-displays)
says "This list of presentation displays ... is populated based on an
implementation specific discovery mechanism", the [=controlling user
agent=] may use the mDNS, QUIC, agent-info-request, and
presentation-url-availability-request messages defined previously in
this spec to discover receivers.

When [[PRESENTATION-API|Presentation API]] [section
6.4.2](https://www.w3.org/TR/presentation-api/#the-list-of-available-presentation-displays)
says "To further save power, ... implementation specific discovery of
presentation displays can be resumed or suspended.", the [=controlling
user agent=] may use the power saving mechanism defined in the
previous section.

When [[PRESENTATION-API|Presentation API]] [section
6.3.4](https://www.w3.org/TR/presentation-api/#starting-a-presentation-connection)
says "Using an implementation specific mechanism, tell U to create a
receiving browsing context with D, presentationUrl, and I as
parameters.", U (the [=controlling user agent=]) may send a
presentation-start-request message to D (the receiver), with I for the
presentation identifier and presentationUrl for the selected
presentation URL.

<!-- TODO: Once the Presentation API has text about reconnecting via an
implementation specific mechanism, quote that here and map it to a message -->

When [[PRESENTATION-API|Presentation API]] [section
6.5.2](https://www.w3.org/TR/presentation-api/#sending-a-message-through-presentationconnection)
says "Using an implementation specific mechanism, transmit the
contents of messageOrData as the presentation message data and
messageType as the presentation message type to the destination
browsing context", the [=controlling user agent=] may send a
presentation-connection-message with messageOrData for the
presentation message data.  Note that the messageType is embedded in
the encoded CBOR type and does not need an additional value in the
message.
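
To illustrate how the CBOR type carries the message type, the following
non-normative sketch hand-encodes the body of a presentation-connection-message.
A text message is emitted as a CBOR text string (major type 3) and a binary
message as a CBOR byte string (major type 2).  The helper names are
hypothetical, and the framing of the message type key is not shown.

```python
from typing import Union


def _cbor_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR initial byte followed by its unsigned argument."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 24:
        return bytes([(major_type << 5) | value])
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            head = bytes([(major_type << 5) | additional_info])
            return head + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def encode_connection_message(connection_id: int, message: Union[bytes, str]) -> bytes:
    """Encode {1: connection-id, 2: message} as a CBOR map."""
    if isinstance(message, str):
        payload = message.encode("utf-8")
        body = _cbor_head(3, len(payload)) + payload  # major type 3: text string
    else:
        payload = bytes(message)
        body = _cbor_head(2, len(payload)) + payload  # major type 2: byte string
    return (
        _cbor_head(5, 2)                                   # map with two pairs
        + _cbor_head(0, 1) + _cbor_head(0, connection_id)  # 1: connection-id
        + _cbor_head(0, 2) + body                          # 2: message
    )
```

The two encodings of the same two characters differ only in the major type of
the message field, which is how a receiver can recover the messageType.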

When [[PRESENTATION-API|Presentation API]] [section
6.5.6](https://www.w3.org/TR/presentation-api/#terminating-a-presentation-in-a-controlling-browsing-context)
says "Send a termination request for the presentation to its receiving user
agent using an implementation specific mechanism", the [=controlling user
agent=] may send a presentation-termination-request message.

When [[PRESENTATION-API|Presentation API]] [section
6.7.1](https://www.w3.org/TR/presentation-api/#monitoring-incoming-presentation-connections)
says "it MUST listen to and accept incoming connection requests from a
controlling browsing context using an implementation specific
mechanism", the [=receiving user agent=] must receive and process the
presentation-connection-open-request.

When [[PRESENTATION-API|Presentation API]] [section
6.7.1](https://www.w3.org/TR/presentation-api/#monitoring-incoming-presentation-connections)
says "Establish the connection between the controlling and receiving browsing
contexts using an implementation specific mechanism.", the [=receiving user
agent=] must send a presentation-connection-open-response message.


Remote Playback API Protocol {#remote-playback}
-----------------------------------------------

Issue(12): Propose control protocol for Remote
Playback API.


Security and Privacy {#security-privacy}
====================

The Open Screen Protocol allows two networked agents to discover each other
and exchange user and application data.  As such, its security and privacy
considerations should be closely examined.  We first evaluate the protocol
itself using the W3C [[SECURITY-PRIVACY-QUESTIONNAIRE|Security and Privacy
Questionnaire]].  We then examine whether the security and privacy guidelines
recommended by the [[PRESENTATION-API|Presentation API]] and the
[[REMOTE-PLAYBACK|Remote Playback API]] are met.  Finally we discuss recommended
mitigations that agents can use to meet these security and privacy
requirements.

Threat Models {#threat-models}
--------------------------------

### Passive Network Attackers ### {#passive-network-attackers}

The Open Screen Protocol should assume that all parties that are connected to
the same LAN, either through a wired connection or through WiFi, are able to
observe all data flowing between Open Screen Protocol agents.

These parties will be able to collect any data exposed through unencrypted
messages, such as mDNS records and QUIC handshakes.

These parties may attempt to learn cryptographic parameters by observing data
flows on the QUIC connection, or by observing cryptographic timing.

### Active Network Attackers ### {#active-network-attackers}

Active attackers, such as compromised routers, will be able to manipulate data
exchanged between agents.  They can inject traffic into existing QUIC
connections and attempt to initiate new QUIC connections.  These abilities can
be used to attempt the following:

*   Impersonate a newly discovered agent, or one already trusted by the user,
    in an attempt to convince the user to authenticate to it.
*   Connect to an agent and query its capabilities.
*   Connect to and control a presentation or remote playback, or extract data
    from the application state of the presentation or remote playback.
    
One particular attack of concern is misconfigured or compromised routers that
expose local network devices (such as Open Screen Protocol agents) to the
Internet.  This vector of attack has been used by malicious parties to take
control of printers and smart TVs by connecting to local network services that
would normally be inaccessible from the Internet.

### Denial of Service ### {#denial-of-service}

Parties connected to the LAN may attempt to deny access to Open Screen
Protocol agents.  For example, an attacker may attempt to open
a large number of QUIC connections to an agent in an attempt to block
legitimate connections or exhaust the agent's system resources.  They may
also multicast spurious DNS-SD records in an attempt to exhaust the cache
capacity for mDNS listeners, or to get listeners to open a large number of bogus
QUIC connections.

### Same-Origin Policy Violations ### {#same-origin-policy-violations}

The Presentation API allows cross-origin communication between controlling pages
and presentations with the consent of each origin (through their use of the
API).  This is similar to cross-origin communication via
{{Window/postMessage()}} with a target origin of `*`.  However, the Presentation
API does not convey source origin information with each message.  Therefore, the
Open Screen Protocol does not convey origin information between its agents.

The [=presentation ID=] carries some protection against unrestricted
cross-origin access, but rigorous authentication of the parties connected by a
{{PresentationConnection}} must be done at the application level.

Open Screen Protocol Security and Privacy Considerations {#security-privacy-questions}
-----------------------------------

### Personally Identifiable Information & High-Value Data ### {#personally-identifiable-information}

The following data exchanged by the protocol can be personally identifiable
and/or high value data:

1. Presentation URLs and availability results
1. Presentation IDs
1. Presentation connection IDs
1. Presentation connection messages
1. Remote playback URLs
1. Remote playback commands and status messages

Presentation IDs are considered high value data because they can be used in
conjunction with a Presentation URL to connect to a running presentation.

Presentation display friendly names, model names, and capabilities, while not
considered personally identifiable, are important to protect to prevent an
attacker from changing them or substituting other values during the discovery
and authentication process.

The following data cannot be reasonably made confidential and should be
considered public and untrusted data:

1. IP addresses and ports used by the Open Screen Protocol.
1. Data advertised through mDNS, including the display name prefix, the
    certificate fingerprint, and the metadata version.
   
### Cross Origin State Considerations ### {#cross-origin-state}

Access to origin state across browsing sessions is possible through the
Presentation API by reconnecting to a presentation that was started by a
previous session. This scenario is addressed in
[[PRESENTATION-API#cross-origin-access]].

Presentation display availability and remote playback device availability are
states that are available cross-origin depending on the user's network
context.  Exposure of this data to the Web is also discussed in
[[PRESENTATION-API#personally-identifiable-information]] and
[[REMOTE-PLAYBACK#personally-identifiable-information]].

### Origin Access to Other Devices ### {#origin-access-devices}

By design, the Open Screen Protocol allows access to presentation displays and
remote playback devices from the Web.  By implementing the protocol, these
devices are knowingly making themselves available to the Web and should be
designed accordingly.

Below, we discuss mitigation steps to prevent malicious use of these devices.

### Incognito Mode ### {#incognito-mode}

The Open Screen Protocol does not distinguish between the user agent's normal
browsing and incognito modes, and agents that follow the specification
behave identically regardless of which mode is in use.

It's recommended that user agents use separate authentication contexts and QUIC
connections for normal and incognito profiles from the same user agent instance.
This prevents Open Screen agents from correlating activity among profiles
belonging to the same user (both normal and incognito).

### Persistent State ### {#persistent-state}

An agent is likely to persist the identity of agents that have successfully
completed [[#authentication]].  This may include the public key fingerprints,
metadata versions, and metadata for those parties.

However, this data is not normally exposed to the Web, only through the native
UI of the user agent during the display selection or display authentication
process.  It can be an implementation choice whether the user agent clears or
retains this data when the user clears browsing data.

Issue(132): [Privacy] Fate of metadata / authentication history when clearing
browsing data

### Other Considerations ### {#other-considerations}

The Open Screen Protocol does not grant to the Web additional access to the
following:

* New script loading mechanisms
* Access to the user's location
* Access to device sensors
* Access to the user's local computing environment
* Control over the user agent's native UI
* Security characteristics of the user agent

Presentation API Considerations {#presentation-api-considerations}
-------------------------------

[[PRESENTATION-API#security-and-privacy-considerations]] place these
requirements on the Open Screen Protocol:

1.  Presentation URLs and presentation IDs should remain private among the
    parties that are allowed to connect to a presentation, per the
    cross-origin access guidelines.
1.  Controllers and receivers should be notified when connections representing
    multiple user agent profiles have been made to a presentation, per the user
    interface guidelines.
1.  Messaging between controllers and receivers should be authenticated and
    confidential, per the guidelines for messaging between presentation
    connections.

The Open Screen Protocol addresses these considerations by:

1. Requiring mutual authentication and a TLS-secured QUIC connection before
     presentation URLs, IDs, or messages are exchanged.
1. Adding explicit messages and connection IDs for individual
     {{PresentationConnection|PresentationConnections}} so that agents can track
     the number of active connections.

Remote Playback API Considerations {#remote-playback-considerations}
----------------------------------

The [[REMOTE-PLAYBACK#security-and-privacy-considerations]] also state that
messaging between local and remote playback devices should be authenticated
and confidential.

This consideration is handled by requiring mutual authentication and a
TLS-secured QUIC connection before any remote playback related messages are
exchanged.

Mitigation Strategies {#security-mitigations}
--------------------------------------------

### Local passive network attackers ### {#local-passive-mitigations}

Local passive attackers may attempt to harvest data about user activities and
device capabilities using the Open Screen Protocol.  The main strategy to address
this is data minimization, by only exposing opaque public key fingerprints
before user-mediated authentication takes place.

Passive attackers may also attempt timing attacks to learn the
cryptographic parameters of the TLS 1.3 QUIC connection.

Issue(130): [Security] Review attack and mitigation considerations for TLS 1.3

### Local active network attackers ### {#local-active-mitigations}

Local active attackers may attempt to impersonate a presentation display the
user would normally trust.  The [[#authentication]] step of the Open Screen
Protocol prevents a man-in-the-middle that does not know the shared secret from
impersonating an agent.  However, it is possible for an attacker to
impersonate an existing, trusted display or a newly discovered display that is
not yet authenticated and try to convince the user to authenticate it.

This can be addressed through a combination of techniques.  The first is flagging:

* Flag an advertised display whose public key fingerprint collides with that
    from an already-trusted display that is concurrently being advertised.
* Flag an advertised display whose friendly name differs from the one previously
    advertised under a public key fingerprint.
* Flag already-trusted displays whose metadata has changed.
* Flag agents that fail the authentication challenge a certain number of times.
  
Flagging means that the user is notified, or in some cases they are required to
re-authenticate to the presentation display to verify its identity.
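
The flagging rules above can be sketched, non-normatively, as a function that
returns the reasons an advertisement should be flagged.  All type names, field
names, reason strings, and the threshold of three failed attempts are
hypothetical; in particular, this sketch assumes a fingerprint collision can be
detected by the same fingerprint being advertised from two different addresses.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Advertisement:
    fingerprint: str       # public key fingerprint from the advertisement
    friendly_name: str
    metadata_version: int
    address: str           # hypothetical: address the advertisement came from


@dataclass(frozen=True)
class KnownDisplay:
    friendly_name: str     # name previously advertised under this fingerprint
    metadata_version: int
    trusted: bool
    failed_auth_attempts: int = 0


def flag_reasons(
    advert: Advertisement,
    known_displays: Dict[str, KnownDisplay],
    concurrent_adverts: Iterable[Advertisement],
    max_failed_attempts: int = 3,
) -> List[str]:
    """Return the reasons, if any, that this advertisement should be flagged."""
    reasons: List[str] = []
    known = known_displays.get(advert.fingerprint)
    if known is None:
        return reasons  # nothing to compare against yet
    if known.trusted and any(
        other.fingerprint == advert.fingerprint and other.address != advert.address
        for other in concurrent_adverts
    ):
        reasons.append("fingerprint-collision")
    if known.friendly_name != advert.friendly_name:
        reasons.append("friendly-name-changed")
    if known.trusted and known.metadata_version != advert.metadata_version:
        reasons.append("metadata-changed")
    if known.failed_auth_attempts >= max_failed_attempts:
        reasons.append("too-many-failed-authentications")
    return reasons
```

An agent could notify the user for any non-empty result, and require
re-authentication for the reasons it considers most serious.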
  
The second is through management of the shared secret during mutual
authentication:

* Rotate the shared secret to prevent brute force attacks.
* Use an increasing backoff to respond to authentication challenges, also to
    prevent brute force attacks.
* Use a cryptographically sound source of entropy to generate the shared secret.
* Require the end user to manually type the shared secret - shown only on the
    display - to prevent the user from blindly clicking through this step.
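
Two of the techniques above, a cryptographically sound entropy source and an
increasing backoff, can be sketched non-normatively as follows.  The numeric
alphabet, the default secret length, and the backoff constants are assumptions
chosen for illustration and are not values defined by this specification.

```python
import secrets

# Assumption: a numeric secret that is easy to type on a controller.
SECRET_ALPHABET = "0123456789"


def generate_shared_secret(length: int = 8) -> str:
    """Generate a fresh shared secret from the OS cryptographic RNG."""
    return "".join(secrets.choice(SECRET_ALPHABET) for _ in range(length))


def backoff_delay_seconds(
    failed_attempts: int, base: float = 1.0, cap: float = 300.0
) -> float:
    """Return how long to wait before answering the next challenge.

    The delay doubles with every failed attempt and is capped at `cap`.
    """
    if failed_attempts <= 0:
        return 0.0
    return min(cap, base * 2 ** (failed_attempts - 1))
```

Rotating the secret then amounts to calling `generate_shared_secret()` again
for each new authentication attempt or after a fixed number of failures.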

The active attacker may also attempt to disrupt data exchanged over the QUIC
connection by injecting or modifying traffic.  These attacks should be mitigated
by a correct implementation of TLS 1.3.

Issue(130): [Security] Review attack and mitigation considerations for TLS 1.3

### Remote active network attackers ### {#remote-active-mitigations}

Unfortunately, we cannot rely on network devices to fully protect Open Screen
Protocol agents from traffic from the broader Internet.  Open Screen Protocol
agents that are only intended to work on the LAN should filter packets
from non-local IP addresses.  Agents can also use the ARP cache to
detect attempts to spoof local network IP addresses.
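
As a non-normative sketch of the packet filtering described above, an agent
could compare the source address of incoming traffic against the networks of
its own interfaces.  The function name is hypothetical, and the list of local
networks is assumed to be supplied by the platform.

```python
import ipaddress
from typing import Iterable


def is_from_local_network(source_address: str, local_networks: Iterable[str]) -> bool:
    """Return True if source_address falls inside one of the local networks."""
    address = ipaddress.ip_address(source_address)
    return any(
        # An IPv4 address is never contained in an IPv6 network, and vice versa.
        address in ipaddress.ip_network(network, strict=False)
        for network in local_networks
    )
```

For example, with local networks `192.168.1.0/24` and `fe80::/10`, traffic from
`203.0.113.7` would be rejected.  This check alone does not detect spoofed
local addresses.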

Issue(131): [Security] Mitigations for remote network attackers

### Denial of service ### {#denial-of-service-mitigations}

It will be difficult to completely prevent denial of service attacks that
originate on the user's local area network.  Open Screen Protocol agents can
refuse new connections, limit the rate of messages from existing
connections, or limit the number of mDNS records cached from a specific
responder in an attempt to allow existing activities to continue in spite of
such an attack.
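
One way to limit the rate of messages from a connection is a token bucket, shown
here as a non-normative sketch.  The class name and parameters are hypothetical;
this specification does not mandate a particular rate limiting algorithm.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` messages at once, refilled at `rate_per_second`."""

    def __init__(
        self,
        rate_per_second: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Return True if one more message may be processed now."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

An agent could keep one bucket per QUIC connection and drop or delay messages
for which `allow()` returns `False`.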

### Malicious input ### {#malicious-input-mitigations}

Open Screen Protocol agents should be robust against malicious input that
attempts to compromise the target device by exploiting parsing vulnerabilities.

CBOR is intended to be less vulnerable to such attacks relative to alternatives
like JSON and XML.  Still, agents should be thoroughly tested using approaches
like [fuzz testing](https://en.wikipedia.org/wiki/Fuzzing).

Where possible, Open Screen Protocol agents (including the content rendering
components) should use defense-in-depth techniques like <a
href="https://en.wikipedia.org/wiki/Sandbox_(computer_security)">sandboxing</a>
to prevent vulnerabilities from gaining access to user data or leading to
persistent exploits.

Appendix A: Messages {#appendix-a}
====================

The following messages are defined with [[CDDL]]. When
integer keys are used, a comment is appended to the line to indicate
the name of the field. Each root message (one that can be put into a
QUIC stream without being enclosed by another message) has a comment
indicating the message type key.

Smaller numbers should be reserved for messages that are sent more
frequently, are very small, or both.  Larger numbers should be
reserved for messages that are sent infrequently, are large, or both.
This is because smaller type keys take fewer bytes to encode on the wire.
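
As a non-normative illustration of why small type keys are cheaper, the sketch
below computes the encoded size of an unsigned integer, assuming the type key
is encoded as a CBOR unsigned integer.  That encoding is an assumption made for
this example; another variable-length integer encoding would have different
thresholds but the same general property.

```python
def cbor_uint_size(value: int) -> int:
    """Return the number of bytes CBOR uses to encode an unsigned integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 24:
        return 1  # value fits in the initial byte
    if value < 2 ** 8:
        return 2  # initial byte + 1 byte
    if value < 2 ** 16:
        return 3  # initial byte + 2 bytes
    if value < 2 ** 32:
        return 5  # initial byte + 4 bytes
    if value < 2 ** 64:
        return 9  # initial byte + 8 bytes
    raise ValueError("value too large for a CBOR unsigned integer")
```

Under this assumption, type key 10 occupies one byte, type key 103 two bytes,
and type key 1001 three bytes.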

<!-- TODO: Make a bikeshed formatter CDDL -->

<pre>
; type key 10
agent-info-request = {
  request
}

; type key 11
agent-info-response = {
  response
  1: agent-info ; agent-info
}

agent-info = {
  0: text ; friendly-name
  1: text ; model-name
  ; ...
}

; type key 12
agent-status-request = {
  request
  ? 1: status ; status
}

; type key 13
agent-status-response = {
  response
  ? 1: status ; status
}

status = {
}

request = (
 0: request-id ; request-id
)

response = (
 0: request-id ; request-id
)

request-id = uint

; type key 1001 
authentication-request = {
  request
  1: authentication-mechanism ; mechanism
  2: bytes ; salt
  3: uint ; cost
}

; type key 1002 
authentication-response = {
  response
  1: authentication-response-result ; result
  2: bytes ; proof
}

certificate-fingerprint-pair = [
  challenger-fingerprint: bytes
  responder-fingerprint: bytes
]

; type key 1003
authentication-result = {
  1: authentication-result-result ; result
}


authentication-mechanism = &(
  hkdf-of-scrypt-of-psk: 1
)

authentication-response-result = &(
  ok: 0
  unknown-error: 1
  mechanism-unknown: 2
  salt-too-small: 3
  cost-too-low: 4
  cost-too-high: 5
  secret-unknown: 6
  calculation-took-too-long: 7
)

authentication-result-result = &(
  authenticated: 0
  unknown-error: 1
  proof-invalid: 2
)

; type key 14
presentation-url-availability-request = {
  request
  1: [1* text] ; urls
  2: microseconds ; watch-duration
  3: int ; watch-id
}

; type key 15
presentation-url-availability-response = {
  response
  1: [1* url-availability] ; url-availabilities
}

; type key 103
presentation-url-availability-event = {
  1: int ; watch-id
  2: [1* url-availability] ; url-availabilities
}


; idea: use HTTP response codes?
url-availability = &(
  available: 0
  unavailable: 1
  invalid: 10
)

; type key 104
presentation-start-request = {
  request
  1: text ; presentation-id
  2: text ; url
  3: [* http-header] ; headers
}

http-header = [
  key: text
  value: text
]

; type key 105
presentation-start-response = {
  response
  1: &result ; result
  2: uint ; connection-id
}

; type key 106
presentation-termination-request = {
  request
  1: text ; presentation-id
  2: &(
    controller-called-terminate: 10
    user-terminated-via-controller: 11
    unknown: 255
  ) ; reason
}

; type key 107
presentation-termination-response = {
  response
  1: &result ; result
}

; type key 108
presentation-termination-event = {
  1: text ; presentation-id
  2: &(
    receiver-called-terminate: 1
    user-terminated-via-receiver: 2
    controller-called-terminate: 10
    user-terminated-via-controller: 11
    receiver-replaced-presentation: 20
    receiver-idle-too-long: 30
    receiver-attempted-to-navigate: 31
    receiver-powering-down: 100
    receiver-crashed: 101
    unknown: 255
  ) ; reason
}

; type key 109
presentation-connection-open-request = {
  request
  1: text ; presentation-id
  2: text ; url
}

; type key 110
presentation-connection-open-response = {
  response
  1: &result ; result
  2: uint ; connection-id
}

; type key 111
presentation-connection-close-request = {
  request
  1: uint ; connection-id
}

; type key 112
presentation-connection-close-response = {
  response
  1: &result ; result
}

; type key 113
presentation-connection-close-event = {
  1: uint ; connection-id
  2: &(
    close-method-called: 1
    connection-object-discarded: 10
    unrecoverable-error-while-sending-or-receiving-message: 100
  ) ; reason
  ? 3: text ; error-message
}


; type key 16
presentation-connection-message = {
  1: uint ; connection-id
  2: bytes / text ; message
}

result = (
  success: 1
  invalid-url: 10
  invalid-presentation-id: 11
  timeout: 100
  transient-error: 101
  permanent-error: 102
  terminating: 103
  unknown-error: 199
)

</pre>