Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define protocol binding for events - closes #42 #92

Merged
merged 9 commits into from
Sep 16, 2021

Conversation

benfrancis
Copy link
Member

@benfrancis benfrancis commented Jul 21, 2021

This a proposed protocol binding for events in the Core Profile, using Server-Sent events.

As noted in #42, using WebSockets would require defining a sub-protocol and long-polling is a non-optimal solution. Server-Sent events are an attractive option because they are defined in an existing W3C specification which now enjoys broad native browser support. Server-Sent Events provide a simple mechanism to push events from a server to a client over HTTP, fit neatly with the WoT event data model and benefit from existing optimisations in user agents for maintaining persistent connections.

I hope that in the future the Web Thing Protocol Community Group will specify a WoT sub-protocol for use with WebSockets, but that is unlikely to be standardised in time for the first version of the WoT Profile specification.

Notes:

  • I noticed that the Protocol Binding Templates specification uses a POST request for the initial SSE connection, but my understanding is that most existing SSE implementations use a GET request. I've therefore used a GET request in this protocol binding to maximise compatibility.
  • The Core Profile Protocol Binding specification defers to the existing Server-Sent Events specification for details of how to initiate, maintain and terminate SSE connections. Any text describing SSE protocol details is therefore only informative and is marked as being non-normative.
  • I haven't mentioned anything about CORS or supplying credentials in the text. Is it sufficient to defer to the SSE specification for these details?
  • I specified that event stream messages sent as a result of a subscribeevent operation should include the name of the event that was emitted. Technically this is only necessary for subscribeallevents where messages may relate to multiple event types, but I have included it in both cases for consistency.
  • WebThings does not currently support Sever-Sent Events (it uses WebSockets), but if this becomes part of the Core Profile then we are willing to implement SSE support.

Feedback welcome.


Preview | Diff

Copy link
Member

@relu91 relu91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is sound, besides the specific comment below I want to add some general feedback.

The Core Profile Protocol Binding specification defers to the existing Server-Sent Events specification for details of how to initiate, maintain and terminate SSE connections. Any text describing SSE protocol details is therefore only informative and is marked as being non-normative.

I think we should offload everytime we can, unless the reference is not very clear and can lead to different implementation interpretations (which might cause interoperability problems).

I specified that event stream messages sent as a result of a subscribeevent operation should include the name of the event that was emitted. Technically this is only necessary for subscribeallevents where messages may relate to multiple event types, but I have included it in both cases for consistency.

There might be a negligible overhead but I think that clarity and consistency are more important here.

I noticed that the Protocol Binding Templates specification uses a POST request for the initial SSE connection, but my understanding is that most existing SSE implementations use a GET request. I've therefore used a GET request in this protocol binding to maximise compatibility.

FYI I checked node-wot and we already have a client implementation for SSE and the library that we are using initiates the connection with a GET. I think that UML sequence diagram has also other problems but I'll open an issue in the protocol binding repository.

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Show resolved Hide resolved
<p class="rfc2119-assertion"
id="core-profile-protocol-binding-events-subscribeevent-2">
In order to subscribe to an event, a Consumer MUST follow the
Server-Sent Events [[EVENTSOURCE]] specification to open a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source is pointing to a surpeseded recommendation: https://www.w3.org/TR/eventsource/ . Should we use this one instead: https://html.spec.whatwg.org/multipage/server-sent-events.html ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this too. When I look up EventSource in the Specref database it refers to the latter URL as "ED" which I assume means Editor's Draft? I'm not exactly sure of the correct way to fix this, do you know?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed in Arch call on 9.9. and concluded to rely on the official W3C spec.
The whatwg document is a living document and not a reliable reference.

@mlagally
Copy link
Contributor

@benfrancis
A couple of initial comments:

  • I'm not sure why you include code examples into the spec - the profile is not selecting any programming language
  • We should consider using a payload format such as https://github.com/cloudevents/spec/blob/v1.0.1/spec.md
  • We need a mechanism that is scalable to cloud deployments, i.e. keeping open millions of connections simultaneously does not work

@mmccool
Copy link
Contributor

mmccool commented Jul 29, 2021

I think this is a good start; note that we previously agreed in a meeting that SSE (being a standard) was a reasonable compromize.
Some more specific comments:

  • I thought WhatWG specs were being folded back into W3C; how does this relate to the WhatWG SSE spec? And what are the actual differences?
  • We can do a security review via the Security TF, but I would expect that we can use "security" and appropriate SecuritySchemes to provide any metadata, but that the server and client would still have to mutually authenticate, etc. We do not want to embed any credentials in the TD; they need to be communicated and set up out of band. I think this also applies to CORS configuration.
  • Regarding scalability to large number of services/devices: this is going to be a general problem with most mechanisms to deliver events: websockets, MQTT, long-polling, etc. The exception would be webhooks, which however has other issues, like the need for the consumer to provide a web server and a URL, etc. So I think we just have to live with this limitation. This gets back to what this profile is "for", and my understanding was that we were targeting "limited local installations" like homes, factories, etc. I think we may have to consider "digital twins" as a different situation and it might need its own profile (and then it could specify webhooks as its event mechanism to deal with scalability issues). The other place where I see scalability being a problem however might be smart cities. Still, I think we should proceed with SSE for the reasons discussed already (including KISS).

Copy link
Contributor

@mmccool mmccool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I had to rush this review a little due to a a schedule constraint, but it generally looks good. I need to look at the SSE spec as well, however, to answer certain questions. At the very least, it's a good baseline.

See my other comments about scalability, etc. This gets back to the scope or intended usage of the profile, which we should define. At this point I would rather not reopen the discussion about whether or not we should use SSE, but should focus on how to use SSE well.

index.html Outdated Show resolved Hide resolved
index.html Outdated Show resolved Hide resolved
index.html Show resolved Hide resolved
@benfrancis
Copy link
Member Author

@mlagally wrote:

I'm not sure why you include code examples into the spec - the profile is not selecting any programming language

These are just examples to show that a Consumer which fully implements with the Server-Sent Events standard does not need to formulate the requests manually as there is an easier way of doing it. The Server-Sent Events specification defines both the protocol and scripting aspects in a single specification.

We should consider using a payload format such as https://github.com/cloudevents/spec/blob/v1.0.1/spec.md

I don't think this is necessary. The Server-Sent Events specification already defines a content format and the data schemas for event payloads can be directly serialised into JSON.

We need a mechanism that is scalable to cloud deployments, i.e. keeping open millions of connections simultaneously does not work

What mechanism do you suggest?

@benfrancis
Copy link
Member Author

benfrancis commented Jul 29, 2021

Action points from WoT Architecture meeting:

  1. Remove implied default subprotocol for Event affordances
  2. Add localbib reference for WHATWG EventSource specification
  3. Replace reference to canonical TDs with "Forms with defaults applied"

@egekorkan
Copy link
Contributor

egekorkan commented Jul 31, 2021

I also like the proposal. Some comments:

  • As pointed out by @mlagally the examples of JS code can be misleading the reader to think that this is how it should be done. In the TD spec, the examples show how a certain term should be used, here they are really just FYI kind of examples. It is still good to have these examples though, just making it more clear somehow.
  • I think there was a decision to not support observeproperty (and its similar friends). Is this decision still valid if this PR gets accepted?
  • Regarding scalability, there is the following comment at MDN:

When not used over HTTP/2, SSE suffers from a limitation to the maximum number of open connections, which can be especially painful when opening multiple tabs, as the limit is per browser and is set to a very low number (6). The issue has been marked as "Won't fix" in Chrome and Firefox. This limit is per browser + domain, which means that you can open 6 SSE connections across all of the tabs to www.example1.com and another 6 SSE connections to www.example2.com (per Stackoverflow). When using HTTP/2, the maximum number of simultaneous HTTP streams is negotiated between the server and the client (defaults to 100).

@benfrancis
Copy link
Member Author

I think I have now addressed all of the open issues on this PR:

  1. Removed implied default subprotocol for Event affordances
  2. Added a localBiblio reference for the WHATWG EventSource specification*
  3. Removed the dependency on Canonical TDs by describing applying defaults to forms instead
  4. Made it more explicit that the JavaScript examples are only for Consumers implemented in JavaScript

As far as I'm concerned this is now ready to land. Who can approve this?


*Note that the ReSpec documentation says that overriding references is strongly discouraged, but in the case I'm not sure of the correct fix. The W3C specification says that it has been superseded by the WHATWG HTML specification, but the SpecRef database currently lists the WHATWG Server-Sent Events specification as an "Editor's Draft".

@benfrancis
Copy link
Member Author

@egekorkan wrote:

  • As pointed out by @mlagally the examples of JS code can be misleading the reader to think that this is how it should be done. In the TD spec, the examples show how a certain term should be used, here they are really just FYI kind of examples. It is still good to have these examples though, just making it more clear somehow.

I've addressed this in the latest version.

  • I think there was a decision to not support observeproperty (and its similar friends). Is this decision still valid if this PR gets accepted?

I personally don't mind either way. Subscribing is the primary (default) use case of an event affordance, whereas observing is arguably more of a nice-to-have for a property. I wouldn't be opposed to defining a protocol binding for observeproperty/unobserveproperty using Server-Sent Events (to reduce implementation complexity) as an optional feature of the Core Profile. I have filed #95 to discuss this further.

  • Regarding scalability, there is the following comment at MDN

Thank you for linking to the reference. We discussed this point on the WoT Architecture call and I certainly do think this is an unfortunate limitation for Consumers implemented as client-side web applications expected to run in a web browser. However, we concluded that this is an implementation specific limitation which isn't inherent to the Server-Sent Events specification itself, and therefore should not rule out Server-Sent Events as a solution.

I explained in the introduction to this PR why I think Server-Sent Events are the best option currently available to us.

@benfrancis
Copy link
Member Author

benfrancis commented Aug 25, 2021

For information, this events protocol binding is now implemented in WebThings Gateway.

There are existing libraries available which implement Server-Sent Events, but it was actually fairly straightforward to implement a server manually using a standard HTTP library.

Things I have learned through implementation:

  1. Unlike WebSockets, all messages are encoded in UTF-8 which means that if you want to include a binary file in an event payload it must be base64 encoded
  2. Like WebSockets, the client-side JavaScript API used in web browsers (EventSource) doesn't have the option to add custom HTTP headers to a request, e.g. to add a bearer token in an authorization header. That means that for WebThings Gateway we had to put the JWT in a query string, which isn't ideal but we already have to do that for WebSockets too.
  3. Implementers need to take care to ensure that server middleware or proxies don't try to transform or optimise responses (which can prevent the connection staying open or cause clients to not recognise the response as an EventSource) and that clients don't try to cache them (which can prevent events from being received). This is probably quite implementation specific but in the case of WebThings an additional response header Cache-Control: no-cache, no-transform helped disable optimisations which would otherwise prevent SSE from working properly.

Overall my impression is that SSE is a good choice for this protocol binding:

  1. It's an existing standard with a simple pre-defined protocol (and a standard client-side JavaScript API if you want to use it) which are natively supported by web browsers
  2. It's built on HTTP, and JSON is a natural fit for event payloads
  3. Implementation is relatively straightforward
  4. It has a built-in mechanism for recovering dropped connections
  5. Existing client and server implementations have optimisations in place for maintaining a persistent connection
  6. The SSE protocol is a very good fit with the Web of Things data model

@mlagally
Copy link
Contributor

@benfrancis
Copy link
Member Author

@mlagally For future reference the rendered preview automatically updates with new commits, there's no need to include the commit number in the link https://pr-preview.s3.amazonaws.com/benfrancis/wot-profile/pull/92.html

Copy link
Contributor

@mlagally mlagally left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlagally For future reference the rendered preview automatically updates with new commits, there's no need to include the commit number in the link https://pr-preview.s3.amazonaws.com/benfrancis/wot-profile/pull/92.html

I'm missing the preview link in this PR.

The Core Profile uses Server-Sent Events [[EVENTSOURCE]] as a
mechanism for Consumers to subscribe to events emitted by a Web Thing.
</p>
<p class="note">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this note, which should reference section 4. Should it also exclude the processing model, which refers to section 4?
The processing model is a bit vague, e.g.

For HTTP connections, the Accept header may be included; if included, it must contain only formats of event framing that are supported by the user agent (one of which must be text/event-stream, as described below).
Which Accept header are we using?

I believe we we need to create a few normative assertions that clearly states which parts of SSE MUST be implemented and provides a few additional clarifications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to this note, which should reference section 4. Should it also exclude the processing model, which refers to section 4?

As I tried to explain on the Architecture call, this is why I wanted to refer to the Server-Sent Events specification as a whole rather than trying to slice it up, because the way the specification is written tightly couples the scripting API with the wire protocol. E.g. We can't exclude the section 5 (Processing Model) because it's a core part of the protocol definition, but it does make reference to section 4 (the EventSource interface). Sections 1, 7, 8, 9 and 10 also reference section 4.

That's a shame, because there's really nothing JavaScript-specific about the Server-Sent events protocol, demonstrated by the fact that there are existing SSE client implementations in Python, Java, Rust etc. which provide their own language-specific APIs.

The specification does allude to this by mentioning in the Introduction that the client API can emulated using XMLHttpRequest (i.e. a generic HTTP client library), but that in the case of web browsers an implementation of the EventSource API can provide additional optimisations.

The processing model is a bit vague, e.g.

For HTTP connections, the Accept header may be included; if included, it must contain only formats of event framing that are supported by the user agent (one of which must be text/event-stream, as described below).
Which Accept header are we using?

I don't think this is particularly vague. It's just saying that an Accept header in the request used to initiate a connection MUST include text/event-stream, but it may also include other formats if the client supports them. It is a bit loose in that it it only says an Accept header MAY be included, but what I read from this that if an Accept header isn't included then no content negotiation is required and the server should just return text/event-stream regardless.

I believe we we need to create a few normative assertions that clearly states which parts of SSE MUST be implemented and provides a few additional clarifications.

I'm not really sure what to suggest here unless we plan on forking the whole Server-Sent Events specification with a version which is less tightly coupled with the EventSource JavaScript API. Do you have a suggestion of alternative text which would make this more clear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A careful reading of the SSE spec unfortunately showed a couple of significant issues/gaps:

Chaper 5 of this specification defines a normative processing model on the user agent, which assumes a JavaScript based browser implementation and mandates a specific implementation methodology e.g. describing task queues etc.
which is appropriate for a browser.
However these requirements put unnecessary requirements on non-browser consumers and should be excluded. We need however to incorporate the protocol handling part from that chapter, but these are only a few paragraphs.

Chapter 6 defines the lexical format "text/event-stream". It is strange that the name field names may include non-printable characters as well as space, i.e. a name field may actually be a name that contains spaces. These cases are however excluded in
Chapter 7

Chapter 7 defines 4 field names of the event stream and their interpretation
This is a plain text serialisation of event types and event data where we need to define a mapping of the data schema, which is defined by the event affordance of the TD spec (section 5.3.1.5) to this serialised text stream.
Event types are defined in SSE, however we need to defin a mapping as well.
It would be appropriate to consider
The retry field leaves some room for interpretation (which unit?), also the permitted values of the id field needs clarification.

Chapter 9 assumes a push proxy - we need to discuss details further.

Chapter 10 is probably not required by the WoT profile.

With all this taken into account, we should either cherry-pick and provide significant clarifications on the SSE model
and define a mapping of the event dataschema to the event stream (chapter 7) or we describe the full event model by ourselves.

We will just need to define protocol behavior (i.e. HTTP response codes and corresponding semantics) and have to decide if we want to adopt a textual event stream format or go with a serialized stream of JSON objects.
Since we have structured events in the TD, i.e. we have a JSON schema for event data, the obvious alternative is to select an appropriate media type, e.g. JSON text sequences: https://datatracker.ietf.org/doc/html/rfc7464

Copy link
Member Author

@benfrancis benfrancis Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlagally wrote:

Chaper 5 of this specification defines a normative processing model on the user agent, which assumes a JavaScript based browser implementation and mandates a specific implementation methodology e.g. describing task queues etc.
which is appropriate for a browser.
However these requirements put unnecessary requirements on non-browser consumers and should be excluded. We need however to incorporate the protocol handling part from that chapter, but these are only a few paragraphs.

Yes, that is what I meant when I said that the wire protocol is tightly coupled with the scripting API. In my experience of implementing the specification that wasn't really a problem, but it does make it tricky to cite as a reference.

Chapter 7 defines 4 field names of the event stream and their interpretation
This is a plain text serialisation of event types and event data where we need to define a mapping of the data schema, which is defined by the event affordance of the TD spec (section 5.3.1.5) to this serialised text stream.
Event types are defined in SSE, however we need to defin a mapping as well.
It would be appropriate to consider

The specification text I have written already does this:

Whenever an event of the specified type occurs while the Web Thing has an open connection with a Consumer, the Web Thing MUST send event data to the Consumer using the event stream format in the Server-Sent Events [EVENTSOURCE] specification. For each message sent, the Web Thing MUST set the event field to the name of the EventAffordance and populate the data field with event data, if any. The event data MUST follow the data schema specified in the EventAffordance and MUST be serialized in JSON. The id field SHOULD be set to a unique identifier for the event, for use when re-establishing a dropped connection (see below).

The retry field leaves some room for interpretation (which unit?)

The unit is defined as milliseconds https://www.w3.org/TR/eventsource/#concept-event-stream-reconnection-time

also the permitted values of the id field needs clarification.

The id can be any string, encoded as UTF-8. https://www.w3.org/TR/eventsource/#concept-event-stream-last-event-id In the implementations I've seen it's quite common to use a timestamp (number of milliseconds past the epoch) as the id, which is also what I used when I implemented it.

Chapter 9 assumes a push proxy - we need to discuss details further.

I think this is just implementation advice, and shouldn't pose any problems. We should assume that all parts of the HTTP protocol binding could be proxied. In the case of events it is important for clients to send a Cache-Control: no-cache header to prevent proxies from caching responses, as defined in the specification.

Chapter 10 is probably not required by the WoT profile.

Again, this is really just implementation advice for web browsers (Window and Document objects probably won't exist in other clients).

We will just need to define protocol behavior (i.e. HTTP response codes and corresponding semantics)

I think in general we should leave this down to the Server-Sent Events specification to define, but is there a specific response code which you think needs defining because it's specific to our use case?

and have to decide if we want to adopt a textual event stream format or go with a serialized stream of JSON objects.
Since we have structured events in the TD, i.e. we have a JSON schema for event data, the obvious alternative is to select an appropriate media type, e.g. JSON text sequences: https://datatracker.ietf.org/doc/html/rfc7464

This is unnecessary. Each event payload consists of a single JSON text, which in the proposed specification text I quoted above is directly mapped onto the data field of the event stream format. It's very common to use JSON text as a data payload format for Server-Sent Events. If you tried to send a sequence of events as JSON text sequences rather than following the text/event-stream format, then existing clients (like web browsers) wouldn't understand them.

Now that we have established that trying to cherry-pick parts of the specification doesn't work very well because they all reference each other, I think we could just remove the note and reference the specification as a whole. I agree it's not perfect because the specification is clearly written with browser vendors in mind, but there are plenty of examples of non-JavaScript implementations where developers were able to successfully implement the wire protocol without the scripting API.

index.html Outdated Show resolved Hide resolved
@benfrancis
Copy link
Member Author

I'm missing the preview link in this PR.

Oh that's strange, it's working for me.

@mlagally mlagally merged commit f135e57 into w3c:main Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants