Change Notification versus Content Distribution #84

Closed
azaroth42 opened this Issue Jan 13, 2017 · 15 comments

azaroth42 commented Jan 13, 2017

Section 6 says:

This request MUST have a Content-Type Header corresponding to the Content-Type of the topic, and SHOULD contain the full contents of the topic URL.

This makes it impossible to have notifications about the change to the topic resource distributed. For example, if I have a gigapixel Image, and I want to say that I modified it, I MUST send an image to the subscriber from the hub, and SHOULD send the entire multi-gigabyte TIFF. I'm pretty sure the subscriber does not want my TIFF pushed down their throat, just to know that I changed it.

This also breaks the direction being discussed in #68. If I have a collection of images, and the topic URL is an HTML page, then I MUST send an HTML notification.

I propose to drop this sentence.

sandhawke Jan 13, 2017

Contributor

@azaroth42 It's clear there are use cases for sending just deltas instead of the full content with each change, but it's not exactly clear the best way to do that. In particular, it seems important to keep the two cases distinct, and not get systems confused about which they're getting or supposed to be sending. If I understand correctly, that's been a bit of a problem in the past. (With an RSS-type feed, in the normal case, things work the same whether you treat the notification as consisting of the full content or just the new items. But then if an item is removed, does that mean it was deleted or is just no longer new? I understand implementations have been inconsistent on this.)

I think the proper-webarch solution would be to treat the callback URL as identifying a resource which is intended to mirror the state of the topic resource. As such, it would make sense for the hub to do notifications either with a PUT of the full content or a PATCH for efficient updates.

This is not, however, what's currently implemented, and the WG doesn't have time to experiment with this.

The nice thing is, this works well as an extension. Everyone can do POST with full content, and folks who know how to send and receive patches can negotiate to do that.

A very simple approach would be:

  1. Hub sends the first notification as a POST with full content, as per current spec.
  2. If the receiver wants patches, it includes an Accept-Patch response header in its reply, listing the patch media-types it understands.
  3. If the hub can send patches using one of those media types, it does that for future notifications (using the HTTP PATCH verb to the same callback URL)
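
The three steps above can be sketched from the hub's side roughly as follows. This is only an illustration under stated assumptions: `HUB_PATCH_TYPES`, `parse_accept_patch`, and `choose_patch_type` are hypothetical names invented here, and the listed patch media types are examples, not something WebSub or any existing hub prescribes.

```python
from typing import Dict, List, Optional

# Patch media types this hypothetical hub knows how to generate (an assumption).
HUB_PATCH_TYPES = ["application/json-patch+json", "application/merge-patch+json"]


def parse_accept_patch(header_value: str) -> List[str]:
    """Split an Accept-Patch header into lowercase media types, dropping parameters."""
    types = []
    for item in header_value.split(","):
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def choose_patch_type(response_headers: Dict[str, str]) -> Optional[str]:
    """Return a patch media type both sides support, or None to keep sending full POSTs."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalised = {name.lower(): value for name, value in response_headers.items()}
    offered = parse_accept_patch(normalised.get("accept-patch", ""))
    for media_type in HUB_PATCH_TYPES:
        if media_type in offered:
            return media_type
    return None
```

A hub would call `choose_patch_type` on the subscriber's response to the first full-content POST; a `None` result means it simply continues with POST as in the current spec.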

A slightly more sophisticated approach would allow skipping even that first POST:

  1. Before doing a POST with full content for a very large resource, the hub does a HEAD on the callback URL
  2. If it gets an Accept-Patch header and an ETag header with an ETag for a version it can use for generating its patch, it proceeds as above in step 2.
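
The HEAD-first variant boils down to one decision on the hub side, sketched below. The function name `can_skip_full_post` and the idea that the hub keeps a set of ETags it can still diff against are assumptions made for this illustration only.

```python
from typing import Dict, Set


def can_skip_full_post(head_headers: Dict[str, str], known_etags: Set[str]) -> bool:
    """Decide whether the hub may start with a PATCH instead of a full-content POST.

    head_headers: response headers from a HEAD request to the callback URL.
    known_etags: ETags of topic versions the hub can still generate a patch from
                 (hypothetical bookkeeping, assumed to be kept by the hub).
    """
    headers = {name.lower(): value.strip() for name, value in head_headers.items()}
    if not headers.get("accept-patch"):
        return False  # the subscriber did not advertise any patch format
    etag = headers.get("etag")
    # Only patch when the subscriber's copy is a version the hub recognises.
    return etag is not None and etag in known_etags
```

When this returns `False`, the hub falls back to the ordinary full-content POST.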

How's that sound?

Alternatively, one could put the Accept-Patch and ETag information in the subscription, of course. That seems to be cheating a little on webarch, and might conceivably cause problems with some infrastructure.

azaroth42 Jan 13, 2017

The future patches are then outside of the specification? Because it seems unlikely that the patch format will be the same content type as the topic URL, which is mandatory according to section 6.

It also doesn't address notification by reference -- If my image changes, and I send an AS2 Updated JSON-LD notification, where the topic is the Image, then I'm not compliant either. For example, the publisher of an image could send:

{
  "type": "as:Updated",
  "object": "http://example.org/path/to/image.jpg"
}

And the subscriber can dereference the image if it cares to. This is prevented as the Topic's content is image/jpeg.
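
To make the notification-by-reference idea concrete, here is a minimal subscriber-side sketch. It assumes the payload shape shown above; `build_fetch_request` and the `last_etag` parameter are hypothetical names, and the request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Optional


def build_fetch_request(notification_body: str,
                        last_etag: Optional[str] = None) -> Optional[urllib.request.Request]:
    """Turn an update-by-reference notification into a GET request for the changed resource.

    Returns None when the payload is not an update or names no object URL.
    """
    payload = json.loads(notification_body)
    if payload.get("type") != "as:Updated":
        return None
    target = payload.get("object")
    if not isinstance(target, str):
        return None
    request = urllib.request.Request(target, method="GET")
    if last_etag:
        # Conditional GET: lets the server answer 304 if our copy is still current.
        request.add_header("If-None-Match", last_etag)
    return request
```

A subscriber that cares about the change would pass the returned request to `urllib.request.urlopen`; one that does not can simply drop the notification.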

The requirement also ignores content negotiation, another fundamental of the web architecture. If I publish a negotiable resource, which of the media types am I required to send the notifications as?

I maintain that Section 6 is over-specified and prevents use cases other than the most vanilla. And maybe that's sufficient for the WG and v1.0, of course, but unfortunately not for any of our real world use cases, thus preventing us from adopting.

sandhawke Jan 13, 2017

Contributor

I think there are four topics here:

  1. Fat pings vs thin pings. As I understand it, one of the major advantages of pubsubhubbub has been its use of 'fat pings' (full content notifications vs mere change notifications). Basic argument is that thin pings lead to Thundering Herd.

  2. Certainly patch formats are outside this spec. They're logically orthogonal. There should be a market for ways to express patches to plain text, and a separate market for ways to express patches to jpegs, and a separate market for ways to express patches to pngs, etc, etc. And all of those formats work for the various uses of the PATCH verb. Websub would just be piggy-backing on that work (although it might turn out to be the main driver).

  3. Re Content-Negotiation: I think the spec should say something here. Specifically, I'd suggest it tell people they SHOULD NOT do con-neg with topics, but rather if they're doing con-neg, have the topics be the Content-Location URLs. At least, that's my quick impression, not knowing what folks have done in practice. @julien51 do people ever serve HTML and Atom, or something, at the same URL and allow subscription to it? @azaroth42 want to raise this as a separate issue?

  4. Re over-specification. It's possible, but isn't it better to have interoperability, rather than have a spec where implementations can't actually work together out of the box? We'd like websub implementations to just work, zero expertise required. Can you tell me an actual use case you care about that can't be reasonably done with websub as specified in the current draft?

Obviously it would be straightforward to add a fat/thin flag to subscriptions and if-thin, the POST would just always be empty. So, I guess this is a question for people who've worked with pubsubhubbub over the years --- why would that be bad? Is it really just concern about Thundering Herd?
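
A fat/thin flag of the kind floated here could look roughly like the sketch below. The `thin` field on `Subscription` is hypothetical and not part of the WebSub draft; keeping the topic's Content-Type header on an empty body is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Subscription:
    callback_url: str
    thin: bool = False  # hypothetical flag; not part of the WebSub draft


def build_notification(sub: Subscription, topic_content_type: str,
                       topic_body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for the POST the hub sends to sub.callback_url."""
    headers = {"Content-Type": topic_content_type}
    if sub.thin:
        # Thin ping: signal the change only; the subscriber re-fetches the topic itself.
        return headers, b""
    # Fat ping: deliver the full contents of the topic URL, as in the current draft.
    return headers, topic_body
```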

azaroth42 Jan 13, 2017

I understand the Thundering Herd issue, and it's not a significant concern for my use cases as the (projected) number of subscribers wouldn't be sufficient to take down the system, and nowhere near enough to cause total deadlock. And I agree about patch formats being out of scope.

If you subscribe to the specific resource as the topic, then you would require negotiable resources to send multiple notifications per change. For example, if I update an RDF resource, and it has Turtle, JSON-LD, RDFa and RDF/XML serializations (not unreasonable), I then need to send four notifications rather than one. I can raise it as a separate issue.
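
The fan-out described here can be illustrated with a small sketch. The per-representation topic URLs in `REPRESENTATIONS` and the function `notifications_for_change` are invented for this example; the point is only that one logical change yields one notification per serialization.

```python
from typing import Dict, List, Tuple

# Hypothetical per-representation topic URLs for one negotiable RDF resource.
REPRESENTATIONS: Dict[str, str] = {
    "text/turtle": "http://example.org/data/thing.ttl",
    "application/ld+json": "http://example.org/data/thing.jsonld",
    "text/html": "http://example.org/data/thing.html",  # RDFa embedded in HTML
    "application/rdf+xml": "http://example.org/data/thing.rdf",
}


def notifications_for_change(serialized: Dict[str, bytes]) -> List[Tuple[str, str, bytes]]:
    """Return one (topic_url, content_type, body) tuple per representation with a topic URL."""
    return [
        (REPRESENTATIONS[content_type], content_type, body)
        for content_type, body in serialized.items()
        if content_type in REPRESENTATIONS
    ]
```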

And for 4, the scope of the specification seems like it should be subscription per @aaronpk's point in #68. The content distribution requirements are beyond that. Given the MUST requirement to distribute the same content type as the topic, I can't think of a situation when I would ever use websub. It seems to rule out topic resources that are significant in size (say even > 1Mb), it makes subscription to sets of infrequently changing resources impossible (see #68), and it would require many MANY profiles of a thin notification serialization to work in a compliant fashion (e.g. one for JSON, one for XML, one for HTML, one for CSV, one for ...) ... and would be impossible for media types with parameters (application/ld+json;profile=web-annotation) [which I'll raise as yet another issue].

sandhawke Jan 13, 2017

Contributor

Yeah, it's not hard to design other pubsub protocols with other characteristics. (I've done it many times.) This one picks fat pings and one-subscription-one-resource.

azaroth42 Jan 13, 2017

Which is great, of course, but the specification should be clear that's the case.

sandhawke Jan 14, 2017

Contributor

Sounds reasonable to me. In the abstract? Intro?

azaroth42 Jan 16, 2017

Let me clarify my understanding after thinking about this over the weekend...

If I have an ATOM feed that typically has 20 entries in it, and I use the feed URL as the Topic URL, then when I add a new entry to the feed, I have to distribute the entire representation of the feed with all 20 entries, not just the newly added one?

Beyond that, it would still be legitimate to have a resource that is defined as representing the most recently added entry (/latest) and allowing subscription to that as a Topic resource ... then I can just change it at will, have it point back to the real change, and we're back to thin pings via a layer of indirection.

aaronpk Jan 16, 2017

Member

> If I have an ATOM feed that typically has 20 entries in it, and I use the feed URL as the Topic URL, then when I add a new entry to the feed, I have to distribute the entire representation of the feed with all 20 entries, not just the newly added one?

Yes, this is typically how PubSubHubbub implementations have worked. The PubSubHubbub (now WebSub) benefit is that it prevents subscribers from needing to poll the topic URL, getting the contents of the topic URL delivered to subscribers only when it has changed. This is not a generic pubsub mechanism, and like Sandro said, there are plenty of other ways to design pubsub protocols with other characteristics.

Some PubSubHubbub hub implementations went further and implemented a diffing mechanism on the Atom/RSS feed, delivering only the new items to the subscribers. However since subscribers were still expecting a full Atom/RSS feed, the items are still wrapped in the appropriate Atom/RSS feed container rather than delivering just a single <item> or <entry>.
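
The diffing behaviour described above might be sketched as follows. This is not taken from any particular hub: comparing entries by their `atom:id` is an assumption of this example, and `entry_ids` and `feed_with_new_entries` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Set

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def entry_ids(feed_xml: str) -> Set[str]:
    """Collect the atom:id of every entry in a feed document."""
    root = ET.fromstring(feed_xml)
    ids = set()
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        if entry_id is not None:
            ids.add(entry_id)
    return ids


def feed_with_new_entries(old_feed_xml: str, new_feed_xml: str) -> str:
    """Return the new feed with already-delivered entries removed.

    Feed-level elements (title, id, updated, ...) are kept, so the result is
    still a complete Atom feed document rather than a bare <entry>.
    """
    seen = entry_ids(old_feed_xml)
    root = ET.fromstring(new_feed_xml)
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        if entry_id is not None and entry_id in seen:
            root.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

Subscribers that expect a full feed can parse the result unchanged, which is the reason the entries stay wrapped in the feed container.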

julien51 Feb 4, 2017

Collaborator

I think most of the topics in this issue have been "dispatched" in their own issues. So for the sake of clarity I think we should close this one.

I want to add one last item. As you've noted @azaroth42, the thundering herd problem is mostly a theoretical one... but fat pings also address a very practical one (and probably a more real one): piercing through caches. Basically, with light pings and the omnipresence of caches, it is not rare that the hub notifies subscribers of a change which subscribers would not necessarily find out about if they hit a cache that the hub did not hit. In this case it is not clear what the subscriber should do.
Fat pings solve this problem very elegantly.

azaroth42 Feb 4, 2017

Indeed, as the other issues make this irrelevant to our formerly (PuSH 0.4) compliant use cases, per Sandro we'll have to do our own thing. Interesting, however, that the exceptions for your use cases were added (no need to send already sent ATOM/RSS entries) but not others.

A shame that the WG is not willing to consider other use cases from existing, engaging adopters beyond the indiewebcamp inner circle. Clearly SWWG is just a rubber stamping exercise and I believe reflects poorly on the W3C.

As the group has clearly stated its unwillingness to engage in discussion, I see no reason to keep the issue open.

aaronpk Feb 4, 2017

Member

@azaroth42 are you saying you had an implementation that was PuSH 0.4 compliant that now no longer is compliant with WebSub? The intent of the changes made so far in WebSub was to clarify things, not to make previous implementations no longer compliant.

azaroth42 Feb 6, 2017

Beyond any particular implementation, an entire class of implementations is invalidated -- those that conform to the ANSI/NISO Z39.99 notifications spec: http://www.openarchives.org/rs/notification/1.0/notification

sandhawke Feb 7, 2017

Contributor

Interesting. Do you know of implementations and users?

julien51 Apr 4, 2017

Collaborator

Telecon:

[11:09] PROPOSED: close issue #84 since all relevant points have been addressed in separate issues

Adopted:

[11:10] RESOLVED: close issue #84 since all relevant points have been addressed in separate issues
