Change Notification versus Content Distribution #84

azaroth42 · 2017-01-13T18:16:25Z

Section 6 says:

> This request MUST have a Content-Type Header corresponding to the Content-Type of the topic, and SHOULD contain the full contents of the topic URL.

This makes it impossible to have notifications about the change to the topic resource distributed. For example, if I have a gigapixel Image, and I want to say that I modified it, I MUST send an image to the subscriber from the hub, and SHOULD send the entire multi-gigabyte TIFF. I'm pretty sure the subscriber does not want my TIFF pushed down their throat, just to know that I changed it.

This also breaks the direction being discussed in #68. If I have a collection of images, and the topic URL is an HTML page, then I MUST send an HTML notification.

I propose to drop this sentence.

sandhawke · 2017-01-13T19:29:13Z

@azaroth42 It's clear there are use cases for sending just deltas instead of the full content with each change, but it's not exactly clear the best way to do that. In particular, it seems important to keep the two cases distinct, and not get systems confused about which they're getting or supposed to be sending. If I understand correctly, that's been a bit of a problem in the past. (With an RSS-type feed, in the normal case, things work the same whether you treat the notification as consisting of the full content or just the new items. But then if an item is removed, does that mean it was deleted or is just no longer new? I understand implementations have been inconsistent on this.)

I think the proper-webarch solution would be to treat the callback URL as identifying a resource which is intended to mirror the state of the topic resource. As such, it would make sense for the hub to do notifications either with a PUT of the full content or a PATCH for efficient updates.

This is not, however, what's currently implemented, and the WG doesn't have time to experiment with this.

The nice thing is, this works well as an extension. Everyone can do POST with full content, and folks who know how to send and receive patches can negotiate to do that.

A very simple approach would be:

1. Hub sends the first notification as a POST with full content, as per current spec.
2. If the receiver wants patches, it includes an Accept-Patch response header in its reply, listing the patch media-types it understands.
3. If the hub can send patches using one of those media types, it does that for future notifications (using the HTTP PATCH verb to the same callback URL).

A slightly more sophisticated approach would allow skipping even that first POST:

1. Before doing a POST with full content for a very large resource, the hub does a HEAD on the callback URL.
2. If it gets an Accept-Patch header and an ETag header with an ETag for a version it can use for generating its patch, it proceeds as above in step 2.
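
As a rough illustration of the two approaches above, here is a minimal Python sketch of the decision a hub might make from the subscriber's response headers. The function names, the `known_etags` parameter, and the patch media types in the usage note are hypothetical; none of this is defined by the current spec.

```python
from typing import Iterable, List, Mapping, Optional, Set, Tuple


def parse_accept_patch(value: Optional[str]) -> List[str]:
    """Return the media types listed in an Accept-Patch header, without parameters."""
    if not value:
        return []
    media_types = []
    for part in value.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type:
            media_types.append(media_type)
    return media_types


def plan_delivery(
    subscriber_headers: Mapping[str, str],
    hub_patch_types: Iterable[str],
    known_etags: Optional[Set[str]] = None,
) -> Tuple[str, Optional[str]]:
    """Pick ("PATCH", media_type) when negotiation succeeds, else ("POST", None).

    known_etags is only used for the HEAD variant: the versions the hub can
    still generate a patch against (hypothetical bookkeeping on the hub side).
    """
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in subscriber_headers.items()}
    supported = {media_type.lower() for media_type in hub_patch_types}

    # HEAD variant: without a usable ETag the hub cannot compute a patch.
    if known_etags is not None and headers.get("etag") not in known_etags:
        return ("POST", None)

    # Take the first patch format the subscriber lists that the hub can produce.
    for media_type in parse_accept_patch(headers.get("accept-patch")):
        if media_type in supported:
            return ("PATCH", media_type)
    return ("POST", None)
```

For example, `plan_delivery({"Accept-Patch": "application/json-patch+json"}, ["application/json-patch+json"])` selects PATCH, while a reply with no Accept-Patch header falls back to the full-content POST that every subscriber already understands.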

How's that sound?

Alternatively, one could put the Accept-Patch and ETag information in the subscription, of course. That seems to be cheating a little on webarch, and might conceivably cause problems with some infrastructure.

azaroth42 · 2017-01-13T19:41:04Z

The future patches are then outside of the specification? Because it seems unlikely that the patch format will be the same content type as the topic URL, which is mandatory according to section 6.

It also doesn't address notification by reference -- If my image changes, and I send an AS2 Updated JSON-LD notification, where the topic is the Image, then I'm not compliant either. For example, the publisher of an image could send:

{
  "type": "as:Updated",
  "object": "http://example.org/path/to/image.jpg"
}

And the subscriber can dereference the image if it cares to. This is prevented because the topic's Content-Type is image/jpeg.
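
To make the mismatch concrete, here is a small Python sketch of the check implied by the Section 6 sentence. It assumes "corresponding" means equality of the base media type (parameters ignored) and that the notification would be served as `application/activity+json`; both are assumptions for illustration, and the function names are hypothetical.

```python
import json


def base_media_type(content_type: str) -> str:
    """Strip parameters such as charset or profile and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def satisfies_section_6(topic_content_type: str, notification_content_type: str) -> bool:
    """True if the notification's Content-Type corresponds to the topic's.

    Assumption: "corresponds" is read as equality of the base media type.
    """
    return base_media_type(topic_content_type) == base_media_type(notification_content_type)


# The by-reference notification from the example above.
notification_body = json.dumps(
    {"type": "as:Updated", "object": "http://example.org/path/to/image.jpg"}
)

# Topic is a JPEG, notification is JSON: the requirement is not met.
print(satisfies_section_6("image/jpeg", "application/activity+json"))
# Only a notification with the topic's own media type passes.
print(satisfies_section_6("image/jpeg", "image/jpeg"))
```

Under these assumptions the first check prints `False` and the second prints `True`, which is the point: any notification that describes the change rather than carrying the image fails the MUST.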

The requirement also ignores content negotiation, another fundamental of the web architecture. If I publish a negotiable resource, which of the media types am I required to send the notifications as?

I maintain that Section 6 is over-specified and prevents use cases other than the most vanilla. And maybe that's sufficient for the WG and v1.0, of course, but unfortunately not for any of our real world use cases, thus preventing us from adopting.

sandhawke · 2017-01-13T21:09:44Z

I think there are four topics here:

1. Fat pings vs thin pings. As I understand it, one of the major advantages of pubsubhubbub has been its use of 'fat pings' (full content notifications vs mere change notifications). The basic argument is that thin pings lead to Thundering Herd.
2. Certainly patch formats are outside this spec. They're logically orthogonal. There should be a market for ways to express patches to plain text, and a separate market for ways to express patches to jpegs, and a separate market for ways to express patches to pngs, etc, etc. And all of those formats work for the various uses of the PATCH verb. Websub would just be piggy-backing on that work (although it might turn out to be the main driver).
3. Re Content-Negotiation: I think the spec should say something here. Specifically, I'd suggest it tell people they SHOULD NOT do con-neg with topics, but rather if they're doing con-neg, have the topics be the Content-Location URLs. At least, that's my quick impression, not knowing what folks have done in practice. @julien51 do people ever serve HTML and Atom, or something, at the same URL and allow subscription to it? @azaroth42 want to raise this as a separate issue?
4. Re over-specification. It's possible, but isn't it better to have interoperability, rather than have a spec where implementations can't actually work together out of the box? We'd like websub implementations to just work, zero expertise required. Can you tell me an actual use case you care about that can't be reasonably done with websub as specified in the current draft?

Obviously it would be straightforward to add a fat/thin flag to subscriptions and if-thin, the POST would just always be empty. So, I guess this is a question for people who've worked with pubsubhubbub over the years --- why would that be bad? Is it really just concern about Thundering Herd?

azaroth42 · 2017-01-13T21:27:20Z

I understand the Thundering Herd issue, and it's not a significant concern for my use cases as the (projected) number of subscribers wouldn't be sufficient to take down the system, and nowhere near enough to cause total deadlock. And I agree about patch formats being out of scope.

If you subscribe to the specific resource as the topic, then you would require negotiable resources to send multiple notifications per change. For example, if I update an RDF resource, and it has Turtle, JSON-LD, RDFA and RDF/XML serializations (not unreasonable), I then need to send four notifications rather than one. I can raise it as a separate issue.
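
A quick Python sketch of that fan-out, assuming each serialization gets its own topic URL. The URLs and the `notifications_for_change` helper are hypothetical, and RDFa is assumed to be served as `text/html`.

```python
from typing import Dict, List, Tuple

# Hypothetical representation-specific topic URLs for one negotiable RDF resource.
REPRESENTATIONS: Dict[str, str] = {
    "http://example.org/data/resource.ttl": "text/turtle",
    "http://example.org/data/resource.jsonld": "application/ld+json",
    "http://example.org/data/resource.html": "text/html",  # RDFa embedded in HTML
    "http://example.org/data/resource.rdf": "application/rdf+xml",
}


def notifications_for_change(representations: Dict[str, str]) -> List[Tuple[str, str]]:
    """One (topic URL, Content-Type) notification per representation-specific topic."""
    return [(url, content_type) for url, content_type in representations.items()]


# A single logical update to the resource fans out to one publish per serialization.
for topic_url, content_type in notifications_for_change(REPRESENTATIONS):
    print(topic_url, content_type)
```

The loop emits four notifications for one change, each with a different Content-Type, rather than one notification about the resource itself.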

And for 4, the scope of the specification seems like it should be subscription per @aaronpk's point in #68. The content distribution requirements are beyond that. Given the MUST requirement to distribute the same content type as the topic, I can't think of a situation when I would ever use websub. It seems to rule out topic resources that are significant in size (say even > 1Mb), it makes subscription to sets of infrequently changing resources impossible (see #68), and it would require many MANY profiles of a thin notification serialization to work in a compliant fashion (e.g. one for JSON, one for XML, one for HTML, one for CSV, one for ...) ... and would be impossible for media types with parameters (application/ld+json;profile=web-annotation) [which I'll raise as yet another issue].

sandhawke · 2017-01-13T23:29:27Z

Yeah, it's not hard to design other pubsub protocols with other characteristics. (I've done it many times.) This one picks fat pings and one-subscription-one-resource.

azaroth42 · 2017-01-13T23:57:35Z

Which is great, of course, but the specification should be clear that's the case.

sandhawke · 2017-01-14T00:04:57Z

Sounds reasonable to me. In the abstract? Intro?

azaroth42 · 2017-01-16T19:07:04Z

Let me clarify my understanding after thinking about this over the weekend...

If I have an ATOM feed that typically has 20 entries in it, and I use the feed URL as the Topic URL, then when I add a new entry to the feed, I have to distribute the entire representation of the feed with all 20 entries, not just the newly added one?

Beyond that, it would still be legitimate to have a resource that is defined as representing the most recently added entry (/latest) and allowing subscription to that as a Topic resource ... then I can just change it at will, have it point back to the real change, and we're back to thin pings via a layer of indirection.

aaronpk · 2017-01-16T19:16:55Z

> If I have an ATOM feed that typically has 20 entries in it, and I use the feed URL as the Topic URL, then when I add a new entry to the feed, I have to distribute the entire representation of the feed with all 20 entries, not just the newly added one?

Yes, this is typically how PubSubHubbub implementations have worked. The PubSubHubbub (now WebSub) benefit is that it prevents subscribers from needing to poll the topic URL, getting the contents of the topic URL delivered to subscribers only when it has changed. This is not a generic pubsub mechanism, and like Sandro said, there are plenty of other ways to design pubsub protocols with other characteristics.

Some PubSubHubbub hub implementations went further and implemented a diffing mechanism on the Atom/RSS feed, delivering only the new items to the subscribers. However, since subscribers were still expecting a full Atom/RSS feed, the items are still wrapped in the appropriate Atom/RSS feed container rather than being delivered as a single <item> or <entry>.
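
A minimal Python sketch of that kind of diffing for Atom, assuming entries are matched by their `<id>` element. This is only an illustration, not how any particular hub does it, and the `new_entries_feed` name is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ENTRY = f"{{{ATOM_NS}}}entry"
ENTRY_ID = f"{{{ATOM_NS}}}id"

# Serialise Atom elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", ATOM_NS)


def new_entries_feed(previous_xml: str, current_xml: str) -> str:
    """Return the current feed with entries already present in the previous one removed.

    The <feed> container and its metadata are kept, so subscribers still
    receive a complete Atom document containing only the new entries.
    """
    previous = ET.fromstring(previous_xml)
    current = ET.fromstring(current_xml)

    # Assumption: an entry is "already sent" if its <id> appeared before.
    seen_ids = {entry.findtext(ENTRY_ID) for entry in previous.findall(ENTRY)}

    # Copy the list first because we mutate the tree while iterating.
    for entry in list(current.findall(ENTRY)):
        if entry.findtext(ENTRY_ID) in seen_ids:
            current.remove(entry)

    return ET.tostring(current, encoding="unicode")
```

A hub would keep the last distributed version of the feed, call this with the newly fetched version, and send the result with the same Content-Type as the topic.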

julien51 · 2017-02-04T03:21:32Z

I think most of the topics in this issue have been "dispatched" in their own issues. So for the sake of clarity I think we should close this one.

I want to add one last item. As you've noted @azaroth42 the thundering herd problem is mostly a theoretical one... but there is also a very practical problem (and probably a more real one) that fat pings address: piercing through caches. Basically, with light pings and the omnipresence of caches, it is not rare that the hub notifies subscribers of a change which the subscribers would not necessarily find out about if they hit a cache that the hub did not hit. In this case it is not clear what the subscriber should do.
Fat pings solve this problem very elegantly.

azaroth42 · 2017-02-04T23:02:23Z

Indeed, as the other issues make this irrelevant to our formerly (PuSH 0.4) compliant use cases, per Sandro we'll have to do our own thing. Interesting, however, that the exceptions for your use cases were added (no need to send already sent ATOM/RSS entries) but not others.

A shame that the WG is not willing to consider other use cases from existing, engaging adopters beyond the indiewebcamp inner circle. Clearly SWWG is just a rubber stamping exercise and I believe reflects poorly on the W3C.

As the group has clearly stated its unwillingness to engage in discussion, I see no reason to keep the issue open.

aaronpk · 2017-02-04T23:19:37Z

@azaroth42 are you saying you had an implementation that was PuSH 0.4 compliant that now no longer is compliant with WebSub? The intent of the changes made so far in WebSub was to clarify things, not to make previous implementations no longer compliant.

azaroth42 · 2017-02-06T23:53:05Z

Beyond any particular implementation, an entire class of implementations are invalidated -- those that conform to the ANSI/NISO Z39.99 notifications spec: http://www.openarchives.org/rs/notification/1.0/notification

sandhawke · 2017-02-07T05:07:59Z

Interesting. Do you know of implementations and users?

julien51 · 2017-04-04T15:10:38Z

Telecon:

[11:09] PROPOSED: close issue #84 since all relevant points have been addressed in separate issues

Adopted:

[11:10] RESOLVED: close issue #84 since all relevant points have been addressed in separate issues

azaroth42 mentioned this issue Jan 13, 2017

Content-Type and Schema distinction #87

Closed

sandhawke mentioned this issue Jan 13, 2017

Content-Type requirement and Content Negotiation #86

Closed

julien51 added the Waiting for Commenter label Feb 4, 2017

julien51 assigned azaroth42 Mar 31, 2017

julien51 closed this as completed Apr 4, 2017

rhiaro added Commenter Satisfied and removed Waiting for Commenter labels Apr 4, 2017

ltankey mentioned this issue Sep 17, 2019

Content Distribution topic clarification #156

Open
