Change Notification versus Content Distribution #84
Comments
@azaroth42 It's clear there are use cases for sending just deltas instead of the full content with each change, but it's not exactly clear the best way to do that. In particular, it seems important to keep the two cases distinct, and not get systems confused about which they're getting or supposed to be sending. If I understand correctly, that's been a bit of a problem in the past. (With an RSS-type feed, in the normal case, things work the same whether you treat the notification as consisting of the full content or just the new items. But then if an item is removed, does that mean it was deleted or is just no longer new? I understand implementations have been inconsistent on this.) I think the proper-webarch solution would be to treat the callback URL as identifying a resource which is intended to mirror the state of the topic resource. As such, it would make sense for the hub to do notifications either with a PUT of the full content or a PATCH for efficient updates. This is not, however, what's currently implemented, and the WG doesn't have time to experiment with this. The nice thing is, this works well as an extension. Everyone can do POST with full content, and folks who know how to send and receive patches can negotiate to do that. A very simple approach would be:
A slightly more sophisticated approach would allow skipping even that first POST:
How's that sound? Alternatively, one could put the Accept-Patch and ETag information in the subscription, of course. That seems to be cheating a little on webarch, and might conceivably cause problems with some infrastructure. |
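The concrete steps of the two approaches are not reproduced here. As a rough illustration of the general negotiation idea only (POST with full content as the default, PATCH for subscribers that advertise support), a hub-side decision might look like the sketch below. The function name, the remembered `callback_headers`, and the use of `If-Match` are assumptions for illustration, not part of WebSub or of the proposal above.

```python
def choose_delivery(callback_headers, known_etag, patch_types):
    """Pick how a hub might deliver an update to one subscriber (hypothetical).

    callback_headers: response headers the callback returned on an earlier
        delivery (assumed to be remembered by the hub).
    known_etag: ETag of the state the hub believes the callback holds, or None.
    patch_types: patch media types this hub is able to produce.
    """
    advertised = callback_headers.get("Accept-Patch", "")
    accepted = [t.strip() for t in advertised.split(",") if t.strip()]

    # Only patch when the subscriber advertised a format we can produce
    # and we know which state the patch would be applied against.
    if known_etag is not None:
        for media_type in patch_types:
            if media_type in accepted:
                return {
                    "method": "PATCH",
                    "headers": {"Content-Type": media_type, "If-Match": known_etag},
                }

    # Fallback that every subscriber understands: POST with the full content.
    return {"method": "POST", "headers": {}}
```

A subscriber that never sends `Accept-Patch` simply keeps receiving full-content POSTs, which is what makes this workable as an extension.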
The future patches are then outside of the specification? Because it seems unlikely that the patch format will be the same content type as the topic URL, which is mandatory according to section 6. It also doesn't address notification by reference -- If my image changes, and I send an AS2 Updated JSON-LD notification, where the topic is the Image, then I'm not compliant either. For example, the publisher of an image could send:
And the subscriber can dereference the image if it cares to. This is prevented as the Topic's content is image/jpeg. The requirement also ignores content negotiation, another fundamental of the web architecture. If I publish a negotiable resource, which of the media types am I required to send the notifications as? I maintain that Section 6 is over-specified and prevents use cases other than the most vanilla. And maybe that's sufficient for the WG and v1.0, of course, but unfortunately not for any of our real world use cases, thus preventing us from adopting. |
I think there are four topics here:
Obviously it would be straightforward to add a fat/thin flag to subscriptions and if-thin, the POST would just always be empty. So, I guess this is a question for people who've worked with pubsubhubbub over the years --- why would that be bad? Is it really just concern about Thundering Herd? |
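To make the fat/thin idea concrete, here is a minimal sketch of how a hub might build the notification request in each mode. The `thin` flag is hypothetical (it is the proposed subscription flag, not something in the spec), and the URLs and function name are placeholders. The `Link` headers with `rel="hub"` and `rel="self"` follow the usual WebSub content-distribution convention.

```python
def build_notification(topic_url, hub_url, content_type, content, thin=False):
    """Return (headers, body) for one notification POST (illustrative only)."""
    headers = {
        # Tells the subscriber which topic changed and which hub sent this.
        "Link": f'<{hub_url}>; rel="hub", <{topic_url}>; rel="self"',
    }
    if thin:
        # Thin ping: no content, the subscriber fetches the topic if it cares.
        body = b""
    else:
        # Fat ping: full topic content, typed the same as the topic itself.
        headers["Content-Type"] = content_type
        body = content
    headers["Content-Length"] = str(len(body))
    return headers, body
```

With `thin=True` every subscriber that cares has to fetch the topic itself, which is where the Thundering Herd concern comes from.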
I understand the Thundering Herd issue, and it's not a significant concern for my use cases as the (projected) number of subscribers wouldn't be sufficient to take down the system, and nowhere near enough to cause total deadlock. And I agree about patch formats being out of scope. If you subscribe to the specific resource as the topic, then you would require negotiable resources to send multiple notifications per change. For example, if I update an RDF resource, and it has Turtle, JSON-LD, RDFa and RDF/XML serializations (not unreasonable), I then need to send four notifications rather than one. I can raise it as a separate issue. And for 4, the scope of the specification seems like it should be subscription per @aaronpk's point in #68. The content distribution requirements are beyond that. Given the MUST requirement to distribute the same content type as the topic, I can't think of a situation when I would ever use WebSub. It seems to rule out topic resources that are significant in size (say even > 1 MB), it makes subscription to sets of infrequently changing resources impossible (see #68), and it would require many MANY profiles of a thin notification serialization to work in a compliant fashion (e.g. one for JSON, one for XML, one for HTML, one for CSV, one for ...) ... and would be impossible for media types with parameters (application/ld+json;profile=web-annotation) [which I'll raise as yet another issue]. |
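The four-notifications point can be sketched as follows. The media type list is an assumption (RDFa is taken here to be served as `text/html`), and the function and URL are hypothetical.

```python
# Assumed media types for the four serializations named above.
SERIALIZATIONS = [
    "text/turtle",          # Turtle
    "application/ld+json",  # JSON-LD
    "text/html",            # RDFa embedded in HTML (assumption)
    "application/rdf+xml",  # RDF/XML
]


def plan_notifications(topic_url, media_types):
    """One fat notification per serialization, since each must carry
    content of the same type the subscriber's topic is served as."""
    return [{"topic": topic_url, "Content-Type": mt} for mt in media_types]


plan = plan_notifications("https://example.org/resource", SERIALIZATIONS)
print(len(plan))  # one change to the resource, len(SERIALIZATIONS) notifications
```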
Yeah, it's not hard to design other pubsub protocols with other characteristics. (I've done it many times.) This one picks fat pings and one-subscription-one-resource. |
Which is great, of course, but the specification should be clear that's the case. |
Sounds reasonable to me. In the abstract? Intro? |
Let me clarify my understanding after thinking about this over the weekend... If I have an ATOM feed that typically has 20 entries in it, and I use the feed URL as the Topic URL, then when I add a new entry to the feed, I have to distribute the entire representation of the feed with all 20 entries, not just the newly added one? Beyond that, it would still be legitimate to have a resource that is defined as representing the most recently added entry (/latest) and allowing subscription to that as a Topic resource ... then I can just change it at will, have it point back to the real change, and we're back to thin pings via a layer of indirection. |
Yes, this is typically how PubSubHubbub implementations have worked. The PubSubHubbub (now WebSub) benefit is that it prevents subscribers from needing to poll the topic URL, getting the contents of the topic URL delivered to subscribers only when it has changed. This is not a generic pubsub mechanism, and like Sandro said, there are plenty of other ways to design pubsub protocols with other characteristics. Some PubSubHubbub hub implementations went further and implemented a diffing mechanism on the Atom/RSS feed, delivering only the new items to the subscribers. However since subscribers were still expecting a full Atom/RSS feed, the items are still wrapped in the appropriate Atom/RSS feed container rather than delivering just a single |
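The feed-diffing behaviour described in this comment could be sketched roughly as below: keep the feed container from the new version, but drop entries the hub has already delivered. This is an illustration using Python's standard library and is not taken from any particular hub implementation. Matching entries by their Atom `id` is an assumption.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def new_entries_feed(old_xml, new_xml):
    """Return the new feed as a string, keeping only entries whose <id>
    did not appear in the previously delivered feed."""
    old_ids = {
        entry.findtext(f"{{{ATOM}}}id")
        for entry in ET.fromstring(old_xml).iter(f"{{{ATOM}}}entry")
    }
    feed = ET.fromstring(new_xml)
    # Copy the list first so removing children does not disturb iteration.
    for entry in list(feed.findall(f"{{{ATOM}}}entry")):
        if entry.findtext(f"{{{ATOM}}}id") in old_ids:
            feed.remove(entry)
    # The result is still a complete <feed> document, just with fewer entries.
    return ET.tostring(feed, encoding="unicode")
```

Because the output is still a well-formed feed, a subscriber expecting a full Atom document can parse it unchanged. It cannot tell from the payload alone whether a missing entry was deleted or merely omitted as already sent.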
I think most of the topics in this issue have been "dispatched" in their own issues. So for the sake of clarity I think we should close this one. I want to add one last item. As you've noted @azaroth42 the thundering problem is mostly a theoretical one... but the fat pings also have a very practical one (and probably more real): piercing through caches. Basically, with light pings and the omnipresence of caches, it is not rare that the hub would notify subscribers of a change which the subscribers would not necessarily find out about if they hit a cache that the hub did not hit. In this case it is not clear what the subscriber should do. |
Indeed, as the other issues make this irrelevant to our formerly (PuSH 0.4) compliant use cases, per Sandro we'll have to do our own thing. Interesting, however, that the exceptions for your use cases were added (no need to send already sent ATOM/RSS entries) but not others. A shame that the WG is not willing to consider other use cases from existing, engaging adopters beyond the indiewebcamp inner circle. Clearly SWWG is just a rubber stamping exercise and I believe reflects poorly on the W3C. As the group has clearly stated your unwillingness to engage in discussion, I see no reason to keep the issue open. |
@azaroth42 are you saying you had an implementation that was PuSH 0.4 compliant that now no longer is compliant with WebSub? The intent of the changes made so far in WebSub was to clarify things, not to make previous implementations no longer compliant. |
Beyond any particular implementation, an entire class of implementations are invalidated -- those that conform to the ANSI/NISO Z39.99 notifications spec: http://www.openarchives.org/rs/notification/1.0/notification |
Interesting. Do you know of implementations and users? |
Section 6 says:
This makes it impossible to have notifications about the change to the topic resource distributed. For example, if I have a gigapixel Image, and I want to say that I modified it, I MUST send an image to the subscriber from the hub, and SHOULD send the entire multi-gigabyte TIFF. I'm pretty sure the subscriber does not want my TIFF pushed down their throat, just to know that I changed it.
This also breaks the direction being discussed in #68. If I have a collection of images, and the topic URL is an HTML page, then I MUST send an HTML notification.
I propose to drop this sentence.