supporting ldn requires supporting http notifications #27

Closed
melvincarvalho opened this issue Aug 26, 2016 · 45 comments

Comments

@melvincarvalho
Contributor

I think it's an unreasonably high burden to ask those people wanting to support linked data notifications to also support link header notifications.

The complexity of client-side apps spirals when both headers and content must be stored, analyzed, and acted upon.

I would suggest splitting the spec into two sections, or maybe two specs:

  • HTTP Notifications
  • Linked Data Notifications

And not closely couple them, but rather have modular units that can be taken on their own merits or both together.

@rhiaro
Member

rhiaro commented Aug 26, 2016

What the spec requires is that senders and consumers have the ability to check a Link header on a resource to see if there's a rel="http://www.w3.org/ns/ldp#inbox" relation, and use the value of that as the inbox.

(I assume that's what you mean by "link header notifications" or "http notifications", although that way of calling it seems oddly misleading, as if there's a whole different process if you discover the inbox in the link header. There is not. Once you've discovered the inbox, everything from then on is the same. You still make a POST request to it with a Linked Data/RDF body. So I don't understand how this could possibly be two specs, either.)

Senders and consumers can check the body of the resource first, and if they find an inbox there, they do not need to check the Link header.

This allows publishers of resources to point to inboxes either using a Link header, or in the body of the resource. They may not do both, so the problem of finding two and not knowing which to use doesn't arise.

We added link header discovery so that people who are not able or willing to modify their data directly can still advertise an inbox.

Senders and consumers can check the body and link header in either order, so those that prefer the body or are parsing the body anyway can check that first. Those that are only interested in the inbox and not the rest of the resource are then not required to parse the body (which could be a very large document) if they can do a quick HEAD on the resource first and find an inbox there. Of course, in both cases clients have to be prepared to fall back to their least preferred option, but having it this way does allow clients to optimise for their needs.

In my experience, having this process:

  1. Check body.
  2. If not found, check header.

(or vice versa, which is what I do in Errol)

is not complex. I don't see why I would need to "store" or "analyze" either headers or content to do this.

Ultimately our goal with these two discovery mechanisms is to allow more people to advertise inboxes. I feel the slightly higher bar for senders and consumers in discovery is well worth this.
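
The two-step process above (preferred mechanism first, the other as a fallback) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `discover_inbox`, `check_body` and `check_header` are hypothetical names, and the two checkers are stubs standing in for real RDF parsing and HTTP header handling.

```python
def discover_inbox(target_url, check_body, check_header, prefer="body"):
    """Try the preferred discovery mechanism first, then fall back to the other."""
    if prefer == "body":
        checks = [check_body, check_header]
    else:
        checks = [check_header, check_body]
    for check in checks:
        inbox = check(target_url)
        if inbox:
            return inbox
    # Neither the body nor the Link header advertised an inbox.
    return None


# Stub checkers (assumptions for the example, no network or parsing involved).
def body_has_none(url):
    return None


def header_has_inbox(url):
    return "https://example.org/inbox/"


print(discover_inbox("https://example.org/post", body_has_none, header_has_inbox))
# prints https://example.org/inbox/
```

A sender that would rather avoid parsing large documents could pass `prefer="header"`; the outcome is the same, only the order of the checks changes.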

@rhiaro
Member

rhiaro commented Aug 26, 2016

Just to emphasise, this means that as well as RDF resources which cannot be modified, resources which are not RDF sources can also have inboxes.

@cwebber

cwebber commented Aug 27, 2016

Checking the headers in an http request doesn't seem complex to me. Every web-api using person in the world is using an http library, and what http library can't check headers? It seems simple.
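
As a rough illustration of that header check, here is a minimal Python sketch using only the standard library. It issues a HEAD request and looks for a Link header whose rel is http://www.w3.org/ns/ldp#inbox. The function names are hypothetical, and the parser is deliberately simplified: it assumes no `<` appears inside quoted parameter values and it does not resolve relative link targets.

```python
import re
import urllib.request

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

# Simplified grammar: "<target>" followed by its parameters, up to the next "<".
_LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def inbox_from_link_header(link_header):
    """Return the first link target whose rel includes the inbox relation, else None."""
    for target, params in _LINK_VALUE.findall(link_header):
        rel = _REL_PARAM.search(params)
        if rel is None:
            continue
        # A rel parameter may carry several space-separated relation types.
        relations = (rel.group(1) or rel.group(2) or "").split()
        if LDP_INBOX in relations:
            return target
    return None


def head_for_inbox(target_url):
    """HEAD the target and scan every Link header field for an inbox (network call)."""
    request = urllib.request.Request(target_url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        for value in response.headers.get_all("Link") or []:
            inbox = inbox_from_link_header(value)
            if inbox:
                return inbox
    return None
```

The same `inbox_from_link_header` function works on the Link header of any response, not only a HEAD, since it only looks at the header value.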

@melvincarvalho
Contributor Author

@cwebber could you elaborate on your use case? Are you implementing LDN with client side apps, or implementing it on the server, or implementing something else?

In any of those cases, do you agree with my point that a discovery link header should not constrain the content?

@rhiaro
Member

rhiaro commented Aug 27, 2016

@melvincarvalho We discussed and I thought resolved the "not constrain the content" case in #28 (I agreed the wording was poor and we can correct this). The link header is not intended to "constrain the content". It is there to provide an option to point to an inbox when it is not possible to do this in the data directly.

@melvincarvalho
Contributor Author

@rhiaro Oh, I didn't realise that you thought this was resolved.

My understanding was that your suggestion was to allow both a link header and a link in the content, and change the text to a more informative style.

My proposal was to just go with one option, namely, discovery via follow your nose in the document. And perhaps move informative and other text regarding link headers to Social Web Protocols, or another doc.

Seems like not 100% the same thing.

@melvincarvalho
Contributor Author

melvincarvalho commented Aug 27, 2016

Here's how I would change things:

To discover the Inbox URL, senders and consumers MUST:

fetch the target URL (and follow redirects) and use the object URI(s) of the predicate http://www.w3.org/ns/ldp#inbox in RDF Sources [LDP-RS]

Keep this section ^^

This section:

To discover the Inbox URL, senders and consumers MUST:

make a HEAD request on the target URL, to check for an HTTP Link header with a rel value of http://www.w3.org/ns/ldp#inbox

Either

  1. remove,
  2. move to the inboxes section of SWP ( https://www.w3.org/TR/social-web-protocols/#inbox-interop )
  3. change the MUST to a MAY.

That way implementors have the choice.

EDIT: fixed cut and paste error
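
For comparison, the body-based discovery quoted above (use the object URIs of the http://www.w3.org/ns/ldp#inbox predicate) might look roughly like the Python sketch below. It is a heavily simplified illustration, not a conforming implementation: it assumes the body is JSON-LD that is already in expanded form, and the function name is hypothetical. A real client would hand the body to a JSON-LD or RDF library instead.

```python
import json

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"


def inboxes_from_expanded_jsonld(body, target_url):
    """Collect inbox URIs stated about target_url in expanded-form JSON-LD."""
    data = json.loads(body)
    nodes = data if isinstance(data, list) else [data]
    inboxes = []
    for node in nodes:
        # Only look at the node that describes the target resource itself.
        if not isinstance(node, dict) or node.get("@id") != target_url:
            continue
        for value in node.get(LDP_INBOX, []):
            if isinstance(value, dict) and "@id" in value:
                inboxes.append(value["@id"])
    return inboxes


example_body = json.dumps([
    {
        "@id": "https://example.org/post",
        LDP_INBOX: [{"@id": "https://example.org/inbox/"}],
    }
])
print(inboxes_from_expanded_jsonld(example_body, "https://example.org/post"))
```

The function returns a list because a document can state more than one inbox for the same resource.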

@rhiaro
Member

rhiaro commented Aug 27, 2016

There are three different issues across these two threads, please don't mix them up.

  1. The implication that the header somehow constrains the content (Headers constrain the content #28)
    • Resolvable through more informative wording to show that the header is not in fact intended to constrain the content.
  2. What to do if you discover more than one inbox (also in Headers constrain the content #28)
    • Currently the spec says pick one.
    • We could change to support multiple, with which to choose being at discretion of sender/consumer.
    • Changing anything about link headers does not solve this, as you can still advertise multiple inboxes in the body, so it's orthogonal to the link header discussion.
  3. Link header discovery is too complex (this issue)
    • I ack that it's less convenient for your use case, but feel this inconvenience is outweighed by the broader set of resources which are permitted to have inboxes if they can use link headers to advertise.

@rhiaro
Member

rhiaro commented Aug 27, 2016

Recap of reasons the link header discovery is necessary:

(from #28):

Would you mind capturing the argument in this issue, or repeating it if it was said before? Apologies if I didn't quite get it from your first reply.

I am wondering what prevents you from putting an inbox link in your content?

Here are several independent reasons:

  • Much of my content is not RDF resources. For example, images.
  • I am only able to generate HTML pages. Even if I add RDFa with the inbox triple, I can't be sure senders/consumers will find it, as the spec only requires them to support JSON-LD, which is not an option for me due to limitations of the system generating content.
  • Some of my content cannot be modified because I do not have the authority to add data. However, I am able to administer the server, so I can use link headers to advertise inboxes for resources despite the fact that I am unable to modify the data itself.
  • I have thousands and thousands of resources. It is inconvenient for me to generate and return thousands and thousands of inbox triples.

None of these are reasons I shouldn't be allowed to use LDN to receive notifications about my resources.

@melvincarvalho
Contributor Author

@rhiaro I think you have the incorrect issue in #27 (comment) part 2

When you say 'pick one'. Pick one of what?

In the scenario where someone adds multiple predicates in their data, it's pretty clear that the consuming software won't know what to do, so it would generally be a bad idea, unless there were special circumstances like an out-of-band negotiation on what action to take.

When you say 'pick one' of the metadata (headers) and content data (triples), this becomes a major architectural problem, because you're mixing content and transport for convenience, some of which is because you don't have full access to your web server.

If there is a need for discovery to happen at the HTTP level, maybe for a group of people that have less capable servers, then make a spec for that (the images use case may be a good reason to do that, perhaps). But please don't force it on others as a MUST. Just have it as an option. Please don't closely couple things that can be modular.

@melvincarvalho
Contributor Author

melvincarvalho commented Aug 27, 2016

An argument I can see right now for an inbox at the HTTP level is when you get an HTTP 401/403 back and wish to request access. If this pattern takes off, I could see an inbox link header being really useful. But I'm unsure anyone has proposed or implemented something like that right now. What's clear is that the content-level inboxes have to be supported. Simply make header-level inboxes an option, I'd say.

@melvincarvalho
Contributor Author

@rhiaro

I have thousands and thousands of resources. It is inconvenient for me to generate and return thousands and thousands of inbox triples.

I'd like to understand this issue better. Taking one of your posts at random

http://rhiaro.co.uk/2016/06/stuff-shared

This contains tags, a twitter button, a date, and a whole bunch of other links. Why would it be a show stopper for you to add one more item?

@rhiaro
Member

rhiaro commented Aug 27, 2016

Please don't force me, with my tired and sluggish PHP RDF parser, to parse every document I want to send to before I can find its inbox!! I can do that as a backup, but if I have the opportunity to avoid it by starting with a HEAD, I'm certainly going to take it.

Please acknowledge that the inconveniences of using the body are just as valid as the inconveniences of using the link header. Neither is objectively better. Both are useful in different cases.

The spec is already a compromise that tries to treat both cases fairly for the sender/consumer. They can check both options in any order they prefer, in order to optimise for their particular case. So you can optimise for finding in the content, and I can optimise for finding in the header.

then make a spec for that

But then we wouldn't have interop at all. We might as well not bother with any specs. I don't think link headers are so bad that we need to throw this whole effort out of the window.

The point is, it doesn't matter what your content is or how you publish it, you should be able to receive notifications about it. So we offer two possibilities for how to advertise your inbox.

Otherwise, what you are saying is that only mutable RDF documents are candidates as notification targets. This is simply not practical, and we do not need to apply this constraint. This is not the specification we're trying to write.

An argument I can see right now for an inbox at the HTTP level is when you get an HTTP 401/403 back and wish to request access. If this pattern takes off, I could see an inbox link header being really useful. But I'm unsure anyone has proposed or implemented something like that right now. What's clear is that the content-level inboxes have to be supported. Simply make header-level inboxes an option, I'd say.

I think what you mean is that this would enable you to attempt to send a notification to an inbox, where you don't have access to the target resource directly. I would imagine you'd then also be denied access to the inbox, but perhaps not if the publisher is using a link header there - so this seems like a good use case in favour of link header discovery.

This contains tags, a twitter button, a date, and a whole bunch of other links. Why would it be a show stopper for you to add one more item?

I made that example on behalf of Bart (see https://gitter.im/csarven/ldn?at=57c062a1d872312a1e7ec1bc), and other managers of large datasets, rather than being about my personal blog.

But, if we are talking about my blog, I want to be able to set the same inbox for all posts by editing one line in a .htaccess file. If I need to change the inbox (which I've done at least twice whilst experimenting), I can make that update without the need to run two (insert and delete) big SPARQL queries each time. For me, that is too high a bar. If I had a Solid filesystem backend, rather than a triplestore, I wouldn't even be able to do this with a SPARQL query, but would have to write a script to crawl and PATCH every resource on the server!
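
The idea of advertising one inbox for every resource from a single place can also be sketched outside of .htaccess. The following is a hypothetical Python analogue, not the setup described above: a small WSGI middleware (`advertise_inbox` is an invented name) that appends the same Link header to every response, so changing the inbox means changing one argument.

```python
LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"


def advertise_inbox(app, inbox_url):
    """Wrap a WSGI app so that every response carries a Link header for the inbox."""
    link_value = '<%s>; rel="%s"' % (inbox_url, LDP_INBOX)

    def middleware(environ, start_response):
        def start_with_link(status, headers, exc_info=None):
            # Leave the app's own headers untouched and append the inbox link.
            headers = list(headers) + [("Link", link_value)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return middleware


# Example application (an assumption for the sketch): it knows nothing about inboxes.
def blog(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"a blog post"]


blog_with_inbox = advertise_inbox(blog, "https://example.org/inbox/")
```

The wrapped application and its content stay unmodified, which is the property the Link header approach is meant to provide for publishers who cannot or do not want to touch their data.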

@melvincarvalho
Contributor Author

melvincarvalho commented Aug 27, 2016

I wouldn't even be able to do this with a SPARQL query, but would have to write a script to crawl and PATCH every resource on the server!

Which is a good exercise which should also be 1 line.

for i in $( find . -name '*.ttl' ) ; do curl -X PATCH -H "Content-Type: application/sparql-update" -d "INSERT DATA { <> solid:inbox < INBOX > . }" "$i" ; done

Isn't so hard, is it?

@melvincarvalho
Contributor Author

But then we wouldn't have interop at all.

We would have interop. All the document level follow your nose and linked data would work perfectly fine and has huge interop already (including facebook).

The header link stuff would be up to those that implemented it. And if it catches on, people will take it up.

@rhiaro
Member

rhiaro commented Aug 27, 2016

Isn't so hard, is it?

...neither is checking link headers. rdflib.js already handles that.

We would have interop. All the document level follow your nose and linked data would work perfectly fine and has huge interop already (including facebook).

And indeed all of the document level FYN stuff will work just fine if other people are also using link headers for discovery. We're in no way getting in the way of that.

I'm generally an optimist, but I don't expect facebook to be implementing LDN. And I think if they did decide to implement it, needing to check link headers would not stop them in their tracks.

In terms of larger-than-one-individual implementations, I'm more hopeful about organisations which publish datasets (academic, statistical, government, public benefit, heritage) using this as a mechanism to track people's interaction with the datasets. And I'll say again, people we have spoken to with this use case are not able to add triples to their data. We'd lose a lot if we enable senders and consumers to fail to do discovery on any of this data from the outset.

@melvincarvalho
Contributor Author

I think we may have crossed wires

'But then we wouldn't have interop at all.'

So this isn't quite accurate, and it led me to reply

We would have interop. All the document level follow your nose and linked data would work perfectly fine and has huge interop already (including facebook).

What that means is:

Facebook already implements linked data. So does Google. So do countless sites. They may not implement LDN, but they implement LD, hence interop. All of this massive interop occurs at the document level. So to say

We have a giant global graph of data that is growing. LDN just gives one more tool for those that want to use it. I hope that's clearer.

@rhiaro
Member

rhiaro commented Aug 27, 2016

The possibility of (though not magically automatic) interop that you're proposing for sites that already use linked data is great. That's not prevented by asking senders and consumers to check link headers as a backup. It really isn't.

What it does allow is other sites who aren't publishing linked data to still accept linked data notifications. Thus "growing the giant global graph of data" even more. Which removing the link header support would prevent.

What you propose excludes more potential implementers than it enables.

@melvincarvalho
Contributor Author

@rhiaro

But then we wouldn't have interop at all.

This is not accurate

Interop would work with the existing discovery mechanism of linked data, namely follow your nose over documents.

I gave several options. What I have argued is: don't make me implement a discovery method I don't want to when the one I have is working fine.

It's the web axioms of modularity and loose coupling.

Here's one of my favourite quotes from Tim

The way the Web spread was a piece at a time. So you could take html without taking http. So the failure of NEXT was a lesson, don’t try to sell it all at one time. Sell each piece on its own merits. Never insist that everybody take all. They will take all the pieces once they see how it fits together.

@melvincarvalho
Contributor Author

@rhiaro

Please don't force me, with my tired and sluggish PHP RDF parser, to parse every document I want to send to before I can find its inbox!! I can do that as a backup, but if I have the opportunity to avoid it by starting with a HEAD, I'm certainly going to take it.

On this: I am hearing that your tools don't work very well. The solution here is to improve your tooling. People whose tools are working well should not be penalized by those whose tools aren't.

@rhiaro
Member

rhiaro commented Aug 28, 2016

On this: I am hearing that your tools don't work very well. The solution here is to improve your tooling. People whose tools are working well should not be penalized by those whose tools aren't.

Thanks for pointing this out. I believe it also applies in your original argument that parsing link headers adds too much complexity.

@melvincarvalho
Contributor Author

@rhiaro Then, I think we agree. The two methods should not be closely coupled.

@rhiaro
Member

rhiaro commented Aug 28, 2016

I believe we should support both, so that developers with either set of tooling are able to interoperate, rather than create two completely separate sets of implementations which can't talk to each other.

I firmly believe that the bar to supporting both is not as high as you suggest, and the payoff is much greater.

@rhiaro
Member

rhiaro commented Aug 28, 2016

If my sender only supports discovery with link headers, and you advertise your inbox with the body, I'll never be able to send you notifications. If your implementation only supports discovery with the body and I advertise my inbox in the headers, you'll never be able to send me notifications.

That is not interoperability.

That is a completely unnecessary point of divergence.
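
The divergence described here can be spelled out as a small table. The Python sketch below is only a toy model with invented names: each sender profile supports a set of discovery mechanisms, and delivery is possible only if the publisher's chosen mechanism is in that set.

```python
from itertools import product

MECHANISMS = ("body", "header")


def can_discover(sender_supports, publisher_advertises_via):
    """A sender finds the inbox only if it supports the publisher's mechanism."""
    return publisher_advertises_via in sender_supports


def interop_table(sender_profiles):
    """Map (sender profile, publisher mechanism) to whether discovery succeeds."""
    table = {}
    for (name, supports), mechanism in product(sender_profiles.items(), MECHANISMS):
        table[(name, mechanism)] = can_discover(supports, mechanism)
    return table


profiles = {
    "body-only": {"body"},
    "header-only": {"header"},
    "both": {"body", "header"},
}
for (sender, mechanism), ok in sorted(interop_table(profiles).items()):
    outcome = "ok" if ok else "never delivered"
    print(f"{sender:12} sender, inbox advertised in {mechanism:6}: {outcome}")
```

Only the sender profile that supports both mechanisms succeeds against every publisher, which is the case the current MUST is meant to guarantee.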

@BigBlueHat
Member

Following all this from afar. 😄

Can resources that do not return RDF be the target of a notification?

If, for instance, an image can be the target of a notification, then this spec MUST support Link header-based discovery, as there'd not be a way to discover it from the document itself.

Additionally, it's much easier for a server to set up (even via a proxy) a Link header-based discovery expression than to change all the documents themselves (which I also find a bit polluting).

I do understand the points about wanting to extract an inbox IRI directly from the content, so supporting that seems prudent.

I think requiring a HEAD request specifically, though, may be part of the issue--as one could also pull the Link header off a GET request or even a PUT or POST response (i.e. user POSTs blog post, and 201 created response includes a Link header referencing the newly assigned inbox URL).

It seems prudent that client implementations MUST (given the existence of non-RDF resources) start with a Link header check, and barring that (and knowing/assuming the content is RDF) attempt to extract the appropriate URL from the returned representation.

Removing the Link header discovery option (or even making it the secondary option of the two) would potentially require heavy round-trips on large documents just to find out they're not RDF in the first place.

@csarven
Member

csarven commented Aug 29, 2016

Can resources that do not return RDF be the target of a notification?

Yes. Any URI can have its own inbox. The current section on discovery says:

An Inbox, the endpoint which notifications are sent to or consumed from, can be discovered from any resource

We can improve that wording. We should also mention that in the Introduction (I'll make a commit for this). It has always been the case that any URI can have an inbox property (domain is omitted or just assumed to be rdfs:Resource/owl:Thing).

@melvincarvalho
Contributor Author

@rhiaro

neither is checking link headers [hard]. rdflib.js already handles that.

You may have underestimated how hard this is.

Could you explain how rdflib handles that? Specifically:

How does rdflib store the (quad) data associated with headers?

How does that compare with how rdflib stores the (quad) data associated with document-level discovery?

What happens if both the link header and the content inbox are set?

@melvincarvalho
Contributor Author

@rhiaro you have not fully addressed my proposal, so I will repeat it for convenience. Do you appreciate that I set forth a number of resolutions, and that if you don't like one of them you can look at the others, or propose something more?

Either

  1. remove,
  2. move to the inboxes section of SWP ( https://www.w3.org/TR/social-web-protocols/#inbox-interop )
  3. change the MUST to a MAY.

That way implementors have the choice.

What I am reading is that you are against (1). Please could you comment on (2) and (3).

@BigBlueHat
Member

@melvincarvalho none of those options address the needs for inbox discovery on non-RDF resources. How would you propose that be handled?

@melvincarvalho
Contributor Author

@BigBlueHat Why can't this use case be addressed in SWP? Did you look at the link that I posted: https://www.w3.org/TR/social-web-protocols/#inbox-interop

@semanticfire

@BigBlueHat I'm totally with you!

Although I generate tons of RDF resources for realtime incident processing, I don't want to bloat them with extra triples, which are totally irrelevant to the context of the data itself.

@BigBlueHat
Member

@melvincarvalho this use case of notifications on any resource type seems core to what LDN is about and not just an optional thing added by clients supporting more than one social protocol.

LDN is for all resource types. Linked Data is about more than RDF. Link headers are the original webby triple expression. 😺

@melvincarvalho
Contributor Author

@semanticfire this is a data modeling issue. I presume incidents do not exist in a vacuum, so you can tie an inbox to a container of those incidents in your data model, or indeed to the domain. The triples are going to come whether you put them in the header or in the body, so actually, yes, by using a link header you are generating triples not just for every single incident but for every single resource: images, CSS, JavaScript. And you've completely missed the point of this issue. What you put in your headers is entirely up to you.

But it's also reasonable for me to use inboxes on the web of data. I should have a predicate to do this and get on with my work without being forced to model out of band information.

There are two separate use cases that are being conflated.

  1. Inboxes in the web of data, which we currently do with solid:inbox, and which may be worth standardizing as ldp:inbox.
  2. Site-wide metadata, which is a topic in itself, addressed by the IETF and W3C in various places. They are separate concerns and should be specified as such.

@semanticfire

@melvincarvalho to me it's more about being pragmatic.
Every incident, while running, has its own inbox to get updates.

Headers are at the core of the web of data as well, and certainly not difficult to program.

@melvincarvalho
Contributor Author

We can improve that wording. We should also mention that in the Introduction (I'll make a commit for this). It has always been the case that any URI can have an inbox property (domain is omitted or just assumed to be rdfs:Resource/owl:Thing).

@csarven this sounds like a sensible approach. We really have two distinct cases. The first is a clear, granular, per-resource inbox as part of the web of data and giant global graph.

The second is a use case for site-wide metadata, or protocol-level messages sent by a server about where an inbox is. I think it's a good idea to reflect these two separate concepts in the text.

@melvincarvalho
Contributor Author

With @semanticfire potentially using this technology for real-world firefighting and me using inboxes for financial transactions, I think we need to ensure messages don't get sent to the wrong inbox.

@BigBlueHat the issue with your approach is that the author of content has no way of overriding the site-wide settings. In fact, even with the "content first" approach, the author of the content has no way of turning off inboxes for their content.

What is needed is a mechanism for site-wide inboxes and one for content-based inboxes, in a way that is non-conflicting.

@csarven
Member

csarven commented Aug 30, 2016

I hope I can summarise a little. It'd be great to acknowledge these points:

  • Any resource (URI) can have an Inbox.
  • Some receivers/publishers need to use the headers because it is their only way to announce an inbox - (re: Web Architecture: URI ownership) People announce HTTP headers as well as the resource body in a way that they see as appropriate. However that's organised (the publishing pipeline so to speak) is not something we can dictate, but rather acknowledge, i.e., the diversity in which parties get to write the headers and body.
  • Publishers don't have to use HTTP headers to announce the Inbox. Senders/consumers MUST check the HTTP headers (whether as a fallback after checking the body first, or in preference, so there is at least some flexibility for senders/consumers to optimise).

From those 3 points above, I think that it is not only required, but reasonable, for senders/consumers to support the HTTP header option. The intention of LDN is to support these points. Making the Inbox discovery through Link headers a MAY for senders/consumers, or putting it in Social Web Protocols, has the same effect as removing it altogether, thereby ignoring the use cases which are brought up by @semanticfire @BigBlueHat @rhiaro

I would like to suggest that we do not make any changes to the spec.

@melvincarvalho
Contributor Author

@csarven

Making the Inbox discovery through Link headers a MAY for senders/consumers, or putting it in Social Web Protocols, has the same effect as removing it altogether

Could you say why you think this is the case?

@csarven
Member

csarven commented Aug 30, 2016

If it becomes optional (MAY), then the publishers can't rely on their Inboxes being discovered. Thus, the behaviour is unpredictable. Thus some of the "conforming" implementations will not interoperate.

@melvincarvalho
Contributor Author

I don't think making the text optional is quite the same thing as removing it completely. What I am hearing is that if it's optional, you think it has the same effect as removing it, i.e. that no one will do it. Well, that's a bad sign for starters! I don't mind it being removed, either, but I wanted to lay out possible options, which I think you are possibly conflating.

So, let's say we do encourage people to use site wide headers. And it seems from this thread at least people will do it as a first resort, rather than a last, because it's convenient and saves coding, which strikes me as dangerous.

How does someone authoring content turn off the site wide inbox? Or will that be essentially impossible?

@csarven
Member

csarven commented Aug 30, 2016

I didn't mean that by making it optional nobody would do it. I meant that by making it optional, some senders/consumers would choose not to implement HTTP Link header discovery, and therefore those senders/consumers could never send to or consume from any publisher that uses Link headers to advertise its inbox. It is the responsibility of the spec to ensure that all senders/consumers are able to interact with all publishers, not just some.

it seems from this thread at least people will do it as a first resort

We've already established that for some publishers this is the only option. Of course not for all publishers though - we can't extrapolate from this thread that everyone or even most people want to advertise inboxes this way, and nor are we requiring anyone to do it. The requirement is only on senders/consumers to look for it in the header so that publishers who must advertise with headers can rely on being part of the ecosystem. In the same way, those who can only publish in the content can also rely on senders/consumers to find the inbox through their preferred mechanism.

How does someone authoring content turn off the site wide inbox? Or will that be essentially impossible?

This is an excellent point to discuss further in the Security, Privacy and Content Considerations section. This is a general consideration for URI ownership across the whole Web, and we can't resolve it in this spec, only bring more awareness.

@melvincarvalho
Contributor Author

we can't resolve it in this spec

I'm trying to get at what you mean here. I don't think you mean this, because we can resolve this in this spec if you take one of my proposals.

Another way would simply be to have different URIs for the header and for an inbox in the content, which I think is the route I'm being pushed towards, with solid:inbox currently meeting my needs.

Or perhaps you mean that if someone can control a header, they can overwrite the content anyway, so it's out of scope of the spec, something like that?

@melvincarvalho
Contributor Author

Here's an idea. Why not just register rel="inbox" with IANA/IETF and then have ldp:inbox and/or solid:inbox as a predicate for the linked data people? Wouldn't that solve all the issues and use cases?

@melvincarvalho
Contributor Author

It seems the consensus is a wish to close this issue. I went through all my code this morning and I realized I will personally never need to use the header portion of the spec.

Closing this issue, though I would appreciate answers to the outstanding questions, especially how to implement this in rdflib.

@csarven
Member

csarven commented Aug 31, 2016

Thank you for raising this issue @melvincarvalho , and I'm glad that we've discussed this thoroughly even if we don't necessarily agree on the precise outcome. Along the way, we have clarified the spec at least in two places, and that is a clear plus.


6 participants