Can target parameter be optional? #16

Closed
kevinmarks opened this issue Nov 29, 2015 · 18 comments
Comments

@kevinmarks
Contributor

Forking this off from #1 as they are independent issues.

Currently the spec says that the target parameter is required, so that a minimal webmention parser can just check that the URL is in the source document. My naïve example:

https://github.com/kevinmarks/mentiontech/blob/master/main.py#L119

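# Excerpt; assumes Google App Engine's Python 2 URL Fetch API.
from google.appengine.api import urlfetch

result = urlfetch.fetch(mention.source)
if result.status_code == 200:
    mention.sourceHTML = unicode(result.content, 'utf-8')
    # Naive check: the target URL appears anywhere in the source text.
    mention.verified = mention.target in mention.sourceHTML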

Clearly, actually parsing the source document for actual links would be an enhancement here.
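As a rough illustration of that enhancement, here is a minimal Python 3 sketch (not the mentiontech code) that parses the source for `<a href>` values instead of doing a substring match. The names `LinkCollector` and `source_links_to` are hypothetical, and a real receiver would probably also want to look at other link-bearing elements.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute href values from <a> elements (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the source URL.
                self.links.add(urljoin(self.base_url, value))


def source_links_to(source_url, source_html, target_url):
    """Return True if source_html contains an <a href> equal to target_url."""
    collector = LinkCollector(source_url)
    collector.feed(source_html)
    return target_url in collector.links
```

Unlike the substring check, this does not verify a mention when the target URL only appears as plain text in the source.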

In #1 (comment), @csarven says:

If we want to talk about what is strictly required, it is just the source. If we want to continue talking about how to let the target know precisely why a target was mentioned, then you need both property and target. Both property and target help with the validation process. If you want to discuss in terms of "extensions", then everything outside of source is an extension.

source is a MUST, property is a SHOULD, target is a SHOULD.

This is true in the specific case that the webmention endpoint is tightly coupled to a particular domain, and thus can know a priori which links are within its purview. That is a common case for webmention, but it is not the only possible case, as webmention receivers can support multiple target sites.

There is another case where only a source can work - if you are sending webmentions on behalf of a page. indiewebify.me does this. However this is more of a webmention supporting service than an implementation of the protocol (it accepts a url parameter, not source)

Further comments from #1:
@rhiaro:

I'd rather see property required than target optional.

@dissolve:

Let's take making target optional first. I think @kevinmarks has a perfect example: anything doing webmention handling as a service is a key place where target always needs to be there. This is a perfectly valid use case and likely an important one. I can see this being the same issue for any site that has multiple users (silos even), especially for ones that allow custom domains for users. Moreover, this would significantly affect processing of anything that refers to more than one URL. If you mention 1000 URLs in a list of "top 1000 URLs on subject X", you have to check ALL 1000 to see if ANY URL you are managing is in there. When given the target URL you can easily verify that you actually care about the URL and that it is referenced by the source.

@csarven:

Making target optional was based on the fact that property and target are not absolutely needed, i.e., an endpoint can still be operational (one counter-example to the raised "issue" against that was: my http://csarven.ca/webmention endpoint).

The proper way forward is to provide both property and target. I was not advocating for target being optional any more than property being optional. They are equally valuable, which is why I last suggested that source, property, and target should all be MUSTs (#1 (comment)), on the basis that having all three represents the complete information of the webmention claim. source, property, target are strictly part of the data.

I hope I have captured everyone's arguments on this point. If not, please comment below.

@sandhawke
Contributor

I don't see why the target is much use at all.

Here's a first cut at what I imagine webmention endpoints running:

result = urlfetch.fetch(mention.source)
if result.status_code == 200:
    html = unicode(result.content, 'utf-8')
    links = parseForLinks(html)
    for link in links:
        # Record a mention for every link this endpoint is configured to track.
        if config.trackURL(link):
            mentions.add({'source': mention.source, 'target': link, 'verified': True})

This relies on some parseForLinks function, which could just be a regexp but which I hope will be formally defined for each media type (another issue?), and a config.trackURL function which tells us whether a given URL is one this endpoint is supposed to be tracking. We need that anyway. (I assume your code has something like that, probably a line or two earlier.)

(This reminds me, I wanted to suggest an optional etag parameter in the webmention to avoid even needing the fetch at all if the source hasn't been modified. But I guess that's another thread. Or maybe it's in there already.)

The big upside of this is when a page has multiple links that result in sending webmentions to the same endpoint, they could be skipped. The only downside I see is that maybe people want the existence of a webmention to have semantics, and this would automatically create a bunch that maybe people don't want, but I haven't heard that use case.

@dissolve

For me that trackURL function is an expensive operation. You cannot tell immediately if a link is actually pointing to me without following it to see if it's a short link or redirect. Twitter wraps all their links in their own url shortener. How do I ever tell if twitter links to me? I would not want to have to follow every link on a page to make sure it doesn't redirect to me.

Etag sounds interesting. Certainly make a thread for that.

@sandhawke
Contributor

Good point @dissolve, I'd forgotten that aspect. So, yeah, the URL shortener use case makes sense as a good reason to allow a target parameter. I'm not sure it justifies making it mandatory, though, since it's just a performance issue.

@dissolve

It's actually more than just performance. It allows me to use any webmention endpoint to magnify a few requests into a DDoS. I create a page with a couple thousand links to some poor soul's website. Then I hit your endpoint without specifying a target link. You just hit them with all those requests for me and I only had to make one. Huge magnification of the attack. Do that to multiple endpoints and it's a pretty easy attack.

Target given in the webmention is not the link on your site, but the exact link I am posting. So you know the single link to follow, to see if it resolves to a page on your site.
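To make the amplification concrete, here is a small Python 3 sketch that counts outbound requests in both designs. Everything in it is illustrative: `CountingResolver` stands in for an HTTP client that follows redirects, `is_mine` for whatever the endpoint uses to recognise its own URLs, and the example hosts are made up.

```python
class CountingResolver:
    """Stand-in for an HTTP client that follows redirects; counts outbound requests."""

    def __init__(self, redirects):
        self.redirects = redirects  # maps a short URL to its final URL
        self.requests = 0

    def __call__(self, url):
        self.requests += 1
        return self.redirects.get(url, url)


def verify_with_target(target, links, resolve, is_mine):
    """Follow only the declared target, and only if the source really links to it."""
    if target not in links:
        return False  # nothing is fetched for a target the source does not contain
    return is_mine(resolve(target))


def verify_without_target(links, resolve, is_mine):
    """Without a target, every link has to be resolved to see where it lands."""
    return [link for link in links if is_mine(resolve(link))]
```

With a source page holding 1000 links to a victim plus one shortened link, the first function makes one outbound request and the second makes 1001, all triggered by a single incoming webmention.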

@dissolve

Honestly I had forgotten this whole reasoning and just remembered it all now. Probably a good thing for an FAQ.

@sandhawke
Contributor

@dissolve Would that be addressed by saying the target MUST be provided in the case where there's a redirect (and is optional otherwise)? The service issuing the webmention will know if this is the case, since it had to dereference the target to find out the endpoint's address.

I ask mostly out of curiosity. If it's required in that case, it's probably no help to anyone to make it optional in other cases.

@dissolve

Huh. I hadn't thought about it in that way but yes I would say target is a MUST for cases where there is a redirect.

It's also pretty trivial to include since you have to know the target when you send the webmention. So it's not like it adds any work on the sender (other than the case of multiple mentions needing to be sent to one endpoint) and can make the receiver's job much easier.
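For a sense of how little the sender has to do, here is a minimal Python 3 sketch of building the form-encoded request bodies, one per target. The function name `webmention_bodies` and the example URLs are hypothetical; only the `source` and `target` parameter names come from the discussion.

```python
from urllib.parse import urlencode


def webmention_bodies(source, targets):
    """Return one form-encoded body per target the source links to."""
    return [urlencode({"source": source, "target": target}) for target in targets]
```

The "multiple mentions to one endpoint" case mentioned above shows up here as one body, and so one POST, per target.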

@kevinmarks
Contributor Author

The etag/Last-modified handling belongs in the webmention receiver. That's on my list as indiewebify.me would thrash my server if it pinged homebrew website club notes at the moment.
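One way a receiver might do that is an ordinary HTTP conditional GET when fetching the source. The Python 3 sketch below is an assumption about how that could look, not anything from the spec: `fetch` is a hypothetical callable returning `(status, response_headers, body)`, and `cache` is a plain dict.

```python
def fetch_if_changed(url, cache, fetch):
    """Conditional GET: reuse the cached body when the server answers 304.

    cache maps url -> (etag, body). fetch(url, headers) is a hypothetical
    callable returning (status, response_headers, body).
    """
    headers = {}
    cached = cache.get(url)
    if cached is not None:
        # Ask the server to skip the body if our copy is still current.
        headers["If-None-Match"] = cached[0]
    status, response_headers, body = fetch(url, headers)
    if status == 304 and cached is not None:
        return cached[1]
    if status == 200:
        etag = response_headers.get("ETag")
        if etag:
            cache[url] = (etag, body)
        return body
    return None  # treat anything else as "could not fetch the source"
```

A repeated ping for an unchanged source then costs the source server a 304 with no body rather than a full response.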

@csarven
Member

csarven commented Nov 29, 2015

When a webmention (claim) is submitted, the verification process simply checks whether it holds, i.e., whether it can be found at the source - after all, we are told that the source is making a statement about the target, so the question is, does it hold? The current spec says:

The receiver SHOULD perform a HTTP GET request on source to confirm that it actually links to target (note that the receiver will need to check the Content-type of the entity returned by source to make sure it is a textual response).

The semantics of whether "it actually links to target" or not could be clearer. I think this is the point we need to clarify in the spec (I created issue #17 for it), and that will give us better grounding for this issue.

@csarven
Member

csarven commented Nov 29, 2015

If we look at the shortURL example 2 at #17 (comment) , cases b and c essentially result in the same scenario. Therefore, unless the target receives a URL which can be found at the source URL - naturally omitting the follow-your-nose case - it makes no difference if the source is using a shortURL or not. Which is equivalent to not being provided with the target to begin with. Essentially there is no description or guidance on what to do with URLs which are not found at the source.

@dissolve

Indeed, we should probably state explicitly that if the URL is not found, you disregard the webmention.
Implementers can of course do whatever they want, but it's best to say that the best practice is not to waste time on a malformed webmention.

@bblfish

bblfish commented Nov 29, 2015

Instead of saying what to keep and what not to keep you can speak about Truth conditions: What is it that makes the content of the post True. If the agent wants to keep false propositions, then that's up to it/him/her.

Actually what you are doing is explaining how to verify the truth of the statement made in the POST. Verifiability and truth conditions are closely related of course, even if not as closely as A. J. Ayer believed in "Language, Truth and Logic", published in the 1930s. This has a long history, needless to say.

@kevinmarks
Contributor Author

Yes Henry, and Gödel showed truth was undecidable by computers shortly afterwards.

Returning to empiricism, in practice we have services (eg http://brid.gy ) that remap twitter short URLs into webmentions by expanding them, and that in effect implement @sandhawke's trackURL model by monitoring twitter broadly. So, like indiewebify.me, this is support infrastructure for the protocol rather than an implementation of it, which is a sign that as implemented it has useful boundary conditions.

@bblfish

bblfish commented Nov 30, 2015

@kevinmarks Tarski, Gödel and other logicians are the bread and butter of philosophical thinking on meaning and truth. Just check the Semantic Theory of Truth, or the Stanford Encyclopedia of Philosophy.

The protocols we are designing are not to make computers distinguish truth from falsity, but for us to be able to reason about them, in a court of law for example, or when building protocols or software agents. Humans do that, and it helps to take into account the thinking on meaning that has developed throughout the 20th century, just like it helps when building skyscrapers to know about maths, material science, and many other subjects.

@dissolve

@bblfish @kevinmarks that's getting completely off topic.

@aaronpk
Member

aaronpk commented Dec 1, 2015

There are a few reasons why the target parameter is beneficial.

Looking at the implications for DoS attacks, without the target parameter, it becomes trivial to cause a webmention receiver to do unnecessary work of verifying invalid webmentions (webmentions that don't link to the target). At least when the target is required, an attacker has to customize the request per victim.

Without the target parameter, the webmention payload becomes ambiguous, since the source likely links to more than one target. Which target is the sender interested in for a given webmention request then? What kind of error response should the receiver return if it supports receiving mentions for more than one link on the page?

I am glad to see the related issue #17 come out of this thread.

I'm going to close this issue as I don't think anyone was actually advocating for dropping the "target" parameter, but it was an interesting thought experiment.

@aaronpk
Member

aaronpk commented Mar 19, 2016

@kevinmarks please comment here if you are satisfied or unsatisfied with the result of this discussion

@kevinmarks
Contributor Author

I'm satisfied that we're keeping target as required.


6 participants