
Clarify endpoints scope (was: Should "endpoints" be dropped?) #254

Closed
cwebber opened this issue Aug 27, 2017 · 21 comments

cwebber commented Aug 27, 2017

I.e., should the endpoints property be dropped? It rather clumsily encompasses several other endpoint properties, which isn't consistent: inbox and outbox are endpoints too, but they aren't stuffed in here.

I think I originally wanted to separate these so that servers could put "common server endpoints" at one URL to deduplicate, or something? That hardly seems like a good idea any more. Dropping it would be one less property added by ActivityPub.
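To illustrate the asymmetry being described: a sketch of an actor as things stand, with inbox and outbox directly on the actor while other endpoints sit nested inside endpoints (server and URLs are made up for illustration; proxyUrl and oauthTokenEndpoint are endpoint properties from the draft):

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Person",
  "inbox": "https://example.com/u/alice/inbox",
  "outbox": "https://example.com/u/alice/outbox",
  "endpoints": {
    "proxyUrl": "https://example.com/proxy",
    "oauthTokenEndpoint": "https://example.com/oauth/token"
  }
}
```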


gobengo commented Aug 27, 2017 via email


cwebber commented Aug 27, 2017

The good news is, with the exception of publicInbox (which we are about to rename to sharedInbox, so that's already changing/moving), all other endpoints are client to server only. While that could affect some implementations, the group has agreed that these are less serious than federation changes, which affect interop between nodes on the network.


cwebber commented Aug 27, 2017

Note that @zotlabs also expressed the view that endpoints is clumsy, and noted that it isn't even in the current context.


ghost commented Aug 27, 2017

It's more important to fix the context, and far more important that the context can't change (only additive changes can be allowed), or it could break the signatures of anything signed in the past.

LD Signatures as defined also introduce a centralisation component into the protocol: you're dependent on single centralised servers to serve the LD schema unless you cache contexts (which you probably should), but then you are even more vulnerable to any schema change, including additive changes. You might get around all this by including the definitions of all the contexts you use inside each document you send, but that is a bit wasteful.

Or use a signing format like magic envelopes. Salmon sigs don't have these issues as they preserve the state of the signed data without outside dependencies, but would require a valid json-ld context to be strictly compliant. I've experimented with them and they work well with AP. The only drawback is a potential duplication of data as the original signed data is armoured against mangling and sent to show exactly what was signed.
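For the curious, the magic-envelope idea can be sketched in a few lines: armour the exact bytes you signed, so nothing downstream can reshuffle or re-serialize them before verification. This is a toy sketch in Python using HMAC as a stand-in for Salmon's RSA-SHA256, purely to show the shape, not anyone's actual implementation:

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as magic envelopes do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def wrap(activity: dict, key: bytes) -> dict:
    """Armour the exact serialized bytes and sign them."""
    payload = b64url(json.dumps(activity).encode("utf-8"))
    sig = b64url(hmac.new(key, payload.encode("ascii"), hashlib.sha256).digest())
    return {"data": payload, "alg": "HMAC-SHA256", "sig": sig}


def verify(envelope: dict, key: bytes) -> bool:
    """Check the signature over the armoured payload, with no re-canonicalization."""
    expected = b64url(
        hmac.new(key, envelope["data"].encode("ascii"), hashlib.sha256).digest()
    )
    return hmac.compare_digest(expected, envelope["sig"])
```

The point is that verification never rebuilds or normalizes the activity; it checks exactly the armoured payload that was signed, at the cost of shipping the data twice.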

This probably belongs in a separate issue but I'm mentioning here because it is all related to LD schema changes. I'm fine with dropping 'endpoints' but mentioning that context definition changes have negative consequences going forward.


cwebber commented Aug 27, 2017

@zotlabs I agree that it's critical that we don't drop terms that are in the context; one reason I considered this open for discussion now is the discussion of it being missing from the context.

However I just looked, and it isn't missing from the context... it's currently there, both in the html vocabulary page and in the json-ld context. So I guess I was wrong about that.

@tsyesika
Collaborator

So I think the URLs that are in the endpoints object are quite different from inbox / outbox / etc. I don't really think it poses a consistency issue.

The inbox and outbox endpoints are useful endpoints for everyone. They point to collections of objects which are very much personal to the actor. I see those URLs as much more similar to the URL in icon than to, for example, the oauthTokenEndpoint. The endpoints which are in endpoints are URLs for the client, and whilst they can be actor specific, they're not personal to the actor; they just happen to be on the actor for convenience.

I think the actor definition would get a lot messier and more cluttered with these definitions. I don't personally love having the endpoints on the actor, but it's probably the easiest place for them to be for clients.


cwebber commented Aug 27, 2017

That's indeed the original reason for the endpoints object: that they aren't specific to the actor. It's also why I thought actors might share an endpoint: multiple actors may simply point to the same URI for their endpoints, which can be shared site-wide.

Seeing that the vocabulary document and context do have endpoints already makes me a bit less certain about this as a late-game change.

Maybe, if we do keep it, we should better document its scope as being just for endpoints that are not necessarily specific to an actor and may be (but are not necessarily) site-wide?
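As a sketch of that sharing idea (whether endpoints may be given by reference like this is exactly part of the scope question; the URLs are hypothetical):

```json
{
  "id": "https://example.com/u/alice",
  "type": "Person",
  "endpoints": "https://example.com/shared-endpoints"
}
```

Every actor on the server could carry the same endpoints URI, so clients fetch and cache the shared document once for the whole site.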


ghost commented Aug 27, 2017

Right - my mistake; I do see 'endpoints' but not the 'publicInbox' property - which is what was failing to fully expand in my signature normalisation. It was listed as an unknown child of endpoints and I read the nquads document wrong. Apologies, nquads isn't one of my fluent languages.

I agree that it's critical that we don't drop terms that are in the context;

But if you cache contexts, even adding them may have negative consequences. @Gargron's implementation preloads contexts which are then hardwired for the duration of a given release. So these releases will all have a shelf life dependent on context definition changes upstream. I'm currently not caching but this has performance issues. The activitystreams document itself has a 6 hour cache time; and this is probably reasonable but I'll have to patch my normalizer to use http caching. It currently does not. And if you use a named element prior to its inclusion in the upstream document your signatures will break.

I'm just saying it's quite a fragile and centralised system, and as a consequence signatures are going to break. One can claim that dealing with this is an implementation issue, and implementations will then have to work around it, but it's a systemic problem that all implementations will need to consider in their designs. If your implementation rejects an activity with an invalid signature, it may actually have been a valid signature when it was signed, invalidated later by an external agent, and there's no easy way to prove this. You are also pretty much required to provide a context cache, or your signatures could stop validating because of a DDoS somewhere else in the world. And Mastodon's implementation, with contexts that are frozen in time, will have negative consequences on that platform.
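A minimal sketch of the preloading approach being described here, in Python (the bundled context body is a placeholder, truncated to one term): ship local copies of every context you expect, and refuse to fetch anything else at runtime, so verification never depends on a remote server being up.

```python
import json

# Hypothetical bundled copies of the JSON-LD contexts this release expects.
# Shipping them with the code removes the runtime dependency on remote
# servers, at the cost of going stale if the upstream context changes.
LOCAL_CONTEXTS = {
    "https://www.w3.org/ns/activitystreams": json.loads(
        '{"@context": {"as": "https://www.w3.org/ns/activitystreams#"}}'
    ),
}


def load_context(url: str) -> dict:
    """Resolve a context URL from the local bundle, never from the network."""
    try:
        return LOCAL_CONTEXTS[url]
    except KeyError:
        raise LookupError(f"context {url} is not preloaded; refusing network fetch")
```

A loader like this can typically be plugged into a JSON-LD library in place of its default HTTP document fetcher.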


Gargron commented Aug 27, 2017

I agree with @zotlabs on everything regarding context changes. They should be immutable, or else we're either getting signature breakages or performance issues. I actually don't think their immutability should be a big problem - most standards don't change past release, we're simply in a pre-release phase right now.

That was my assumption when choosing the preloading behaviour, anyway. If we can't guarantee immutability, we're going to have to do softer caching. Maybe a Redis cache for a day. Maybe contexts should be fetched during assets:precompile. However, I do agree that if we can't rely on a permanent cache, it introduces a level of central runtime dependency: all ActivityStreams implementors would be hitting up the W3C servers hosting the context definition. That doesn't sound acceptable to me.

Regarding the original issue, it does make sense to separate client-only URLs into some subset of the actor, it feels cleaner somehow.


ghost commented Aug 28, 2017

most standards don't change past release

ROTFL.


Gargron commented Aug 28, 2017

Alternatively, we could use a different hash creation algorithm for verifying the signatures. ActivityPub already says it doesn't use the key expansion/framing stuff from JSON-LD; the keys can be treated as-is. So perhaps we could use a different canonicalization algorithm: instead of doing RDF/N-Quads normalization, just take the JSON, sort the keys alphabetically, strip all whitespace, dump it to a string, and use that. That way the signatures would not break if context definitions change.
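The canonicalization described above is nearly a one-liner in most languages; a Python sketch (an illustration, not anyone's shipped implementation):

```python
import hashlib
import json


def canonical_json(doc: dict) -> str:
    # Sort keys at every nesting level and drop all optional whitespace,
    # so any two implementations produce byte-identical strings for the
    # same document regardless of original key order.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def signing_hash(doc: dict) -> str:
    # Hash of the canonical form; this is what would actually be signed.
    return hashlib.sha256(canonical_json(doc).encode("utf-8")).hexdigest()
```

Because nothing here consults the @context, changes to context definitions can't invalidate old signatures, which is the property being argued for.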


ghost commented Aug 28, 2017

That's why I mentioned salmon/magicsig - just sign the json. Done.


jaywink commented Aug 28, 2017

instead of doing RDF/n-quads normalization, just take the JSON, sort the keys alphabetically, strip all whitespace, dump to a string and use that

This really should be the way to go 👍


Gargron commented Aug 28, 2017

That's why I mentioned salmon/magicsig - just sign the json. Done.

I still suggest using the bulk of the Linked Data Signatures standard, simply using a different algorithm (which is allowed in the standard). LDS already has all the required designated parts - the signature property, signatureValue, creator, etc. Salmon/magicsig is very old and I cannot find a document that doesn't talk about XML.


ghost commented Aug 28, 2017

The magic sig concept is the ultimate in simplicity. "Here's the exact data I signed, and here's the algorithm I used". You don't have to go rebuilding and sorting multi-dimensional object arrays after stripping bits out of them.

Still, I note that StatusNet, Mastodon, and Diaspora all messed up this part and mangled the base64 signed data in unique ways, so it couldn't be verified across platforms even with a pretty clear specification. I had to reverse engineer all of these projects to figure out what part they messed up.

I don't know what chance you think there is of getting multiple projects on different platforms to deconstruct and reconstruct multi-dimensional arrays in exactly the same way (and over time). Trust me, reverse engineering crypto technology isn't fun.

But do what you think is best.


cwebber commented Aug 28, 2017

However, I do agree that if we can't rely on a permanent cache, it introduces a level of central runtime dependency - all ActivityStreams implementors would be hitting up W3C servers hosting the context definition. That doesn't sound acceptable to me.

In fact this problem happened in the past with XML DTDs I think? And W3C servers got DDOS'ed IIRC. So yes, you should include local copies of a context if you can, or at absolute minimum cache (but probably include the contexts you know you want).

Anyway I agree that vocabularies and contexts should never drop terms; I'm not as sure about never appending, though I agree with the problem. The worst case scenario is if you're running an older version of the software, and a new version of the software starts using extensions that aren't in your stale context, and you can't verify their signatures. It shouldn't happen very often, but I agree it's not the most graceful fallback...

I guess schema.org adds things to its context all the time.

Thinking out loud: there's another way to do this with extensions that we could recommend, which would not hit the same problem we're seeing now, where "you can't use new terms when the as2 context is extended, because old implementations might not know what they are." It could be that this is the only way you're supposed to use extensions:

{"@context": "https://www.w3.org/ns/activitystreams",
 "as:sensitive": true}

Here's why this would work, regardless of whether or not users have an updated context. The AS2 json-ld context looks like so (I'm cutting down quite a bit):

{
  "@context": {
    "@vocab": "_:",
    "as": "https://www.w3.org/ns/activitystreams#",
    "id": "@id",
    "type": "@type",
    "Accept": "as:Accept",
    "Activity": "as:Activity",
    ...
    "orderedItems": {
      "@id": "as:items",
      "@type": "@id",
      "@container": "@list"
    },
    ...
  }
}

So the as: namespace is defined as something you could use... but you can see that some information, like orderedItems being a @list rather than a @set, is lost. So maybe not.

This conversation is diverging quite a bit from the original issue btw, but I guess it's timely because we're


cwebber commented Aug 28, 2017

At any rate, I think we'll be adding new properties to the context very rarely, so this should be an uncommon situation. But yes, it could be that a server using an extension fresh off the presses fails to have its signatures validated by an older server.


ghost commented Aug 29, 2017

ActivityStreams is json-ld and therefore can link to any context available in all of json-ld space. This isn't a question of one context changing. This is a question of any referenced context changing. It has implementation considerations because you cannot trust a stored signature that verified in the past. It may not verify in the future because something in its environment evolved. If we restrict AP to only activitystreams contexts, you then can't have LD signatures or any implementation extensions because these are foreign and untrusted (or 'draft') contexts that might change someday in an incompatible way. Where do you draw the line?

Also, knowing the history of implementation issues with salmon signing (which is relatively simple to implement) I question whether defining custom object reconstructors/normalisers and serialisers is a business we want to get into. At the end of the day you implement a spec and see if a signature validates or doesn't. If it doesn't you have to figure out whose fault it is, and this can be a very difficult task. The least amount of data wrangling between the original data and the signature wins and it can still take you weeks/months to nail down a subtle bug in somebody else's code.

If this situation is acceptable with everybody involved, fine. I'll stand down. I see it as a train wreck ahead and would prefer to get onto another track before the carnage hits. Not if, when.

Anyway I've had my say and yes we've diverged from topic.


cwebber commented Aug 29, 2017

Back to the original topic: it seems to me that there's not enough support for removing the endpoints property, for various good reasons. But we should clarify its scope. I suggest one of the following two options:

  • endpoints is for "common" endpoints particular to a server (so that would be c2s endpoints, but also sharedInbox). This is how things are now, but it's not clear based on the current language.
  • endpoints is for client to server endpoints only (no sharedInbox).
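To make the difference concrete, under the first option an actor might look like this (URLs illustrative), with sharedInbox living inside endpoints; under the second option, the sharedInbox key would not appear inside endpoints at all:

```json
{
  "id": "https://example.com/u/alice",
  "type": "Person",
  "inbox": "https://example.com/u/alice/inbox",
  "endpoints": {
    "oauthTokenEndpoint": "https://example.com/oauth/token",
    "sharedInbox": "https://example.com/inbox"
  }
}
```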


cwebber commented Aug 29, 2017

From SocialWG call today:

<cwebber2> RESOLVED: rephrase description of endpoints to clarify that its                        
           scope is for endpoints that tend to be shared on a domain/server


cwebber commented Aug 30, 2017

This is done.

@cwebber cwebber closed this as completed Aug 30, 2017
@cwebber cwebber changed the title Should "endpoints" be dropped? Clarify endpoints scope (was: Should "endpoints" be dropped?) Aug 30, 2017

5 participants