-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remote Attachment Content Type Proposal #18
Conversation
} | ||
``` | ||
|
||
The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be helpful to specify the encryption scheme. How do we pass keys to the reader? What types of keys are allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would use the same encryption we use for publishing envelopes to the network, but instead of publishing to the network it would be to some other host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which type, though? For V1 messages we use X3DH and for V2 messages we use random TopicKey
s. In either case, we would need to figure out where to store the salt and the nonce. Those values are unique and random, and we wouldn't want to re-use the same values that went into the message since nonce re-use is an attack vector that could lead to other messages being compromised.
So, we have a few options:
- In the message payload, include the
nonce
andsalt
and use whatever secret was used to encrypt the message. The remote content would just be the encrypted bytes of the message - Keep the message payload light, and just include a link to the content. The remote content would need to be formatted as
xmtp.Ciphertext
proto message. Re-use the shared secret for the message (either V1 X3DH or V2 TopicKey) - Use an ephemeral key. The message payload would include the key (maybe 32 bytes of randomness), nonce, and salt. The remote message would be the encrypted payload only.
Of those options (those are just the three off the top of my head), I'd lean towards 3. Gives the most security, since each payload is encrypted with a different key, and means that we don't have to decode another protobuf to read the message. And it's fully self-describing. You don't need to know anything about the conversation the message came from the decrypt the content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jazzz @richardhuaaa @michaelx11 curious what the ProtoPod thinks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe I'm missing something with how our encryption works. My thought for the remote payload is that it's an xmtp.Envelope
with a topic, timestamp and message. In the same way we can send those as push notifications, I was thinking we could also make them the remote payload. It was my understanding that anyone can query for those but the only way to decrypt it is if you have the conversation key material as well (my thinking was that we'd be using the V2 message format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean towards option 3. where:
- PointerMessage contains a symmetricKey and all the parameters to decrypt it.
- Server contains the content payload only.
I am weary of xmtp.Envelope as the abstraction interface.
- All possible Envelopes stored remotely requires exponentially more testing.
- "What happens if a Key Revocation message is stored off network?",
- "what happens if an Invite is no longer accessible?"
- "What if a remoteContent points to another removeContent?"
- Coupling these messages to a specific XMTP Message version would make future migrations more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you @jazzz on the potential complexity of having the remote payloads be some sort of Content Type. I'm unsold as xmtp.Envelope
being the right wrapper. But an encrypted EncodedContent
payload will make things a lot more usable for developers than a pile of unstructured bytes. Especially if these are messages that are meant to be displayed inline, where we would want to have the regular set of tools for decoding available that we offer for regular messages.
Otherwise you are going to have one flow to handle remote images, and a totally different decoding flow for on-network images. That seems less than ideal. And you lose things like fallback
content for messages that can't be decoded by the client.
"What happens if a Key Revocation message is stored off network?"
Since we don't have Key Revocation messages yet, hard for me to answer conclusively. Does seem like we may want certain meta-messages to live outside the normal message encoding flow and be typed based on contentTopic. We do this with Invites
currently, which are not a ContentType
at all.
"what happens if an Invite is no longer accessible?"
Are you asking what happens if someone sends a conversation invite as a remote message? That should not be possible, as invites are separate from messages and are not EncodedContent
or sent in message channels.
"What if a remoteContent points to another removeContent?"
Both the spec and the example implementation need a mitigation for this kind of attack. Seems like we can have some sort of recursion limit in the SDK and throw an error after N loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But an encrypted EncodedContent payload will make things a lot more usable for developers than a pile of unstructured bytes.
I'll start by saying I can get behind this. EncodedContent payload seems like the an acceptable place an alevates many of my concerns.
To round out my thoughts: If the xmtp.Envelope wrapper is excluded then I see two options:
OptionA -- Making remoteContent a composable type Leads to maximum utility. Any type can now be stored remotely with no changes required.
OptionB -- Making RemoteContent an implementation pattern for other type means it could be handled on a type by type basis. Larger types (Images, Video, File) could support remoteContent but this is not required of all types.
My argument for B is not particularly strong: There are only few types that could really benefit from OptionA, yet potential remote storage may become a burden for future ContentType Developers . Any future ContentTypes will now need to be evaluated against a wider threat model, as independent ContentServers are outside of the protection of the Network consensus protocol. I don't have any hard examples as there are few formal XIP ContentTypes defined. Given this Option A is compelling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do this with Invites currently, which are not a ContentType at all.
Great point.
That should not be possible
Great point. Despite the payload being able to hold an invite, it would not be possible to get an EncodedContent payload into the invite processing flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with @jazzz in that I'm hesitant to define a RemoteContent
type that can wrap any normal XMTP
content type. I worry about the possible attack surface caused by remote fetches such as SSRF (e.g. attacker puts 127.0.0.1:[some important port]/[some sensitive path]). We'd have to consider the cross product of All RemoteContent actions X All ContentTypes
. I'd be more comfortable if we keep RemoteContent
's capabilities very well described and limited. For example: must be a GET over HTTPS.
|
||
The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data. | ||
|
||
The envelope itself can contain any type of message already allowed on the network. The reference implementation uses the `Attachment` type from XIP 15, but if we introduce richer types for things like images or video, those would work here as well, since clients should be able to understand those types once they're settled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the envelope serialized? Protobuf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think protobuf works since clients already need to speak it.
One thing I'm wondering is whether we actually want the remote payload to be able to be stored as a full Not sure I've fully thought through the implications, but it seems like it would let us use all the content decoding tools we already have. Just with much larger payloads. |
Ya, this was my thinking. |
Added a reference implementation example PR in the Inbox iOS app to try to show what I've got in mind here: ephemeraHQ/xmtp-inbox-ios#83 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As currently written this seems problematic.
If this path is desired I would suggest
- The remote content type should point to contentbytes only, not another message.
- Server stored content should be completely independent from XMTP -- e.g. Raw Image bytes. Message Versions and content versions should be able to increment independently without consideration.
- Server stored content should be encrypted using a onetime symmetric key.
- RemoteContent type should contain a digest of the expected stored content.
- RemoteContent should require a user action prior to fetching from server
|
||
## Security considerations | ||
|
||
Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same protections we have now would be in place
I'm not sure this is true, as content servers are not held to the same requirements as a Node is.
What is stopping a server from serving different messages to different participants? Currently XMTP uses immutable shared history which makes this attack vector difficult. In this case the pointerMessage should also contain a digest of the content stored on the server to lock the message at the time of sending.
|
||
Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well. | ||
|
||
## Copyright |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any guidance on how to preserve user privacy? Whats stopping a malicious sender from using a FB "pixel" like approach to tracking user actions?
Are the attachments auto-loaded? are users required to "Click to Load" messages from untrusted servers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should strongly encourage developers using this content type to implement click to load.
Easy to imagine someone sending 100 messages with 1gb remote attachments as an attack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the app level you could then store an "always download attachments from this sender" preference
} | ||
``` | ||
|
||
The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean towards option 3. where:
- PointerMessage contains a symmetricKey and all the parameters to decrypt it.
- Server contains the content payload only.
I am weary of xmtp.Envelope as the abstraction interface.
- All possible Envelopes stored remotely requires exponentially more testing.
- "What happens if a Key Revocation message is stored off network?",
- "what happens if an Invite is no longer accessible?"
- "What if a remoteContent points to another removeContent?"
- Coupling these messages to a specific XMTP Message version would make future migrations more complicated.
|
||
## Security considerations | ||
|
||
Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a strategy for avoiding circular loops? If the RemoteContent contains an envelope, can it also contain a another RemoteContent object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, that's a very good point. Would be an easy DOS vector against clients if protections aren't in place.
@jazzz Thanks for the feedback! I'll take another pass at this and hopefully address all your concerns. |
Updated the proposal to go with option 3, adding a required |
```js | ||
{ | ||
// The SHA256 hash of the remote content | ||
checksum: string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkSums are usually used in the context of data corruption. In this context the goal is message Integrity: Can we use "contentDigest" (or something similar) to reflect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is SHA256 an ok hash to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a good use-case for CIDs from https://github.com/multiformats/js-multiformats (similar to IPFS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious since I'm not super familiar with IPFS, what would that CIDs get us that SHA256 doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multihash/CID is kind of just an abstraction around different hashing methods for content. This is how IPFS docs describe it: https://docs.ipfs.tech/concepts/content-addressing/
```js | ||
{ | ||
authorityId: "xmtp.org" | ||
typeId: "remoteAttachment" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on limiting this specification to the scope of StaticContent only?
Currently it is ambiguous whether the ContentServers can return Dynamic content. By explicitly only supporting static content then caching and Post-Delivery-storage approaches can avoid complications from cache Invalidation etc.
Adding of the contentDigest
mostly achieves this anyways, being explicit would make things cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of maybe remoteStaticContent
?
Jumping in on this a bit late with some dumb questions. My personal focus when evaluating changes like these is always on 1) clarity and 2) easily analyzable security properties. In that sense, two questions:
Example:
The idea here is that given just a content-type, a developer knows exactly how to fetch and interpret the content. Additionally, organizations can allowlist content-types if there are concerns around remote content (e.g. IP leakage, read receipt leakage, malicious payload vectors). Unifying at the content-type level will avoid folks having to do logic like "allow remote content, but only for these content-types, and only fetched in these ways". The logic would instead be "if content type not in this allowlist, do not process" |
I think partially it does, but then clients need to contend with huge messages being fetched and decoded unless they use a separate topic for these types of things.
I think maybe the case could be made for separate remote content types based off how they're fetched, like one for HTTP vs one for IPFS. I still think the remote payloads being |
Thinking about it some more, if an organization ever did want to block certain types of content, they could just block all |
SDK implementation updated: xmtp/xmtp-ios#68 Note that neither are particularly production ready yet (need to handle a bunch more cases) but I was able to get messages sending with encrypted remote payloads and receive them in the app: |
Are there any more pressing security concerns with this approach or would it be alright to ship and learn? Just trying to avoid making perfect the enemy of the good while still keeping things safe for folks. |
I'm okay with shipping the proposal but with a few fixed URL schemes to start, such as |
@michaelx11 What do you think of making the scheme part of the parameters? To start the only valid value would be |
Sounds good to me! We can make it an enum and add more supported schemes as we go |
Rad, I updated the proposal to include the scheme parameter as well as a note that the content must be acccessed via GET request. |
The reference PRs have been updated to include the mandatory scheme parameter, enforcing |
Given @nmalzieu's comment here, I actually don't think bumping the message limit to 10 mb is the best course of action anymore. |
I like where this landed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nakajima great job incorporating all the feedback on this. I think this has ended up in a good place.
Excited to see what developers do with this new power
🚢
``` | ||
|
||
The content of the encoded message is a URL that points to an encrypted `EncodedContent` object. The content must be accessible by an HTTP `GET` request to the URL. The `EncodedContent`'s content type MUST not be another `RemoteAttachment`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
content type MUST not be another
RemoteAttachment
.
I am glad to see this here.
Thanks everyone for your feedback! |
A suggestion for how we can handle larger messages but maintain E2E encryption.