Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote Attachment Content Type Proposal #18

Merged
merged 7 commits into from
Feb 23, 2023
Merged

Remote Attachment Content Type Proposal #18

merged 7 commits into from
Feb 23, 2023

Conversation

nakajima
Copy link
Contributor

A suggestion for how we can handle larger messages but maintain E2E encryption.

}
```

The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful to specify the encryption scheme. How do we pass keys to the reader? What types of keys are allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would use the same encryption we use for publishing envelopes to the network, but instead of publishing to the network it would be to some other host.

Copy link
Contributor

@neekolas neekolas Feb 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which type, though? For V1 messages we use X3DH and for V2 messages we use random TopicKeys. In either case, we would need to figure out where to store the salt and the nonce. Those values are unique and random, and we wouldn't want to re-use the same values that went into the message since nonce re-use is an attack vector that could lead to other messages being compromised.

So, we have a few options:

  1. In the message payload, include the nonce and salt and use whatever secret was used to encrypt the message. The remote content would just be the encrypted bytes of the message
  2. Keep the message payload light, and just include a link to the content. The remote content would need to be formatted as xmtp.Ciphertext proto message. Re-use the shared secret for the message (either V1 X3DH or V2 TopicKey)
  3. Use an ephemeral key. The message payload would include the key (maybe 32 bytes of randomness), nonce, and salt. The remote message would be the encrypted payload only.

Of those options (those are just the three off the top of my head), I'd lean towards 3. Gives the most security, since each payload is encrypted with a different key, and means that we don't have to decode another protobuf to read the message. And it's fully self-describing. You don't need to know anything about the conversation the message came from the decrypt the content.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jazzz @richardhuaaa @michaelx11 curious what the ProtoPod thinks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe I'm missing something with how our encryption works. My thought for the remote payload is that it's an xmtp.Envelope with a topic, timestamp and message. In the same way we can send those as push notifications, I was thinking we could also make them the remote payload. It was my understanding that anyone can query for those but the only way to decrypt it is if you have the conversation key material as well (my thinking was that we'd be using the V2 message format).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards option 3. where:

  • PointerMessage contains a symmetricKey and all the parameters to decrypt it.
  • Server contains the content payload only.

I am weary of xmtp.Envelope as the abstraction interface.

  • All possible Envelopes stored remotely requires exponentially more testing.
    • "What happens if a Key Revocation message is stored off network?",
    • "what happens if an Invite is no longer accessible?"
    • "What if a remoteContent points to another removeContent?"
  • Coupling these messages to a specific XMTP Message version would make future migrations more complicated.

Copy link
Contributor

@neekolas neekolas Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you @jazzz on the potential complexity of having the remote payloads be some sort of Content Type. I'm unsold as xmtp.Envelope being the right wrapper. But an encrypted EncodedContent payload will make things a lot more usable for developers than a pile of unstructured bytes. Especially if these are messages that are meant to be displayed inline, where we would want to have the regular set of tools for decoding available that we offer for regular messages.

Otherwise you are going to have one flow to handle remote images, and a totally different decoding flow for on-network images. That seems less than ideal. And you lose things like fallback content for messages that can't be decoded by the client.

"What happens if a Key Revocation message is stored off network?"

Since we don't have Key Revocation messages yet, hard for me to answer conclusively. Does seem like we may want certain meta-messages to live outside the normal message encoding flow and be typed based on contentTopic. We do this with Invites currently, which are not a ContentType at all.

"what happens if an Invite is no longer accessible?"

Are you asking what happens if someone sends a conversation invite as a remote message? That should not be possible, as invites are separate from messages and are not EncodedContent or sent in message channels.

"What if a remoteContent points to another removeContent?"

Both the spec and the example implementation need a mitigation for this kind of attack. Seems like we can have some sort of recursion limit in the SDK and throw an error after N loops.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But an encrypted EncodedContent payload will make things a lot more usable for developers than a pile of unstructured bytes.

I'll start by saying I can get behind this. EncodedContent payload seems like the an acceptable place an alevates many of my concerns.


To round out my thoughts: If the xmtp.Envelope wrapper is excluded then I see two options:

remoteContent

OptionA -- Making remoteContent a composable type Leads to maximum utility. Any type can now be stored remotely with no changes required.

OptionB -- Making RemoteContent an implementation pattern for other type means it could be handled on a type by type basis. Larger types (Images, Video, File) could support remoteContent but this is not required of all types.

My argument for B is not particularly strong: There are only few types that could really benefit from OptionA, yet potential remote storage may become a burden for future ContentType Developers . Any future ContentTypes will now need to be evaluated against a wider threat model, as independent ContentServers are outside of the protection of the Network consensus protocol. I don't have any hard examples as there are few formal XIP ContentTypes defined. Given this Option A is compelling.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this with Invites currently, which are not a ContentType at all.

Great point.

That should not be possible

Great point. Despite the payload being able to hold an invite, it would not be possible to get an EncodedContent payload into the invite processing flow.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @jazzz in that I'm hesitant to define a RemoteContent type that can wrap any normal XMTP content type. I worry about the possible attack surface caused by remote fetches such as SSRF (e.g. attacker puts 127.0.0.1:[some important port]/[some sensitive path]). We'd have to consider the cross product of All RemoteContent actions X All ContentTypes. I'd be more comfortable if we keep RemoteContent's capabilities very well described and limited. For example: must be a GET over HTTPS.


The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data.

The envelope itself can contain any type of message already allowed on the network. The reference implementation uses the `Attachment` type from XIP 15, but if we introduce richer types for things like images or video, those would work here as well, since clients should be able to understand those types once they're settled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the envelope serialized? Protobuf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think protobuf works since clients already need to speak it.

@neekolas
Copy link
Contributor

One thing I'm wondering is whether we actually want the remote payload to be able to be stored as a full EncodedContent with XIP-5 compatible content-types. That allows you to store any message compatible with XMTP remotely and clients already have a flexible set of decoding options. Of course, one of those content types could be the proposed attachment content type that relies on MIME, but you could also use other XIP-5 content types that have built in decoders.

Not sure I've fully thought through the implications, but it seems like it would let us use all the content decoding tools we already have. Just with much larger payloads.

@nakajima
Copy link
Contributor Author

One thing I'm wondering is whether we actually want the remote payload to be able to be stored as a full EncodedContent with XIP-5 compatible content-types.

Ya, this was my thinking.

@nakajima
Copy link
Contributor Author

Added a reference implementation example PR in the Inbox iOS app to try to show what I've got in mind here: ephemeraHQ/xmtp-inbox-ios#83

Copy link

@jazzz jazzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As currently written this seems problematic.

If this path is desired I would suggest

  • The remote content type should point to contentbytes only, not another message.
  • Server stored content should be completely independent from XMTP -- e.g. Raw Image bytes. Message Versions and content versions should be able to increment independently without consideration.
  • Server stored content should be encrypted using a onetime symmetric key.
  • RemoteContent type should contain a digest of the expected stored content.
  • RemoteContent should require a user action prior to fetching from server


## Security considerations

Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same protections we have now would be in place

I'm not sure this is true, as content servers are not held to the same requirements as a Node is.

What is stopping a server from serving different messages to different participants? Currently XMTP uses immutable shared history which makes this attack vector difficult. In this case the pointerMessage should also contain a digest of the content stored on the server to lock the message at the time of sending.


Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well.

## Copyright
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any guidance on how to preserve user privacy? Whats stopping a malicious sender from using a FB "pixel" like approach to tracking user actions?

Are the attachments auto-loaded? are users required to "Click to Load" messages from untrusted servers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should strongly encourage developers using this content type to implement click to load.

Easy to imagine someone sending 100 messages with 1gb remote attachments as an attack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the app level you could then store an "always download attachments from this sender" preference

}
```

The content of the encoded message is a URL that points to encoded Envelope data that would not be limited by the 1MB limit, since it's not being sent on the network. By using the existing `Envelope` format, we can still ensure E2E encryption for this data.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lean towards option 3. where:

  • PointerMessage contains a symmetricKey and all the parameters to decrypt it.
  • Server contains the content payload only.

I am weary of xmtp.Envelope as the abstraction interface.

  • All possible Envelopes stored remotely requires exponentially more testing.
    • "What happens if a Key Revocation message is stored off network?",
    • "what happens if an Invite is no longer accessible?"
    • "What if a remoteContent points to another removeContent?"
  • Coupling these messages to a specific XMTP Message version would make future migrations more complicated.


## Security considerations

Having arbitrary data anywhere can be risky, but this is already the case for our messages, since there's no server side validation of message contents (besides size). The same protections we have now would be in place while the same pitfalls we have would still be there as well.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a strategy for avoiding circular loops? If the RemoteContent contains an envelope, can it also contain a another RemoteContent object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, that's a very good point. Would be an easy DOS vector against clients if protections aren't in place.

@nakajima
Copy link
Contributor Author

@jazzz Thanks for the feedback! I'll take another pass at this and hopefully address all your concerns.

@nakajima
Copy link
Contributor Author

Updated the proposal to go with option 3, adding a required checksum field as well. I also added a note about potential pixel tracker concerns.

```js
{
// The SHA256 hash of the remote content
checksum: string,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkSums are usually used in the context of data corruption. In this context the goal is message Integrity: Can we use "contentDigest" (or something similar) to reflect that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SHA256 an ok hash to use?

Copy link

@snormore snormore Feb 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a good use-case for CIDs from https://github.com/multiformats/js-multiformats (similar to IPFS)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious since I'm not super familiar with IPFS, what would that CIDs get us that SHA256 doesn't?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multihash/CID is kind of just an abstraction around different hashing methods for content. This is how IPFS docs describe it: https://docs.ipfs.tech/concepts/content-addressing/

Screenshot 2023-02-21 at 2 23 42 PM

```js
{
authorityId: "xmtp.org"
typeId: "remoteAttachment"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on limiting this specification to the scope of StaticContent only?

Currently it is ambiguous whether the ContentServers can return Dynamic content. By explicitly only supporting static content then caching and Post-Delivery-storage approaches can avoid complications from cache Invalidation etc.

Adding of the contentDigest mostly achieves this anyways, being explicit would make things cleaner

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of maybe remoteStaticContent?

@michaelx11
Copy link

Jumping in on this a bit late with some dumb questions. My personal focus when evaluating changes like these is always on 1) clarity and 2) easily analyzable security properties.

In that sense, two questions:

  1. If we just bump our message limit up to 10MB, do we need a remote content type at all for our current needs i.e. small videos, photos, etc?
  2. If we separate the RemoteContent concept into its own thing, I think we run the risk of ambiguity about how to fetch things. In my mind, the most unambiguous way to handle this is to have a new Content-Type for each remote use case.

Example:

  • For webm videos meant to be fetched over HTTPS, we can have an xmtp.org/https_webm:1.0
  • For webm videos meant to be fetched over IPFS: xmtp.org/ipfs_webm:1.0

The idea here is that given just a content-type, a developer knows exactly how to fetch and interpret the content. Additionally, organizations can allowlist content-types if there are concerns around remote content (e.g. IP leakage, read receipt leakage, malicious payload vectors). Unifying at the content-type level will avoid folks having to do logic like "allow remote content, but only for these content-types, and only fetched in these ways". The logic would instead be "if content type not in this allowlist, do not process"

@nakajima
Copy link
Contributor Author

If we just bump our message limit up to 10MB, do we need a remote content type at all for our current needs i.e. small videos, photos, etc?

I think partially it does, but then clients need to contend with huge messages being fetched and decoded unless they use a separate topic for these types of things.

If we separate the RemoteContent concept into its own thing, I think we run the risk of ambiguity about how to fetch things. In my mind, the most unambiguous way to handle this is to have a new Content-Type for each remote use case.

I think maybe the case could be made for separate remote content types based off how they're fetched, like one for HTTP vs one for IPFS. I still think the remote payloads being EncodedContent keeps things simplest for clients though. We could also add a content type field to the metadata that gets sent and ensure it matches that of the remote EncodedContent. Clients could use this to ignore remote messages that they aren't cool with.

@nakajima
Copy link
Contributor Author

Thinking about it some more, if an organization ever did want to block certain types of content, they could just block all remoteStaticContent messages and if they still had a need for remote content, they could just make their own content type.

@nakajima
Copy link
Contributor Author

SDK implementation updated: xmtp/xmtp-ios#68
Client usage updated: ephemeraHQ/xmtp-inbox-ios#83

Note that neither are particularly production ready yet (need to handle a bunch more cases) but I was able to get messages sending with encrypted remote payloads and receive them in the app:

Screenshot of xmtp inbox app with a remote image attached

@nakajima
Copy link
Contributor Author

Are there any more pressing security concerns with this approach or would it be alright to ship and learn? Just trying to avoid making perfect the enemy of the good while still keeping things safe for folks.

@michaelx11
Copy link

michaelx11 commented Feb 21, 2023

I'm okay with shipping the proposal but with a few fixed URL schemes to start, such as https://. We should explicitly state this in the proto/spec if possible, since schemes like file:// should be absolutely not allowed. Any time code is processing an untrusted URL there is a rich world of possible attacks. We can do our developers a favor by making these things (like expected scheme) an explicit part of the protocol. Potentially even pulling scheme into a specific enum, then only keeping the scheme-less URI.

@nakajima
Copy link
Contributor Author

@michaelx11 What do you think of making the scheme part of the parameters? To start the only valid value would be https and SDKs could ensure that URLs only start with that. Later if we wanted to add IPFS in a newer version of this content type, we could put it there?

@michaelx11
Copy link

Sounds good to me! We can make it an enum and add more supported schemes as we go

@nakajima
Copy link
Contributor Author

Rad, I updated the proposal to include the scheme parameter as well as a note that the content must be acccessed via GET request.

@nakajima
Copy link
Contributor Author

The reference PRs have been updated to include the mandatory scheme parameter, enforcing https:// and GET requests only.

@nakajima
Copy link
Contributor Author

If we just bump our message limit up to 10MB, do we need a remote content type at all for our current needs i.e. small videos, photos, etc?

I think partially it does, but then clients need to contend with huge messages being fetched and decoded unless they use a separate topic for these types of things.

Given @nmalzieu's comment here, I actually don't think bumping the message limit to 10 mb is the best course of action anymore.

@neekolas
Copy link
Contributor

I like where this landed

Copy link

@jazzz jazzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nakajima great job incorporating all the feedback on this. I think this has ended up in a good place.

Excited to see what developers do with this new power

🚢

```

The content of the encoded message is a URL that points to an encrypted `EncodedContent` object. The content must be accessible by an HTTP `GET` request to the URL. The `EncodedContent`'s content type MUST not be another `RemoteAttachment`.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content type MUST not be another RemoteAttachment.

I am glad to see this here.

@nakajima nakajima merged commit 624890e into xmtp:main Feb 23, 2023
@nakajima
Copy link
Contributor Author

Thanks everyone for your feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants