Skip to content
This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

feat(59/STATUS-URL-DATA): initial draft #600

Closed
wants to merge 13 commits into from

Conversation

felicio
Copy link

@felicio felicio commented May 9, 2023

Moved from status-im/specs#169


## Specs

- <https://github.com/status-im/specs/pull/159> for URL scheme
Copy link
Author

@felicio felicio May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@felicio felicio May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #602

@felicio
Copy link
Author

felicio commented May 9, 2023

Cannot add reviewers, so please

To @rymnc @saledjenic @cammellos @iurimatias

@rymnc
Copy link
Contributor

rymnc commented May 23, 2023

@felicio, please can you review the changes?

@felicio
Copy link
Author

felicio commented May 24, 2023

@felicio, please can you review the changes?

@rymnc I really liked them, good job!

Copy link
Contributor

@kaiserd kaiserd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this RFC :).

In its current state, it is quite raw.
I'd suggest providing more context and adding text replacing the bullet poitns (some of which seem like "notes" rather part of the spec)

content/docs/rfcs/59/README.md Show resolved Hide resolved

# Abstract

This document presents a comprehensive overview of the serialization, compression, and encoding techniques used to transmit data within URLs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. If this RFC "presents a comprehensive overview" it should be in the Informational category instead of Standards Track.

  2. If this RFC should evolve into a normative specification, this should be phrased as
    "This document specifies serialization, compression, and encoding techniques used to transmit data within URLs in the context of Status protocols."

I assume it is 2).


### Product

- The process of encoding data within URLs should continue until the Waku response becomes sufficiently instantaneous and reliable for link previewing and visiting the onboarding website. Alternatively, this process can be halted once a product roadmap for an alternative decentralized storage solution is finalized.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

until the Waku response becomes sufficiently instantaneous and reliable

What does this mean?
Does this doc depend on a future Waku version with lower latency?
In that case, this sentece should be marked as a "note".

What is the "onboarding website"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this doc depend on a future Waku version with lower latency?

No, but if it was low enough product could deem this spec redundant going forward.

Btw, post MVP profile metadata could elsewhere too (not in url nor waku), check out status-im/status-web#327 (comment).


- The process of encoding data within URLs should continue until the Waku response becomes sufficiently instantaneous and reliable for link previewing and visiting the onboarding website. Alternatively, this process can be halted once a product roadmap for an alternative decentralized storage solution is finalized.

- During the loading of the onboarding website, a verification state should be implemented to compare the encoded data against the Waku response. This measure aims to mitigate identity attacks and ensure the integrity of the data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require more context.
For example: "against the Waku response", which Waku response?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A response containing user/community/channel's latest metadata.

a verification state should be implemented to compare the encoded data against the Waku response.

That's outdated and can now be done instantaneously on client with #600 (comment), given that decoding of a public key and signature from the url succeeds.

### Related scope

#### Features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this needs more context.

Copy link
Author

@felicio felicio May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please be specific in requesting more context? Does it need a product write up for what link previews, sharing, deep linking etc. are? And how the app utilizes them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Onboarding website is a website or set of pages allowing visitors (those that click on the link and don't have the app installed or hasn't been automatically redirect to it) to preview community/channel/user's metadata and guide them in installing the app or allowing them to join or connect with someone for example through redirecting/deep linking or scanning a contact QR.


## Encoding

- Base64url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encoding, most likely, should not be an implementation suggestion but part of the spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was under "suggestion" due to the "request for comments" and "draft" or "raw" state nature. Especially, since I got very little technical feedback on the whole spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matter of fact, and spec contributing/format aside, what do you think about the (suggested) implementation? Would solve the problem differently?

Comment on lines 97 to 116
<!-- # Security/Privacy Considerations

A standard track RFC in `stable` status MUST feature this section.
A standard track RFC in `raw` or `draft` status SHOULD feature this section.
Informational RFCs (in any state) may feature this section.
If there are none, this section MUST explicitly state that fact.
This section MAY contain additional relevant information, e.g. an explanation as to why there are no security consideration for the respective document. -->
# Security Considerations
- There is currently no method to verify the validity of the encoded data. This limitation is acknowledged in the current implementation.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here i am talking about the metadata itself, and the fact that you trust someone to sign and send the correct metadata to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats why it says "validity of encoded data" :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I might have the terms mixed up. I'd say "validity" can be quite broad.

To be more specific, the integrity should be ensured and validated with what I've shared above. And authenticity, while yes a link can be send to you, but you could also obtain it from a profile in the app or from its bio on social. In which case, validating the authenticity depends on you trusting the source or relying on other mechanism like identicon ring, emoji hash or public key.

That's why I think the sentence is not true.

Also,

This limitation is acknowledged in the current implementation.

acknowledged how and where?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you say this better describes it?

The integrity of the encoded data is ensured by the signature, however, the authenticity of the data is not guaranteed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, in this use case. And it's different than saying"verifying that extracted public key == public key", wouldn't you say?

Sorry, maybe I didn't explain myself well. Basically there are 2 operations that we perform, one is extracting the key from hash/signature, while the other is verification (which we use in the codebase).
Verification is essentially this:

func verify(hash, signature, expectedPk) {
   return extractPk(hash,signature) == expectedPk
}

Which is what I was referring to as non meaningful, since it makes sense if you can trust expectedPk (you retrieve it from a trusted third party), but in this case, all 3 are relayed, therefore we should assume that all 3 can be tampered with.

We're verifying the signature to preserve the integrity and authenticity as discussed earlier, don't we?

Maybe I used the wrong words, apologies, as explained in the previous point, best to use "extract" rather than "verify", since what we are really doing is to pin the content/signature to a specific public key, but verification would require to verify against something, something trusted, which in this case is not available (we would if for example there was a ENS name, since we can trust the ENS name resolver).

So if we can agree on that we don't mind NOT catching cases of

Just to make sure, it's not that we are not catching those cases, the only way you can catch those cases is if you have some expectations about a pk, so you are validating something against a known/trusted pk, which I don't believe is the case here (but it could also be lack of context). It would be possible to validate in some cases at sharing time (if you share a link with me on status saying that it's your link, I can check that's the case, as I know your pk).

Either of the cases above will result in a different pk being extracted, which is the only thing we can do given that we don't have a trusted source or something to verify against that is tamper proof.
So, I would be for not including the change myself, since I think it makes things a bit more opaque and it's a bit narrow in what it prevents (an attacker tampering with hash/signature but not tampering with pk). Though there might be some genuine cases that I am missing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically there are 2 operations that we perform, one is extracting the

Currently it is WITHOUT extracting. That's what the very first comment points to.

therefore we should assume that all 3 can be tampered with.

That may be, but under the current implementation and with the verification in place I wouldn't be able to construct a tampered url and put your PK in it without our UIs catching it and letting anybody know it's been tampered with by flagging such url or NOT displaying the content in the first place. Like in the case of our new onboarding/previewing page replacing https://join.status.im/u/0x04ac78d5b61ea49eab8849826c628e01aed9d38ef8abe2fe53444e47bec31788049a5099ebf6dbbe5bc1c39fffee1e012faf201e3775296bec0aa38a5ddcdf1df7.

verification would require to verify against something, something trusted, which in this case is not available
...
the only way you can catch those cases is if you have some expectations about a pk, so you are validating something against a known/trusted pk,

I see, and possibly mistakenly, verifying whether the url in this scenario has been tampered with and whether I ultimately can trust the PK behind it as two different things.

Copy link
Author

@felicio felicio May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, if we talk about an implementation WITH extracting, or recovering,

what we are really doing is to pin the content/signature to a specific public key

...to a key that the process of extracting produces. That I think is a good way to put it.

Either of the cases above will result in a different pk being extracted

By the way, is it common for other pk/signature algorithms to give you different pk for my signature and your tampered data instead of failing too?

an attacker tampering with hash/signature but not tampering with pk

Ditto previous comment at #600 (comment)

So, I would be for not including the change myself

I'll revert it then 👍. Unless anybody else weighs in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, is it common for other pk/signature algorithms to give you different pk for my signature and your tampered data instead of failing too?

No, it's only a feature of some (not all) ECDSA schemes, as far as I know. Other schemes (RSA/DSA), do not offer recovery of the public key, so you can only verify the signature, but you would need to have a key to validate against, pulled from some kms.

Thanks for the work on the specs!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@felicio felicio May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In its current state, it is quite raw.
I'd suggest providing more context and adding text replacing the bullet poitns (some of which seem like "notes" rather part of the spec)

May be "raw" and contain notes, but I'm afraid that trimming it with c665051 loses the added context as to why and what for, and becomes just an abstract document that's hard to follow and reevaluate its relevancy in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have prefaced this that it's just my opinion and that I'll side with you and your experience in writing specs.

However, I'd be interested in what others I've mentioned for review at #600 (comment) think and how they look at it from a point of you of someone implementing the spec.

Also, Cc @jrainville @borismelnik

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I'd be interested in what others I've mentioned for review at #600 (comment) think and how they look at it from a point of you of someone implementing the spec.

The spec right now is ok, but it's missing some small parts to make it at least easily usable to implement.

Firstly, I think the part you had here called Requirements > Product was pretty useful https://github.com/vacp2p/rfc/blob/6abe7552d931af76f225909f86611e3c76ced5a2/content/docs/rfcs/59/README.md
I wouldn't mind if it was re-introduced in someway.

One other point would be to have some kind of pseudo code of how the different algorithms are supposed to be used, for the order of them mostly.
The link to the Example is not good enough, because it requires to dig in the files and can be tied to the language in someway.

So something simple like

data = Community{...}
dataEncoded = encodingFunction(data)
dataCompressed = compressingFunction(dataEncoded)
finalUrl = URL_PREFIX + "/" + dataCompressed

Anyway, my example is not great, but at least it gives a good idea on how to encode the data to get to the URL.

Doing the same for the other way around would be good too.

Copy link
Author

@felicio felicio May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind if it was re-introduced in someway.

Cool, thanks for the feedback 👍.

The link to the Example is not good enough, because it requires to dig in the files and can be tied to the language in someway.

Fair point 👍.

Cc @rymnc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! Thanks for the great conversation @felicio @jrainville
I'll update the spec with some pseudocode

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felicio

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mentioning @borismelnik since he's having hard time following the spec. @borismelnik, please when you get your "footing" could you propose changes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felicio yeah of course :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrainville @borismelnik just wanted to let you know that've dropped the signatures both in web's implementation and in the specs as per our meeting and status-im/status-web#459.

Encoded data didn't change in tests at https://github.com/status-im/status-web/pull/461/files#diff-378ed45ccbd1b990783b9cc2f2ee73430328530f17339176b4fa658f2b6eda08.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felicio Awesome! We already have it in desktop master :)

Comment on lines +165 to +167
# Proof of concept

- See <https://github.com/felicio/status-web/blob/825262c4f07a68501478116c7382862607a5544e/packages/status-js/src/utils/encode-url-data.compare.test.ts#L4>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prichodko about simplifying or solving the problem differently, I'm pointing this out here for better overview and future reference.

It's not the prettiest thing to navigate, but basically the results vary in

{
      serialization: 'csv',
      compression: 'noop',
      encoding: 'encodeURIComponent',
    }

and

expect(characterLength).toBe(184)
expect(encodedData).toHaveLength(243)

Previsouly, I shared with you status-im/status-web#345 (comment). And with that, I'm still inclined towards

{
        serialization: 'protobuf',
        compression: 'brotli',
        encoding: 'base64url',
      }

@rymnc rymnc mentioned this pull request Jul 27, 2023
4 tasks
Comment on lines +71 to +76
message URLData {
// Community, Channel, or User
bytes content = 1;
uint32 shard_cluster = 2;
uint32 shard_index = 3;
}
Copy link
Author

@felicio felicio Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richard-ramos @cammellos, I updated the spec in 1c38bb1.

I forgot to ask, do you think that these fields (and their "informational value") should be after "#" (i.e. not exposed to servers) as well as the public keys are now, or don't have to be?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO with the idea of reducing as much metadata as possible stored in the logs of servers, these fields should appear after the # but I don't know the ramifications this could cause?. Do the URLs for sharing communities with its public key follow this format?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the URLs for sharing communities with its public key follow this format?

If you mean if they are after the #, then yes. Merged implementations up until now should follow https://github.com/vacp2p/rfc/pull/602/files.

I don't know the ramifications this could cause?

When I asked you, I could only think of targeting the community's topic. Given that public key isn't needed to construct it. @rymnc @cammellos @jakubgs could you weigh in as well please?

Latest privacy consideration on this topic being mentioned in status-im/status-web#327 (comment).

these fields should appear after the #

Given above, I'd agree. In which case I'd propose mapping it with the public key and a new protobuf, similarly to how the content is encoded.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these fields should appear after the #

Given above, I'd agree. In which case I'd propose mapping it with the public key and a new protobuf, similarly to how the content is encoded.

@cammellos agrees.

@rymnc
Copy link
Contributor

rymnc commented Oct 10, 2023

hey @felicio can we merge this soon?

@jimstir
Copy link
Contributor

jimstir commented Mar 1, 2024

Continue discussion :vacp2p/rfc-index#13

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants