[ZIP 302] Memo field format specification #366
Comments
@whyrusleeping was playing with use of the memo, and pointed out IPFS is working on a standard for self-describing data that may be relevant: https://github.com/multiformats/multicodec/blob/master/multicodec-packed.md |
Yes: "A start byte of 0xF5 is reserved for use by automated software by private agreement." |
To close this ticket, write a formatting convention and put it into the Zcash protocol spec that answers the following questions:
(Note: according to https://en.wikipedia.org/wiki/UTF-8#Description, any sequence whose first byte is Here's a proposal:
|
How about for the case of |
Proposal: use multiformats varint TLV In https://github.com/multiformats, we use the construction:
with an associated "codec table" that maps
It's a "compact and extensible" TLV thing. I think it could fit here too. It would use up at least 2 bytes per value, but it would make sure:
cons:
Proposal Qs answered
(assuming this means "users of the code")
Subproposal: multicodec-packed table. You can further choose to use the "multicodec-packed" table as the "codec table", if our types are desirable to you. But this is strictly a separate concern. Basically:
where |
Okay, here's an update of https://github.com/zcash/zcash/issues/1849#issuecomment-260093393 with the varint/TLV suggestions from @whyrusleeping and @jbenet
|
The specification is correct in saying that the range of valid UTF-8 start bytes is from 0 to 0xF4 inclusive. @zookozcash, you may not have taken into account that the most significant 3 bits of a 21-bit code point are necessarily 100, because code points only go up to 0x10FFFF. |
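The bound of 0xF4 can be checked mechanically. The sketch below is only an illustration (the helper name `utf8_lead_bytes` is made up for this example): it derives the largest lead byte from the top three bits of the highest code point, then confirms it by encoding every code point with Python's built-in UTF-8 codec.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point

# A 4-byte UTF-8 sequence starts with 0b11110xxx, where xxx are the top
# 3 bits of the 21-bit code point. For 0x10FFFF those bits are 0b100.
top_three_bits = MAX_CODE_POINT >> 18
largest_lead_by_formula = 0xF0 | top_three_bits


def utf8_lead_bytes() -> set:
    """Collect the first byte of the UTF-8 encoding of every code point."""
    leads = set()
    for cp in range(MAX_CODE_POINT + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        leads.add(chr(cp).encode("utf-8")[0])
    return leads


leads = utf8_lead_bytes()
print(hex(largest_lead_by_formula), hex(max(leads)))  # 0xf4 0xf4
```

Every byte from 0xF5 to 0xFF is therefore free to use as a non-text marker, which is what the proposals in this thread rely on.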
Okay here's an update of https://github.com/zcash/zcash/issues/1849#issuecomment-260151170 with the correct value of the lowest number that can't be a valid first byte for a UTF-8 encoded string:
|
@whyrusleeping: Would you please update https://github.com/whyrusleeping/zmsg/blob/master/main.go#L43 to reflect https://github.com/zcash/zcash/issues/1849#issuecomment-260169726 and tell me, based on that implementation, if you think https://github.com/zcash/zcash/issues/1849#issuecomment-260169726 is good enough? |
@zookozcash Updated! And yeah, that makes sense to me. I'll go ahead and implement your proposal in zmsg's message parsing.
I haven't reviewed @jbenet's or @zookozcash's proposals yet. Instead I want to brain dump some potential use cases: Sender's Address: encode the sender's preferred address in the memo field.
|
Chatting with @nathan-at-least today, I think with the length prefixed codes, we can have multiple in a given memo field. This would allow us to do things like have a zmsg and a return address (using a standardized return address format) in the same memo. |
I'm a little confused or maybe concerned about Zooko's comments above beginning with
I hope the memo field, when beginning with 0xF5, remains a free-for-all (unrestricted binary). I'm working on an English compression scheme for it for private messaging. It appears that 3x compression (1500 bytes of ASCII text) is the best that can be done, because a message this short hardly benefits from message-specific lookup tables in the stream.
@zawy12 per the current state of the ZIP 302 draft, you'd use I've been looking into this again (after my last comment on the ZIP draft five months ago), and I see that |
Having read further back, I also see that the TLV suggestion also came with a "use the multicode table" sub-proposal. I can't find In any case, the current draft maintains @zookozcash's "arbitrary application-defined" language, which I think is not precise enough - we do want to coordinate within the Zcash ecosystem at least on payload types. I am however dubious about using the multicode table for this, given that we only have 512 bytes to work with, and if we define our type field as a multicode value, we inherently have the restriction "for multicodes with valid lengths within X fluidly-defined available space", which is very messy. Instead, I'm inclined to tighten the ZIP draft up so that we have a Zcash-specific table, and then if we want to settle on the multicode table, |
I have implemented almost all of the current draft ZIP in zcash/librustzcash#177. The only part I have not implemented yet is the |
@str4d defining your own table makes sense. I just want to make sure that things are extensible, so in zmsg, when using 0xf5, we read two LEB128 uvarints from the front of the buffer: the first is the 'type' of the field, and the second is the length. I think that is exactly what Zooko proposes.
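As a rough sketch of the parsing described here (not zmsg's actual code, which is written in Go), the following reads a type and a length as unsigned LEB128 varints after the 0xF5 byte. The function names and the sample type value 0xA0 are chosen for illustration only.

```python
def read_uvarint(buf: bytes, pos: int = 0) -> tuple:
    """Decode one unsigned LEB128 integer at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        if byte & 0x80 == 0:             # high bit clear: last byte
            return value, pos
        shift += 7


def parse_private_memo(memo: bytes) -> tuple:
    """Split a 0xF5 memo into (type_id, payload). Hypothetical helper."""
    if not memo or memo[0] != 0xF5:
        raise ValueError("memo does not start with 0xF5")
    type_id, pos = read_uvarint(memo, 1)
    length, pos = read_uvarint(memo, pos)
    if pos + length > len(memo):
        raise ValueError("declared length exceeds the memo field")
    return type_id, memo[pos:pos + length]


# Type 0xA0 needs two LEB128 bytes (0xA0 0x01); the length 5 fits in one.
memo = (bytes([0xF5, 0xA0, 0x01, 0x05]) + b"hello").ljust(512, b"\x00")
print(parse_private_memo(memo))  # (160, b'hello')
```

Note that this decoder accepts non-canonical encodings and arbitrarily long varints; a stricter reader would reject both.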
Yep, that's almost what is in the current ZIP 302 draft. The length field should be at most 2 bytes, and we should also be consistent about enforcing canonicity. Having thought about this overnight, I really like the idea of having multiple payloads packed into a single memo field - in particular, specifying a machine-readable return address alongside a human-readable text memo. It looks like My suggestion is that we define the type 0x00 to be the "no more payloads" marker, so that it coincides with the padding. Then we read as many TLV tuples as we can until we either reach the end of the memo field or a zero type (at which point we verify that all remaining bytes are zeroes). The size requirements on the length fields would be cumulative, meaning that implementations should aim to pack in smaller or fixed-width payloads before variable-width ones (so in my example above, a wallet GUI would have a "include return address" checkbox, and clicking it deducts space from the memo size indicator). |
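The multi-payload reading loop suggested above could look roughly like the sketch below. Several details are assumptions made for the example rather than anything fixed by the draft: the memo keeps a single leading prefix byte, type fields may use up to 10 LEB128 bytes, and "canonical" means the varint has no redundant trailing zero byte.

```python
def read_canonical_uvarint(buf: bytes, pos: int, max_bytes: int) -> tuple:
    """Decode an unsigned LEB128 value of at most max_bytes bytes,
    rejecting encodings that are longer than necessary."""
    value = 0
    for i in range(max_bytes):
        if pos + i >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            if i > 0 and byte == 0:
                # A final zero byte adds nothing, so a shorter form exists.
                raise ValueError("non-canonical varint")
            return value, pos + i + 1
    raise ValueError("varint too long")


def parse_payloads(memo: bytes, start: int = 1) -> list:
    """Read (type, payload) tuples until the end of the memo or a zero type."""
    payloads = []
    pos = start  # assumption: byte 0 is the top-level prefix byte
    while pos < len(memo):
        type_id, pos = read_canonical_uvarint(memo, pos, max_bytes=10)
        if type_id == 0:
            # "No more payloads": everything left must be zero padding.
            if any(memo[pos:]):
                raise ValueError("non-zero bytes after end marker")
            break
        length, pos = read_canonical_uvarint(memo, pos, max_bytes=2)
        if pos + length > len(memo):
            raise ValueError("payload overruns the memo field")
        payloads.append((type_id, memo[pos:pos + length]))
        pos += length
    return payloads


body = bytes([0xF5, 0x01, 0x03]) + b"abc" + bytes([0x02, 0x02]) + b"hi"
print(parse_payloads(body.ljust(512, b"\x00")))  # [(1, b'abc'), (2, b'hi')]
```

Because the zero type coincides with the padding byte, a memo whose payloads do not fill the field needs no explicit terminator.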
@whyrusleeping could you also expand on your 0xa0 type? I see a mention of ASCII in the zmsg source; do you intend it to only be ASCII, or could that type be the indicator for a UTF-8-encoded text memo (with no zero padding, c/f an all-text memo field)? |
I've added an implementation of structured memos with multiple payloads to zcash/librustzcash#177. |
Looking at zmsg's memo parsing, it looks like it parses the contents of the |
Be aware that, at some point, we should consider specifying how to attach Travel Rule information in the memo field. See https://electriccoin.co/blog/the-fatf-recommendations/ for background info. A Joint Working Group on interVASP Messaging Standards has been spun up to define a "Universal common language for communication of required originator and beneficiary information between virtual asset service providers". Once that's complete, we could define how VASPs should format that information for transmission via the encrypted memo field. Off the top of my head, I can imagine it looking something like a ;-separated set of fields, with:
Example: |
Since space is at a premium but extensibility is important, I suggest encoding the required fields first, and then any optional fields using a key-value encoding. |
I propose we include a bit of "standardization process" for payload "type_id" to strike a good balance between permissionless prototyping and cross-ecosystem standardization. One example of how to do this is to segregate the 64-bit type_id range into "standardized" and "non-standardized / prototypical" ranges (similar to the "X-" header convention from HTTP). For example, we might say a Then we can "guard" the small compact
I have a concern about the multiple-payload approach taken by the implementation in zcash/librustzcash#177. Here's the enum Payload definition, where a memo contains a
My concern is about the relationship between multiple payloads with arbitrary

Example Issue

For example imagine if one Furthermore, the more distinct I'm not sure how concerned I should be about this, but it bothers me because I don't know what all future use cases or apps might need/do.

Example of Alternative - Single Payload-per-Memo

An alternative approach is to say there's only a single item in a memo, then when combinations are desired, they are codified explicitly by introducing a specific

Spec Examples for MPPM vs SPPM:

An MPPM spec might specify
Meanwhile, an SPPM equivalent might look like this:
The SPPM approach trades off flexibility for predictability. Notice that it's possible (with yet more overhead) to embed the MPPM approach inside the SPPM approach:
Is it worth the overhead? If we took that approach and most applications ended up just using "Multipart TLV" anyway, all of the same UX/security concerns just resurface.

Follow up Thoughts

I notice this reminds me a bit of Bitcoin Script versus Transparent Zcash Extensions. I like the explicit less flexible approach of the latter compared to the former. It's less dynamic and flexible but safer to reason about. Some exploratory questions:
|
Brainstorm of use cases: User-facing Wallet:
Zapp use cases ("Zapps" is just a term I'm using to mean backend automation that reacts to memos):
Plumbing Usages
More? Submit your own!

Concern

Note that, as per my concern above, mixing and matching all these cases might lead to unexpected or unanticipated behavior across wallets or Zapps. We also need to consider confused deputy attacks. For example, could Alice trick Bob's wallet into sending a refund with an Unauthenticated Reply-To to Charlie's service that causes Bob's zaddr to get registered with some service that Bob is unaware of?
With my concerns and use case brainstorms shared, I'm fine for the rest of the Zcash community to decide on the ZIP 302 contents. My only direct request is for "prototyping vs standardized" range for |
@hdevalence has suggested that ZIP 302 treat invalid UTF-8 text memos as completely invalid, instead of replacing invalid sequences. The change would enable parsing APIs to preserve the exact byte serialization of the memo (c/f zcash/librustzcash#177 where |
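The difference between the two policies can be seen in a small sketch. The function names are hypothetical, and stripping trailing zero padding before decoding is an assumption about how text memos are stored in the 512-byte field.

```python
def decode_text_memo_strict(memo: bytes) -> str:
    """Reject the whole memo if any byte sequence is not valid UTF-8."""
    return memo.rstrip(b"\x00").decode("utf-8")  # raises UnicodeDecodeError


def decode_text_memo_lossy(memo: bytes) -> str:
    """Replace each invalid sequence with U+FFFD instead of failing."""
    return memo.rstrip(b"\x00").decode("utf-8", errors="replace")


raw = b"hi\xff".ljust(512, b"\x00")  # 0xFF can never appear in UTF-8

lossy = decode_text_memo_lossy(raw)
print(repr(lossy))                          # 'hi\ufffd'
print(lossy.encode("utf-8") == b"hi\xff")   # False: original bytes are lost

try:
    decode_text_memo_strict(raw)
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

With the strict policy, any memo that decodes successfully re-encodes to the same bytes, so a parsing API can hand back either form without losing information.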
I've also removed the TLV structured memo field logic from zcash/librustzcash#177 for now, while we thrash this out. It's currently living in https://github.com/str4d/librustzcash/tree/structured-memo-types. |
Zbay and zmsg have implemented some of this, so we should probably try to be consistent with them in whatever we decide to do about invalid UTF-8 sequences.
AIUI the main point of contention remaining is the meaning of
In an effort to make progress on this, I have removed |
I have now opened the separate PR for the TLV spec: #638. |
In ZIP Sync today, we decided that:
We decided that using up another top-level prefix byte here is fine, because it opens up the typecode space significantly (it adds >250 new single-byte typecodes that can encode 509 bytes, which is not a significant reduction from the 511 bytes that a top-level prefix byte can encode). |
DRAFT ZIP: #105 - This ticket is being addressed by a draft ZIP specification. If you are interested in this topic, please contribute to that specification.
Our current memo field content specification reads like this in the protocol spec version 2016.0-beta-1.10:
Let's begin a use case study with early users (e.g. exchanges, wallets, embedded apps) on how they would like to use the memo field, and begin proposing a standard.
Is it true that third parties wishing to implement their own "app specific / non-standard" protocols should use 0xF5 as the initial byte with no further restrictions on subsequent bytes?