[ZIP 302] Memo field format specification #366

Closed
nathan-at-least opened this issue Nov 12, 2016 · 34 comments · Fixed by #105

Comments

@nathan-at-least
Contributor

DRAFT ZIP: #105 - This ticket is being addressed by a draft ZIP specification. If you are interested in this topic, please contribute to that specification.

Our current memo field content specification reads like this in the protocol spec version 2016.0-beta-1.10:

The usage of the memo field is by agreement between the sender and recipient of the note. The memo field SHOULD be encoded either as:

• a UTF-8 human-readable string [Unicode], padded by appending zero bytes; or
• an arbitrary sequence of 512 bytes starting with a byte value of 0xF5 or greater, which is therefore not a valid UTF-8 string.

In the former case, wallet software is expected to strip any trailing zero bytes and then display the resulting UTF-8 string to the recipient user, where applicable. Incorrect UTF-8-encoded byte sequences should be displayed as replacement characters (U+FFFD).

In the latter case, the contents of the memo field SHOULD NOT be displayed. A start byte of 0xF5 is reserved for use by automated software by private agreement. A start byte of 0xF6 or greater is reserved for use in future Zcash protocol extensions.
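The quoted rules can be sketched in Rust as follows. This is only an illustration of the spec text above, not code from any Zcash implementation; the names `MemoKind` and `classify_memo` are made up for this example.

```rust
/// How a 512-byte memo is treated under the quoted spec text.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, PartialEq)]
pub enum MemoKind {
    /// First byte <= 0xF4: UTF-8 text with trailing zero padding stripped.
    Text(String),
    /// First byte 0xF5: automated software, by private agreement.
    PrivateAgreement,
    /// First byte 0xF6..=0xFF: reserved for future protocol extensions.
    Reserved,
}

pub fn classify_memo(memo: &[u8; 512]) -> MemoKind {
    match memo[0] {
        0xF5 => MemoKind::PrivateAgreement,
        0xF6..=0xFF => MemoKind::Reserved,
        _ => {
            // Strip trailing zero bytes (the padding).
            let end = memo.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
            // from_utf8_lossy substitutes U+FFFD for invalid sequences.
            MemoKind::Text(String::from_utf8_lossy(&memo[..end]).into_owned())
        }
    }
}
```

A memo that is all zero bytes comes back as `MemoKind::Text` with an empty string under this sketch.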

Let's begin a use case study with early users (e.g. exchanges, wallets, embedded apps) on how they would like to use the memo field, and begin proposing a standard.

Is it true that third parties wishing to implement their own "app specific / non-standard" protocols should use 0xF5 as the initial byte with no further restrictions on subsequent bytes?

@nathan-at-least
Contributor Author

@whyrusleeping was playing with use of the memo, and pointed out IPFS is working on a standard for self-describing data that may be relevant: https://github.com/multiformats/multicodec/blob/master/multicodec-packed.md

@zookozcash

zookozcash commented Nov 12, 2016

Is it true that third parties wishing to implement their own "app specific / non-standard" protocols should use 0xF5 as the initial byte with no further restrictions on subsequent bytes?

Yes: "A start byte of 0xF5 is reserved for use by automated software by private agreement."

@zookozcash

zookozcash commented Nov 12, 2016

To close this ticket, write a formatting convention and put it into the Zcash protocol spec that answers the following questions:

  1. How do you indicate whether your memo matches the rest of this specification or not?
  2. How do you indicate what is the length of your memo?
  3. How do you indicate what is the encoding, codec, or type of the data in your memo?
  4. Should you display it to the user, and if so how?

(Note: according to https://en.wikipedia.org/wiki/UTF-8#Description, any sequence whose first byte is 0xF8 through 0xFF is not a valid UTF-8 sequence. The current protocol spec (v2016.0-beta-1.10) says "starting with a byte value of 0xF5 or greater, which is therefore not a valid UTF-8 string", but I wonder if that is an error, because https://en.wikipedia.org/wiki/UTF-8#Description seems to indicate that a byte sequence whose first byte is 0xF0 through 0xF7 can be a valid UTF-8 encoding for a 4-byte-encoded codepoint.)

Here's a proposal:

  • If the first byte (byte 0)'s value is 0xF7 or smaller, then the reader should:

    • strip any trailing zero bytes,
    • decode it as UTF-8 (replacing any incorrect UTF-8-encoded byte sequences with the replacement character U+FFFD), and display it to the user as a human-readable string.

    (Note that this is correcting an error in Zcash protocol v2016.0-beta-1.10, which said to use 0xF5 instead of 0xF7 as the upper end of this range.)

  • If byte 0's value is 0xF8, then:

    • interpret the first 9 bits from byte 1 and byte 2 as an unsigned integer, and let that be the length of the payload in bytes (9 bits can encode 0 to 511). If this length is 510 or greater then treat this memo as invalid (i.e. do not display anything to the user except an error message saying that the memo is invalid). Inspect the padding after the end of the indicated length, and if it contains anything other than 0 bytes treat the memo as invalid.
    • take the remaining 7 bits from byte 1 and byte 2, and treat that as an arbitrary application-defined "type" field
    • deliver to the application a 3-tuple of the following data:
      • the type (an unsigned integer between 0 and 127 inclusive on both ends)
      • the length (an unsigned integer between 0 and 509 inclusive on both ends)
      • a byte string of that length with contents equal to the payload
  • If the first byte's value is between 0xF9 and 0xFE inclusive on both ends, then this memo is from the future, because first bytes of 0xF9 through 0xFE are reserved for future specifications of this protocol.

  • If byte 0's value is 0xFF then the reader should not make any other assumption about it. If you want to put data into a memo field which is inconsistent with this spec, then just put 0xFF as the first byte, and then do whatever you want with the remaining 511 bytes.
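A rough Rust sketch of the 0xF8 branch of this proposal. The proposal does not pin down the bit order, so this assumes bytes 1 and 2 form a big-endian 16-bit header whose top 9 bits are the length and whose low 7 bits are the type; `parse_f8` is a hypothetical name.

```rust
/// Parses an 0xF8 memo into (type, length, payload).
/// Assumption: header = big-endian u16 from bytes 1..3,
/// length = top 9 bits, type = low 7 bits.
pub fn parse_f8(memo: &[u8; 512]) -> Result<(u8, u16, &[u8]), &'static str> {
    if memo[0] != 0xF8 {
        return Err("not an 0xF8 memo");
    }
    let header = u16::from_be_bytes([memo[1], memo[2]]);
    let length = header >> 7; // top 9 bits: 0..=511
    let type_id = (header & 0x7F) as u8; // low 7 bits: 0..=127
    if length >= 510 {
        // Only 509 bytes remain after the 3 header bytes.
        return Err("length too large");
    }
    let end = 3 + length as usize;
    // Everything after the payload must be zero padding.
    if memo[end..].iter().any(|&b| b != 0) {
        return Err("non-zero padding");
    }
    Ok((type_id, length, &memo[3..end]))
}
```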

@whyrusleeping

How about for the case of 0xf8, we use a varint for length, and a varint for the type field. That way, for small messages, we only take one byte for the length (and two bytes for longer messages) and we don't limit ourselves to only 128 message 'types'.

@jbenet

jbenet commented Nov 12, 2016

Proposal: use multiformats varint TLV

In https://github.com/multiformats, we use the construction:

<varint-type><varint-len><value>

with an associated "codec table" that maps <varint-type> to the types you care about. These tables can be:

  • system or use-case specific (we started this way, one per system/use-case, like multihash)
  • universal (we'll be moving to a single table soon)

It's a "compact and extensible" TLV thing. I think it could fit here too. It would use up at least 2 bytes per value, but it would make sure:

  • length is always unambiguous.
  • type is unambiguous (if in the codec table)
  • type is extensible forever
  • types can offer both agreement-based types (common table, shipped with software) and permissionless innovation (reserved ranges, as in zooko's proposal above, for people to extend themselves)

cons:

  • varints (i dont think this is a big deal, but some do)
  • codec table (must maintain a table. but i think this is unavoidable, as "range X represents Y" is another form of "table" hardcoded in the software).
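To make the `<varint-type><varint-len><value>` construction concrete, here is a small Rust sketch of an encoder. It assumes the varints are unsigned LEB128 (7 data bits per byte, high bit set on every byte except the last); `put_uvarint` and `encode_tlv` are names invented for this example, not part of any multiformats library.

```rust
/// Appends `n` to `out` as an unsigned LEB128 varint.
pub fn put_uvarint(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Builds <varint-type><varint-len><value>.
pub fn encode_tlv(type_code: u64, value: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(value.len() + 4);
    put_uvarint(type_code, &mut out);
    put_uvarint(value.len() as u64, &mut out);
    out.extend_from_slice(value);
    out
}
```

Type codes and lengths below 128 each cost one byte, which is where the "at least 2 bytes per value" overhead comes from.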

Proposal Qs answered

  1. How do you indicate whether your memo matches the rest of this specification or not?
  • the type codec table is infinitely extensible.
  • the type codec table can have one entry to say "raw binary"
  • the type codec table can have one entry to say "explicitly NOT TLV".
    • This could be done with and without the length. i would keep the length, but for completeness, <varint-code-saying-NOT-TLV><value> would not specify the length. This would abandon the invariant that "length is always specified", but there may be valid cases for this.
  2. How do you indicate what is the length of your memo?
  • a varint length in the prefix. L in TLV.
  3. How do you indicate what is the encoding, codec, or type of the data in your memo?
  • a varint code in the prefix. mapping to an agreed-upon codec table, which may or may not have reserved ranges.
  4. Should you display it to the user, and if so how?

(assuming this means "users of the code")

  • Yes, i would return the entire TLV value to the user, so they can make use of the type and length information in applications. An optional "without prefix" accessor could exist that removes the prefix for convenience, but this is a utility function, that leverages only the knowledge that there's two varints at the beginning.

Subproposal: multicodec-packed table

You can further choose to use the "multicodec-packed" table as the "codec table", if our types are desirable to you. But this is strictly a separate concern. basically:

<multicodec-packed><varint-len><value>

where <multicodec-packed> means a code in the multicodec table, being defined in this PR. You are very welcome to come and add your types to the table as needed. But "which types we care about" and "keeping the codec table small to avoid wasting one more byte" can be a contentious detail. (i personally think it's not very serious thing, given just 2 bytes = 16K types... and i dont expect to get to 3 in 10 years).

@zookozcash

Okay, here's an update of https://github.com/zcash/zcash/issues/1849#issuecomment-260093393 with the varint/TLV suggestions from @whyrusleeping and @jbenet

  • If the first byte (byte 0)'s value is 0xF7 or smaller, then the reader should:

    • strip any trailing zero bytes,
    • decode it as UTF-8 (replacing any incorrect UTF-8-encoded byte sequences with the replacement character U+FFFD), and display it to the user as a human-readable string.

    (Note that this is correcting an error in Zcash protocol v2016.0-beta-1.10, which said to use 0xF5 instead of 0xF7 as the upper end of this range.)

  • If byte 0's value is 0xF8, then:

    • Interpret the next few bytes (1 to 9 of them) as a 64-bit unsigned varint/ULEB, and use it as an arbitrary application-defined "type" field.
    • Interpret the next bytes (1 to 2 of them) as a 16-bit unsigned ULEB, and use it as the length field. If 1 + the number of bytes used for the type field + the number of bytes used for the length field + the length > 512 then error out, i.e. do not do any further processing, and do not return any information about the memo to the caller other than that it was incorrectly formatted.
    • Inspect the padding after the end of the indicated length, and if it contains anything other than 0 bytes then error out.
    • Return to the caller a 3-tuple of the following data:
      • the type — an integer in [0…2^64)
      • the length — an integer in [0…509]
      • a byte string of that length which contains the payload
  • If the first byte's value is between 0xF9 and 0xFE inclusive on both ends, then this memo is from the future, because first bytes of 0xF9 through 0xFE are reserved for future specifications of this protocol.

  • If byte 0's value is 0xFF then the reader should not make any other assumption about it. If you want to put data into a memo field which is inconsistent with this spec, then just put 0xFF as the first byte, and then do whatever you want with the remaining 511 bytes.

@daira
Collaborator

daira commented Nov 13, 2016

The specification is correct in saying that the range of valid UTF-8 start bytes is from 0 to 0xF4 inclusive. @zookozcash, you may not have taken into account that the most significant 3 bits of a 21-bit code point are necessarily 100, because code points only go up to 0x10FFFF.
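This can be checked directly with the Rust standard library: the largest code point, U+10FFFF, encodes with a lead byte of 0xF4, and any sequence that starts with 0xF5 is rejected. The helper names below are made up for this sketch.

```rust
/// Returns the first UTF-8 byte of the largest code point (U+10FFFF).
pub fn max_utf8_lead_byte() -> u8 {
    let mut buf = [0u8; 4];
    // char::MAX is U+10FFFF, the highest valid code point.
    let encoded = char::MAX.encode_utf8(&mut buf);
    encoded.as_bytes()[0]
}

/// True if `bytes` is well-formed UTF-8.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

Here `max_utf8_lead_byte()` yields 0xF4, `is_valid_utf8(&[0xF4, 0x8F, 0xBF, 0xBF])` holds (that is U+10FFFF), and `is_valid_utf8(&[0xF5, 0x80, 0x80, 0x80])` does not.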

@zookozcash

zookozcash commented Nov 13, 2016

Okay here's an update of https://github.com/zcash/zcash/issues/1849#issuecomment-260151170 with the correct value of the lowest number that can't be a valid first byte for a UTF-8 encoded string:

  • If the first byte (byte 0)'s value is 0xF4 or smaller, then the reader should:
    • strip any trailing zero bytes,
    • decode it as UTF-8 (replacing any incorrect UTF-8-encoded byte sequences with the replacement character U+FFFD), and display it to the user as a human-readable string.
  • If byte 0's value is 0xF5, then:
    • Interpret the next few bytes (1 to 9 of them) as a 64-bit unsigned varint/ULEB, and use it as an arbitrary application-defined "type" field.
    • Interpret the next bytes (1 to 2 of them) as a 16-bit unsigned ULEB, and use it as the length field. If 1 + the number of bytes used for the type field + the number of bytes used for the length field + the length > 512 then error out, i.e. do not do any further processing of the memo, and do not return any information about the memo to the caller other than the fact that it was incorrectly formatted.
    • Inspect the padding after the end of the indicated length, and if it contains anything other than 0 bytes then error out.
    • Return to the caller a 3-tuple of the following data:
      • the type — an integer in [0…2⁶⁴)
      • the length — an integer in [0…510)
      • a byte string of that length which contains the payload
  • If the first byte's value is between 0xF6 and 0xFE inclusive on both ends, then this memo is from the future, because first bytes of 0xF6 through 0xFE are reserved for future specifications of this protocol.
  • If byte 0's value is 0xFF then the reader should not make any other assumption about the memo. If you want to put data into a memo field that doesn't use the type-length-value scheme above, then put 0xFF as the first byte, and then do whatever you want with the remaining 511 bytes.
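The 0xF5 branch of this proposal could look roughly like the Rust sketch below. It is an illustration only: `read_uleb` and `parse_f5` are hypothetical names, and the sketch does not reject non-canonical (over-long) ULEB encodings, which the proposal does not address.

```rust
/// Reads an unsigned LEB128 integer of at most `max_bytes` bytes.
/// Returns the value and the number of bytes consumed.
fn read_uleb(buf: &[u8], max_bytes: usize) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().take(max_bytes).enumerate() {
        // Each byte contributes 7 bits; 9 bytes carry at most 63 bits.
        value |= u64::from(b & 0x7F) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // ran out of bytes before the varint terminated
}

/// Parses an 0xF5 memo into (type, length, payload).
/// Returns None for anything malformed, with no further detail.
pub fn parse_f5(memo: &[u8; 512]) -> Option<(u64, u16, &[u8])> {
    if memo[0] != 0xF5 {
        return None;
    }
    let (type_id, type_len) = read_uleb(&memo[1..], 9)?;
    let (length, len_len) = read_uleb(&memo[1 + type_len..], 2)?;
    let start = 1 + type_len + len_len;
    let end = start + length as usize;
    if end > 512 {
        return None; // payload would run past the memo field
    }
    if memo[end..].iter().any(|&b| b != 0) {
        return None; // padding must be all zero bytes
    }
    // A 2-byte ULEB holds at most 14 bits, so the cast cannot truncate.
    Some((type_id, length as u16, &memo[start..end]))
}
```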

@zookozcash

@whyrusleeping

@zookozcash Updated! And yeah, that makes sense to me, i'll go ahead and implement your proposal in zmsg's message parsing

@jbenet

jbenet commented Nov 13, 2016

@nathan-at-least
Contributor Author

nathan-at-least commented Nov 18, 2016

I haven't reviewed @jbenet's or @zookozcash's proposals yet. Instead I want to brain dump some potential use cases:

Sender's Address: encode the sender's preferred address in the memo field.

  • basic scheme to support two-way relationships.
  • A common use might be refunding incorrect amounts, over payments, etc…
  • Vendors may be able to use this as a rudimentary customer tracking system (wallet UIs should obviously make this a clear opt in/out feature). Eg: Customer sends deposit, vendor misses it due to a transient db failure, customer says "I want a refund", vendor says: "If you use a wallet that supports the Standard Sender's Address Scheme, tell us your address and we can scan the blockchain to find your deposit." (There is a lot of room for better ways to manage this.)
  • Other uses might be to bootstrap a two-way relationship, such as when Alice sends Bob a tip to his public tipping address, then Bob uses something like a 'Z message app' to tell Alice 'Thanks!'.

@whyrusleeping

Chatting with @nathan-at-least today, I think with the length prefixed codes, we can have multiple in a given memo field. This would allow us to do things like have a zmsg and a return address (using a standardized return address format) in the same memo.

@daira changed the title from "Memo field format specification." to "ZIP: Memo field format specification" on Apr 27, 2017
@zawy12

zawy12 commented Jan 21, 2018

I'm a little confused or maybe concerned about Zooko's comments above beginning with

If byte 0's value is 0xF5, then:....

I hope the memo field, when beginning with 0xF5, remains a free-for-all (unrestricted binary). I'm working on an English compression scheme for it for private messaging. It appears 3x compression (1500 bytes of ASCII text) is the best that can be done, because the memo is too short to benefit much from message-specific lookup tables in the stream.

@daira changed the title from "ZIP: Memo field format specification" to "[ZIP 302] Memo field format specification" on Apr 10, 2018
@str4d
Contributor

str4d commented Nov 28, 2019

@zawy12 per the current state of the ZIP 302 draft, you'd use 0xFF for unrestricted binary instead of 0xF5. If you had a specific format in mind that you wanted to generalize, you'd write a ZIP that requests a particular type ID.


I've been looking into this again (after my last comment on the ZIP draft five months ago), and I see that zmsg appears to both implement the current version of this draft, and define a type byte that hasn't been requested in a ZIP. I'm going to take this as an indication that the current ZIP draft is stable, and implement it in our Rust crates. However, I see that @whyrusleeping suggested packing multiple payloads inside 0xF5, which is incompatible with the current ZIP draft (which mandates checking that the padding is all-zeroes). It doesn't look like zmsg implements this check. Thoughts? Is it worth allowing 0xF5 to be a vector of payloads, or is that too much complexity?

@str4d
Contributor

str4d commented Nov 28, 2019

Having read further back, I also see that the TLV suggestion came with a "use the multicodec table" sub-proposal. I can't find 0xa0 in there though, so it's not clear to me whether zmsg is using it or not.

In any case, the current draft maintains @zookozcash's "arbitrary application-defined" language, which I think is not precise enough - we do want to coordinate within the Zcash ecosystem at least on payload types. I am however dubious about using the multicodec table for this, given that we only have 512 bytes to work with, and if we define our type field as a multicodec value, we inherently have the restriction "for multicodecs with valid lengths within X fluidly-defined available space", which is very messy.

Instead, I'm inclined to tighten the ZIP draft up so that we have a Zcash-specific table, and then if we want to settle on the multicodec table, we could do so in #247; it currently enables a single TLV of up to around 64KiB, which should be plenty large enough for most multicodec types. Nope, I see there's a multicodec type for a Zcash block, which by definition cannot fit inside a Zcash block. So we'd still have to restrict ourselves to a rather arbitrary subset of the multicodec types; I'm back in favour of just defining our types through the ZIP process.

@str4d
Contributor

str4d commented Nov 28, 2019

I have implemented almost all of the current draft ZIP in zcash/librustzcash#177. The only part I have not implemented yet is the 0xF5 logic, per the above questions.

@whyrusleeping

@str4d defining your own table makes sense. I just want to make sure that things are extensible, so in zmsg, when using 0xf5, we read two leb128 uvarints from the front of the buffer, use the first as the 'type' of the field, and the second is the length. Which i think is exactly what zooko proposes

@str4d
Contributor

str4d commented Nov 28, 2019

Yep, that's almost what is in the current ZIP 302 draft. The length field should be at most 2 bytes, and we should also be consistent about enforcing canonicity.

Having thought about this overnight, I really like the idea of having multiple payloads packed into a single memo field - in particular, specifying a machine-readable return address alongside a human-readable text memo. It looks like zmsg isn't enforcing the zero-padding yet, so this change should be compatible.

My suggestion is that we define the type 0x00 to be the "no more payloads" marker, so that it coincides with the padding. Then we read as many TLV tuples as we can until we either reach the end of the memo field or a zero type (at which point we verify that all remaining bytes are zeroes). The size requirements on the length fields would be cumulative, meaning that implementations should aim to pack in smaller or fixed-width payloads before variable-width ones (so in my example above, a wallet GUI would have an "include return address" checkbox, and clicking it deducts space from the memo size indicator).
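The read loop described here might look like the following Rust sketch. It is not the code from zcash/librustzcash#177; `read_uleb` and `parse_payloads` are hypothetical names, and `body` is assumed to be the memo bytes that follow the prefix byte.

```rust
/// Reads an unsigned LEB128 integer of at most `max_bytes` bytes.
/// Returns the value and the number of bytes consumed.
fn read_uleb(buf: &[u8], max_bytes: usize) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in buf.iter().take(max_bytes).enumerate() {
        value |= u64::from(b & 0x7F) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Reads (type, value) tuples until the memo ends or a zero type is hit.
pub fn parse_payloads(body: &[u8]) -> Option<Vec<(u64, Vec<u8>)>> {
    let mut payloads = Vec::new();
    let mut pos = 0;
    while pos < body.len() {
        let (type_id, type_len) = read_uleb(&body[pos..], 9)?;
        if type_id == 0 {
            // "No more payloads": the rest must be zero padding.
            return if body[pos..].iter().all(|&b| b == 0) {
                Some(payloads)
            } else {
                None
            };
        }
        pos += type_len;
        let (length, len_len) = read_uleb(&body[pos..], 2)?;
        pos += len_len;
        let end = pos.checked_add(length as usize)?;
        // `get` fails if the payload would run past the end of the memo.
        let value = body.get(pos..end)?;
        payloads.push((type_id, value.to_vec()));
        pos = end;
    }
    Some(payloads)
}
```

Because the terminator check covers every byte from the zero type onward, a non-canonical zero such as `0x80 0x00` is rejected rather than treated as padding.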

@str4d
Contributor

str4d commented Nov 28, 2019

@whyrusleeping could you also expand on your 0xa0 type? I see a mention of ASCII in the zmsg source; do you intend it to only be ASCII, or could that type be the indicator for a UTF-8-encoded text memo (with no zero padding, c/f an all-text memo field)?

@str4d
Contributor

str4d commented Nov 28, 2019

I've added an implementation of structured memos with multiple payloads to zcash/librustzcash#177.

@str4d
Contributor

str4d commented Dec 3, 2019

Looking at zmsg's memo parsing, it looks like it parses the contents of the 0xa0 type using the same string function as a regular text memo. This means that the 0xa0 type for Zcash should be treated as UTF-8-encoded text.

@jackgavigan
Contributor

Be aware that, at some point, we should consider specifying how to attach Travel Rule information in the memo field. See https://electriccoin.co/blog/the-fatf-recommendations/ for background info.

A Joint Working Group on interVASP Messaging Standards has been spun up to define a "Universal common language for communication of required originator and beneficiary information between virtual asset service providers". Once that's complete, we could define how VASPs should format that information for transmission via the encrypted memo field.

Off the top of my head, I can imagine it looking something like a ;-separated set of fields, with:

  • Zcash interVASP Messaging Protocol version
  • Fields specification
  • Payload
    • Originating VASP (Required)
    • Originator's Name (Required)
    • Originator's Account Number (Optional)
    • Originator's Address (1 of these 4 is required)
    • Originator's National Identity Number (1 of these 4 is required)
    • Originator's Customer Identification Number (1 of these 4 is required)
    • Originator's Date and Place of Birth (1 of these 4 is required)
    • Beneficiary's VASP (Optional)
    • Beneficiary's Name (Required)
    • Beneficiary's Account Number (Optional)
    • Freetext message (Optional)

Example: 0.1a;7A6;gemini.com;Satoshi Nakamoto;GEMUS-SN0001;839-08-6886;Coinbase;Hal Finney

@daira
Collaborator

daira commented Feb 13, 2020

Since space is at a premium but extensibility is important, I suggest encoding the required fields first, and then any optional fields using a key-value encoding.

@nathan-at-least
Contributor Author

I propose we include a bit of "standardization process" for payload "type_id" to strike a good balance between permissionless prototyping and cross-ecosystem standardization. One example of how to do this is to segregate the 64-bit type_id range between "standardized" and "non-standardized / prototypical" ranges (similar to the "X-" header convention/standard from HTTP).

For example, we might say a type_id <= 0xFFffFFff should only be used if ZIP 302 has been extended to define that type_id. OTOH, it is possible and encouraged for developers to select random type_id values in the range above that for prototyping their own protocols. If a prototyped type_id becomes widely used, that's a great candidate for standardizing with a ZIP. In that case, there may be a period (block height range?) where the large and also a new small (standardized range) type_id have identical semantics.

Then we can "guard" the small compact type_id with the ZIP process. Those should be reserved for known uses that all (or many) wallets should support, are well known, and are demonstrated to be useful.
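A small Rust sketch of that split, using the 0xFFffFFff boundary from the example above. The names are invented for illustration, and `uleb_len` assumes the type_id is ULEB-encoded as in the earlier proposals; it shows why the standardized range is the "compact" one.

```rust
/// Boundary taken from the example above; not part of any published ZIP.
pub const MAX_STANDARDIZED_TYPE_ID: u64 = 0xFFFF_FFFF;

#[derive(Debug, PartialEq)]
pub enum TypeIdRange {
    /// Should only be used once ZIP 302 defines it.
    Standardized,
    /// Free for permissionless prototyping.
    Prototype,
}

pub fn type_id_range(type_id: u64) -> TypeIdRange {
    if type_id <= MAX_STANDARDIZED_TYPE_ID {
        TypeIdRange::Standardized
    } else {
        TypeIdRange::Prototype
    }
}

/// Number of bytes `n` occupies as an unsigned LEB128 varint.
pub fn uleb_len(mut n: u64) -> usize {
    let mut len = 1;
    while n >= 0x80 {
        n >>= 7;
        len += 1;
    }
    len
}
```

Under these assumptions a standardized type_id below 128 costs a single byte of the memo, while every prototype type_id costs at least five.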

@nathan-at-least
Contributor Author

nathan-at-least commented Feb 13, 2020

I have a concern about the multiple-payload approach taken by the implementation in zcash/librustzcash#177. Here's the enum Payload definition, where a memo contains a Vec<Payload> (ie: multiple payloads):

pub enum Payload {
    /// A payload type we don't know about.
    Unknown {
        type_id: u64,
        length: u16,
        value: Vec<u8>,
    },
}

My concern is about the relationship between multiple payloads with arbitrary type_id values that can be defined over time by different authors.

Example Issue

For example imagine if one type_id is for "alleged sending zaddr" (I say "alleged" because there's no authentication of the included address). Now what happens if there are two such payloads in a single memo? Assuming there's no clear guidance in the specification for that type_id different wallets may do different things, and this could lead to security issues (or just general UX problems).

Furthermore, the more distinct type_id types that get defined, the more complex their relationships.

I'm not sure how concerned I should be about this, but it bothers me because I don't know what all future use cases or apps might need/do.

Example of Alternative - Single Payload-per-Memo

An alternative approach is to say there's only a single item in a memo, then when combinations are desired, they are codified explicitly by introducing a specific type_id and its semantics. Let's call this approach "single-payload-per-memo" or SPPM for short whereas the code linked above is "multi-payload-per-memo" / MPPM.

Spec Examples for MPPM vs SPPM:

An MPPM spec might specify type_id values in a table that is updated by future ZIPs, and one entry might look like this:

type_id 0x<BLAH> - "Unauthenticated Reply-To Address"

  The encoded bytes are a zcash address. Wallets should make it convenient for the user to send funds or memos to that address. Wallets SHOULD help users understand the implications that this address is unauthenticated, and the address owner may not be aware of this memo or transaction. (FIXME: How could wallets do that?). If more than one Unauthenticated Reply-To Address TLV item appears in a single memo, wallets MUST display and use ONLY the first one present.
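The "use ONLY the first one present" rule in this example entry is simple to state in code. A Rust sketch, assuming the payloads have already been parsed into (type_id, value) pairs; `REPLY_TO_TYPE_ID` is a placeholder value, since the example leaves the type_id as 0x<BLAH>.

```rust
/// Placeholder only: the example entry above does not assign a real type_id.
pub const REPLY_TO_TYPE_ID: u64 = 0x1234;

/// Returns the first reply-to payload, ignoring any later duplicates.
pub fn first_reply_to<'a>(payloads: &'a [(u64, Vec<u8>)]) -> Option<&'a [u8]> {
    payloads
        .iter()
        .find(|(type_id, _)| *type_id == REPLY_TO_TYPE_ID)
        .map(|(_, value)| value.as_slice())
}
```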

Meanwhile, an SPPM equivalent might look like this:

type_id 0x<FOO> - "Anonymous User Message"

  The encoded bytes are a utf8 string to be displayed to the user. This behavior is identical to the ZIP 302 case where the initial memo byte is < 0xf5, except in this case the memo is encapsulated in a TLV container. (FIXME: Why? Is this helpful or does it just add confusion / complexity?)

type_id 0x<BLAH> - "Unauthenticated Reply-To Address and User Message"

  The encoded bytes contain an unauthenticated reply-to address and a UTF-8 message. Wallets MUST display the reply-to address and the message to the user. Wallets MUST ensure the two fields are distinct, so that they can recognize the difference between this message versus an Anonymous User Message that includes text indicating a reply-to address (ie: avoid injection / confusion attack possibility).

The SPPM approach trades off flexibility for predictability.

Notice that it's possible (with yet more overhead) to embed the MPPM approach inside the SPPM approach:

type_id 0x<QUUZ> - Multipart TLV

  Contains a sequence of "Sub-SPPM TLVs". Each of those has `(subtype_id, length, bytes)`. The subtypes are defined in Appendix Blah

Is it worth the overhead? If we took that approach and most applications ended up just using "Multipart TLV" anyway, all of the same UX/security concerns just resurface.

Follow up Thoughts

I notice this reminds me a bit of Bitcoin Script versus Transparent Zcash Extensions. I like the explicit less flexible approach of the latter compared to the former. It's less dynamic and flexible but safer to reason about.

Some exploratory questions:

  • which approach is more likely to lead to consistent behavior across wallets in the ecosystem?
  • which approach is more likely to lead to rapid development and prototyping and "zapp" development?
  • which approach makes it easier to implement a wallet or automated "zapp" infrastructure (here I mean backend automation that responds to incoming payments/memos)? Conversely which approach makes it easier to introduce UX or security problems unwittingly?

@nathan-at-least
Contributor Author

Brainstorm of use cases:

User-facing Wallet:

  • Display user message (with or without reply address)
  • Unauthenticated Reply-To Address and streamline wallet actions around it (ex: "Send reply" button)
  • Authenticated Reply-To Address - Same as above but the address is authenticated. How? One example might be a proof that the zaddr is derived from the spending authorization of at least one of the inputs to the transaction. (Is this better than Unauthenticated Reply-To? In what circumstance?)
  • Referencing Previous Memos - a transfer may wish to reference a previous transfer for a variety of reasons. For example, suppose a Reply-To Address is used for a refund, so the refunding transaction may wish to reference the initial sending transfer. (Note: This may rely on payment disclosure.)
  • Memo to Self - Wallets could allow users to enter a private memo for their own record keeping / memory for various transactions. This memo would live on the change output.
  • Travel Rule streamlining for financial institutions following USG regulation or FATF guidance.
  • Contact Info Sharing - there may be future cases where users want to introduce each other through memos somehow.

Zapp use cases

"Zapps" is just a term I'm using to mean backend automation that reacts to memos:

  • register a zaddr for service
  • post to messaging board
  • paid hidden oracle - pay a zaddr to get a reply with the latest oracle value for some public or private value
  • various cross-chain protocols... maybe the second leg of an XCAT-like thing?

Plumbing Usages

  • memo compression
  • multi-part memos
  • links to external storage. Tricky and dangerous in terms of UX, security, and privacy. ex: large memo stored off chain in IPFS. Or more simply/likely: URLs.

More?

Submit your own!

Concern

Note as per my concern above, mixing and matching all these cases might lead to unexpected or unanticipated behavior across wallets or Zapps. We also need to consider confused deputy attacks.

For example, could Alice trick Bob's wallet into sending a refund with an Unauthenticated Reply-To to Charlie's service that causes Bob's zaddr to get registered with some service that Bob is unaware of?

@nathan-at-least
Contributor Author

With my concerns and use case brainstorms shared, I'm fine for the rest of the Zcash community to decide on the ZIP 302 contents. My only direct request is for "prototyping vs standardized" range for type_id fields, because I want to ensure it's easy to permissionlessly prototype.

@str4d
Contributor

str4d commented Feb 29, 2020

@hdevalence has suggested that ZIP 302 treat invalid UTF-8 text memos as completely invalid, instead of replacing invalid sequences. The change would enable parsing APIs to preserve the exact byte serialization of the memo (c/f zcash/librustzcash#177 where Memo::from_bytes does not). I don't currently believe that actually matters; the exact byte serialization of memos is preserved inside the note ciphertexts, and I don't know of any use-cases where parsing and then serializing a memo is required to preserve the exact byte serialization. However, I also ignore invalid UTF-8 text memos instead of replacing invalid sequences in z_viewtransaction, so I'm not strongly against it.
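The difference between the two behaviours can be seen with the Rust standard library alone. This is a sketch of the trade-off, not the librustzcash API; `decode_lossy` and `decode_strict` are names made up for the example.

```rust
/// Replacement behaviour: invalid sequences become U+FFFD.
pub fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Strict behaviour: any invalid sequence rejects the whole memo.
pub fn decode_strict(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

For the input `b"ok\xFFok"`, `decode_lossy` returns a string whose bytes differ from the input (the 0xFF has become a three-byte U+FFFD), so re-serializing it does not round-trip, while `decode_strict` returns `None`. For valid input both return the same text.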

@str4d
Contributor

str4d commented Feb 29, 2020

I've also removed the TLV structured memo field logic from zcash/librustzcash#177 for now, while we thrash this out. It's currently living in https://github.com/str4d/librustzcash/tree/structured-memo-types.

@daira transferred this issue from zcash/zcash on May 5, 2020
@dconnolly
Contributor

dconnolly commented May 26, 2020

Zbay and zmsg have implemented some of this, so we should probably try to be consistent with them on whatever decision we make about what to do with invalid UTF-8 sequences.

@str4d
Contributor

str4d commented Mar 23, 2021

AIUI the main point of contention remaining is the meaning of 0xF5:

  • There's an argument that 0xF5 should be used for the given TLV format, because we had it written up since the beginning (in a GitHub issue and ZIP draft PR), and at least one project implemented it (though unsure whether it is actually used much currently).

  • There's another argument that 0xF5 should be abandoned to private agreement, as was in the protocol spec from the beginning, because some devs interpret that to mean "no public agreement / standard". That would mean 0xF5 and 0xFF are functionally identical. (We are definitely keeping 0xFF because there is no ambiguity there.)

In an effort to make progress on this, I have removed 0xF5 from the ZIP 302 draft. We can merge the draft as-is so developers have a published URL to work from, and then I'll make a separate PR re-introducing the TLV spec that we can continue discussing.

@str4d
Contributor

str4d commented Oct 6, 2022

I have now opened the separate PR for the TLV spec: #638.

@str4d
Contributor

str4d commented Nov 23, 2022

In ZIP Sync today, we decided that:

  • We would specify the multi-TLV format using prefix byte 0xF7.
  • We would abandon 0xF5 as "for legacy private agreement", and document that newer applications should prefer 0xFF which is unambiguously for this purpose.

We decided that using up another top-level prefix byte here is fine, because it opens up the typecode space significantly (it adds >250 new single-byte typecodes that can encode 509 bytes, which is not a significant reduction from the 511 bytes that a top-level prefix byte can encode).


9 participants