[ZIP ???] Multipart Memos #247
@str4d this is an interesting ZIP in relation to Hush, which is exclusively focused on the memo field and on using it to its full potential. It's expected and inevitable that Hush will diverge from how Zcash uses the memo field, but I hope the following constructive criticism is useful to you in relation to how Zcash will use memos.
You say
You say
You say
Additionally, why do you not have a checksum for the full data that will be reconstructed by the multipart memo? The HushList protocol uses checksums/hashes of the full data to make sure that no data has been corrupted.
Lastly, the word "dummy" has negative connotations and might be misunderstood by non-native English speakers. A transaction with amount=0 is just as valid as any other transaction, and it has full use of the 512-byte memo field. I don't think referring to them as "dummy" outputs is useful nomenclature; something like "amount=0 outputs" or "zero-value outputs" is clearer.
That may be the case, but there is still benefit in reducing the complexity for implementations handling any divergence. Does Hush use ZIP 302 or something similar for handling its binary encodings?
At a consensus level, there's nothing we can enforce here (by design), so I agree it is important to be clear about the defaults. My own position is that a transaction should not contain more than one value-carrying output per recipient address, but that additional zero-value outputs are fine (as they can be easily pruned by nodes so there is no ongoing witnessing cost). I would appreciate @daira's feedback on this ZIP draft.
Yep, I'm fully aware that 1MiB worth of memo to a single recipient can be created in current transactions, as I alluded to in the rationale:
The privacy-leakage rationale above rests on the hypothesis that a Schelling point around multipart memo usage in the low-KiB range makes such transactions less likely to stand out compared to, e.g., shielded mining pool payouts or batched exchange payouts.
As for the suggestion of a 64 KiB limit, it's a reasonable MTU to work within, and it fits nicely in a 2-byte length field. A sender following this ZIP could send full-length multipart memos to at most 16 recipients within a single transaction; again, this seems like a reasonable Schelling point. It would be very useful to have input from potential users of this specification, to see where their use cases would sit.
There's no need IMHO for this ZIP to specify any change to the consensus rules, as the contents of the memo fields are inherently not part of the core consensus rules (only standard rules). Any proposals to alter the consensus rules regarding the maximum number of shielded outputs would be better served by a separate ZIP.
The 64 KiB length limit is, however, a strong standard rule, in that anyone working with this ZIP's format cannot create compatible longer multipart memos, because of the fixed-width 2-byte length field in the header. Lower limits could potentially be surpassed by implementations that don't place appropriate bounds checks on the values within the length fields.
The policies that individual mining pools choose to enforce have no bearing on the validity of that statement, which is derived from the consensus rules.
Checksums are used to detect errors during transmission or storage of data. Zcash uses AEAD ciphertexts which natively provide this functionality, and additionally all ciphertext fields are covered by the transaction signatures. This makes corruption of the encrypted memos in-flight from sender to recipient impossible without invalidating the entire transaction. The only other source of corruption would be bugs in the implementation of the sender or recipient, and I'm not (yet) convinced that a checksum baked into the protocol (which would also be subject to implementation bugs) is more beneficial than e.g. test vectors. But it's a relatively small overhead, so I wouldn't be opposed to including one.
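If a checksum were added, the overhead really would be small. The sketch below is purely illustrative: it assumes a hypothetical 4-byte BLAKE2b digest appended to the reconstructed data, with a made-up personalization string; neither is defined by the ZIP draft. Since the AEAD ciphertexts already catch in-flight corruption, a check like this could only catch sender or recipient implementation bugs.

```python
import hashlib

# Hypothetical parameters: the ZIP draft does not define a checksum.
CHECKSUM_LEN = 4                       # 4 bytes of overhead per multipart memo
PERSONALIZATION = b"MultipartMemoCk"   # made-up 15-byte BLAKE2b personalization


def checksum(data: bytes) -> bytes:
    """Return a short BLAKE2b digest of the full reconstructed data."""
    return hashlib.blake2b(
        data, digest_size=CHECKSUM_LEN, person=PERSONALIZATION
    ).digest()


def append_checksum(data: bytes) -> bytes:
    """Sender side: append the checksum before splitting into memo parts."""
    return data + checksum(data)


def strip_and_verify(blob: bytes) -> bytes:
    """Recipient side: check and remove the trailing checksum after reassembly."""
    if len(blob) < CHECKSUM_LEN:
        raise ValueError("too short to contain a checksum")
    data, tag = blob[:-CHECKSUM_LEN], blob[-CHECKSUM_LEN:]
    if checksum(data) != tag:
        raise ValueError("checksum mismatch: reject the multipart memo")
    return data
```

The 4-byte length is an arbitrary choice; a longer digest trades a few more bytes for stronger detection of buggy encoders.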
I refer to them as "dummy outputs" in the ZIP because that is the nomenclature used in the protocol specification. If the specification terminology changes, I will update this ZIP to match. Please file an issue for tracking this.
@str4d thank you, I really appreciate your detailed feedback. In an attack scenario, where an attacker sends 1 MB of memo data with a "faked" 2-byte length field, what are wallets supposed to do? Without consensus rules to protect against this situation, many weird bugs could potentially be tickled.
I can't quite tell if this is what you mean by a "faked" length field, but I assume you are asking what happens if the length field is set to a shorter value than the actual amount of multipart data that the adversary is trying to convey.
Getting the trivial case out of the way: a well-written recipient implementation would reject this, as the length of the reconstructed data does not match the length field. But let's assume the recipient has a bug and is not enforcing this.
In the ZIP draft as written, the non-header multipart chunks are identified by memo part number, which is a fixed 1-byte integer. This means that at most 255 multipart chunks could be used in a multipart memo (recalling that the header itself is identified by memo part number zero), for a total maximum reconstructed length of around 126 KiB. This is just under double the maximum specified in the ZIP, and 8x smaller than the 1 MiB maximum data that could be fitted into shielded outputs.
If an adversary attempted to actually send 1 MiB of data to the same recipient address using the multipart format in this ZIP, they would have on average 8 chunks with the same memo part number. Recipient implementations would reject the entire multipart memo in this case (under the no-duplicate-part-numbers rule); maybe they should also reject the entire transaction?
If the implementation had another bug where it was not enforcing unique memo part numbers, then the most likely result is that later-parsed chunks would overwrite earlier-parsed chunks (if pending chunks were stored in slots), or subsequent duplicate chunks would simply be ignored (if the recipient implemented a scanning reconstruction). Whether or not this introduces an exploitable bug (due to potentially differing interpretations of the memo by correct and buggy recipients) would then depend on how the reconstructed multipart memo is used by the recipient, and is thus a concern for a different layer (the ZIP defining the protocol inside the multipart memo).
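The recipient-side checks described above (reconstructed length must match the length field, part numbers must be unique, and part zero belongs to the header) can be sketched as a reassembly function that rejects the whole multipart memo on any inconsistency, rather than overwriting or ignoring duplicates. It assumes the header data and chunk payloads have already been extracted from the decrypted memos; `reassemble` and `InvalidMultipartMemo` are hypothetical names, not part of the draft.

```python
from typing import Dict, List, Tuple


class InvalidMultipartMemo(ValueError):
    """Raised when a multipart memo must be rejected as a whole."""


def reassemble(total_parts: int, declared_length: int, header_data: bytes,
               chunks: List[Tuple[int, bytes]]) -> bytes:
    """Reassemble a multipart memo from (memo part number, payload) pairs.

    `total_parts` includes the header, so chunks must be numbered
    1 .. total_parts - 1, each exactly once.
    """
    slots: Dict[int, bytes] = {}
    for part_number, payload in chunks:
        if part_number == 0:
            raise InvalidMultipartMemo("part number 0 is reserved for the header")
        if part_number >= total_parts:
            raise InvalidMultipartMemo(f"part {part_number} is out of range")
        if part_number in slots:
            # Reject outright instead of overwriting or skipping the duplicate.
            raise InvalidMultipartMemo(f"duplicate part number {part_number}")
        slots[part_number] = payload
    missing = [n for n in range(1, total_parts) if n not in slots]
    if missing:
        raise InvalidMultipartMemo(f"missing parts {missing}")
    data = header_data + b"".join(slots[n] for n in range(1, total_parts))
    if len(data) != declared_length:
        raise InvalidMultipartMemo("reconstructed length does not match length field")
    return data
```

Failing on the first duplicate is the conservative choice: it avoids both the slot-overwrite and the scan-and-ignore behaviours, so correct recipients never end up with one of two competing interpretations.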
This ZIP already points out that invalid memos MUST NOT be acted on in the Security and Privacy Considerations, but I'll have a think about how I can make it clearer that ZIPs defining a multipart memo type need to take into account the possibility of bugs in the recipient implementation of this ZIP. Note that a checksum in this ZIP's format would not help at all here, because in this thought experiment the entire transaction is adversarial, so the adversary can ensure that the checksum passes for their target recipient (even if it wouldn't necessarily pass on all recipient implementations). However, checksums in the higher-layer protocol may indeed make sense, where this multipart memo format is itself the transport layer. Note also that all of the above discussion applies to adversarial multipart memos that are smaller than 64 KiB, which can just as easily be created with duplicate part numbers. So a consensus rule affecting the number of shielded outputs doesn't prevent recipient implementation bugs from being a potential problem.
The ZIP in this PR is not ZIP 302 (it hasn't had a number allocated by the editors yet). ZIP 302 (PR) standardises how to parse a single memo field when it contains non-UTF-8 data. I'm unaware of any deployed applications using it yet within the Zcash ecosystem; if there were any, I'd have expected to see a ZIP requesting to reserve a type byte, like this ZIP does.
@leto wrote:
I don't remember having decided that. As @str4d says, it's clearly not enforceable at the protocol level.
Zecwallet Lite is implementing multipart memos, but with a simple text-only scheme. It splits the memos as
The text-only encoding is motivated in part by backwards compatibility with Zecwallet Lite itself and other wallets that haven't implemented this ZIP. I'm happy to add support for this ZIP as well, if we can agree on the format (the current format looks good to me).
transaction, and it is also possible for some or all of these outputs to have zero value
(dummy outputs). This can be leveraged to convey more information to the recipient than
Suggested change:
transaction, and it is also possible for some or all of these shielded outputs to have zero value
(dummy shielded outputs). This can be leveraged to convey more information to the recipient than
A multipart memo is treated for the purpose of this ZIP as an opaque data blob of length
at most 65536 bytes. It has an associated type that defines the internal encoding. We
define type 0x00 to indicate human-readable text, which should be encoded as UTF-8 with no
trailing zero bytes, and decoded replacing any incorrect UTF-8-encoded byte sequences with
the replacement character U+FFFD. Specifications of other possible encodings are left for
future ZIPs.
This is reinventing the wheel that has been addressed in ZIP 302 (specifically the type
field) and therefore they should be merged.
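For type 0x00, the decoding rule in the draft text maps directly onto Python's `errors="replace"` handler, which substitutes U+FFFD for malformed UTF-8. This is a minimal sketch; the helper names are hypothetical, and the encoder here chooses to strip trailing zero bytes rather than reject them.

```python
MAX_MULTIPART_LEN = 65536  # "at most 65536 bytes", per the draft text


def encode_text_memo(text: str) -> bytes:
    """Encode type-0x00 text as UTF-8 with no trailing zero bytes."""
    data = text.encode("utf-8").rstrip(b"\x00")
    if len(data) > MAX_MULTIPART_LEN:
        raise ValueError("text too long for a multipart memo")
    return data


def decode_text_memo(data: bytes) -> str:
    """Decode type-0x00 data, replacing invalid UTF-8 sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")
```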
- 0x00
- Total number of parts, including this header (1 byte)
- Type (1-9 bytes, as a 64-bit ULEB)
Suggested change:
- Type (1-9 bytes, an unsigned variable-length integer, corresponding to a ZIP-302 type [#zip-0302]_)
- Length of data (1-2 bytes, as a 16-bit ULEB)
- data (2-508 bytes):
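The header's variable-length fields can be read with a standard unsigned LEB128 decoder (7 payload bits per byte, high bit set on every byte except the last). The sketch below assumes the header memo is laid out exactly as the list above, starting with the 0x00 part number and ignoring any leading type byte the full draft may reserve; `parse_header` is a hypothetical helper. Two ULEB bytes carry only 14 payload bits (values up to 16383), so the byte limit for the length field is kept as a parameter here rather than hard-coded.

```python
from typing import Tuple

MEMO_SIZE = 512


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(buf: bytes, offset: int, max_bytes: int) -> Tuple[int, int]:
    """Decode a ULEB128 starting at buf[offset], using at most max_bytes bytes.

    Returns (value, offset just past the encoding).
    """
    value = 0
    for i in range(max_bytes):
        if offset + i >= len(buf):
            raise ValueError("truncated ULEB128")
        byte = buf[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, offset + i + 1
    raise ValueError("ULEB128 exceeds the allowed width")


def parse_header(memo: bytes, length_max_bytes: int = 2) -> Tuple[int, int, int, bytes]:
    """Split a header memo into (total_parts, type, length, data plus padding)."""
    if len(memo) != MEMO_SIZE:
        raise ValueError("memo fields are exactly 512 bytes")
    if memo[0] != 0x00:
        raise ValueError("the header is memo part number 0")
    total_parts = memo[1]
    memo_type, off = decode_uleb128(memo, 2, max_bytes=9)  # type field: 1-9 bytes per the draft
    length, off = decode_uleb128(memo, off, max_bytes=length_max_bytes)
    return total_parts, memo_type, length, memo[off:]
```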
- Memo part number (1 byte)
Suggested change:
- Memo part number (1 byte) (MUST be greater than zero as the header is always part #0)
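On the sender side, the layouts above suggest splitting the data into one header memo and numbered chunk memos, each zero-padded to 512 bytes and carried in its own (possibly zero-value) shielded output. This sketch assumes each chunk is just the 1-byte part number followed by payload, and that the header's data field takes whatever room its fixed fields leave; `split_multipart` is a hypothetical helper, and the exact per-memo capacities in the draft may differ.

```python
from typing import List

MEMO_SIZE = 512
MAX_MULTIPART_LEN = 65536


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 only encodes non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def split_multipart(memo_type: int, data: bytes) -> List[bytes]:
    """Split data into a header memo (part 0) and chunk memos (parts 1..n)."""
    if len(data) > MAX_MULTIPART_LEN:
        raise ValueError("multipart data is limited to 65536 bytes")
    fields = encode_uleb128(memo_type) + encode_uleb128(len(data))
    header_room = MEMO_SIZE - 2 - len(fields)  # minus part number 0 and total-parts byte
    chunk_room = MEMO_SIZE - 1                 # minus the 1-byte part number (assumed)
    head, rest = data[:header_room], data[header_room:]
    bodies = [rest[i:i + chunk_room] for i in range(0, len(rest), chunk_room)]
    total_parts = 1 + len(bodies)              # includes the header
    if total_parts > 0xFF:
        raise ValueError("total number of parts must fit in one byte")
    memos = [bytes([0x00, total_parts]) + fields + head]
    memos += [bytes([n]) + body for n, body in enumerate(bodies, start=1)]
    # Zero-pad every memo to the fixed memo field size.
    return [m.ljust(MEMO_SIZE, b"\x00") for m in memos]
```

Under these assumptions a 1024-byte payload yields three 512-byte memos: the header (holding the first 507 bytes) plus parts 1 and 2. The length field, not the padding, tells the recipient where the data ends, so payloads that legitimately end in zero bytes survive the round trip.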
Recipients SHOULD reject multipart memos as invalid if any of the following issues are
encountered:
SHOULD, not MUST?
Transactions containing invalid multipart memo encodings may pose a privacy threat
depending on how the recipient acts on the transaction. In particular:

- Recipients SHOULD NOT accept any value received alongside a ``MultipartChunk``.
I feel like this needs clarifying. What about multipart memos that include payment in the header and every chunk?
Standardizing around this proposal is a great idea. In addition to responding to @dconnolly's great suggestions, it seems like the next step may be to move the underlying proposal (ZIP 302) forward? Are there any blockers on these two ZIPs?
This proposal will probably be subsumed by a proposal to decouple memos from outputs in NU6.
Closing because we're going to pursue #627 instead.
Depends on #105.