Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZIP ???] Multipart Memos #247

Closed
wants to merge 1 commit into from
Closed

Conversation

str4d
Copy link
Collaborator

@str4d str4d commented Jun 27, 2019

Depends on #105.

@leto
Copy link

leto commented Jul 25, 2019

@str4d this is an interesting ZIP in relation to Hush, who is exclusively focused on the memo field and using it to the full potential. It's expected and inevitable that Hush will diverge from how Zcash will use the memo field, but I hope the following constructive criticism is useful to you in relation to how Zcash will use memos.

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past. That decision needs to be undone to move forward with this ZIP. Hush allows "duplicate receivers" as I call them.

You say total length of an encoded multipart memo is restricted to 64kiB, but will that be a consensus rule?
I have created 1MB "multipart" memos on Zcash mainnet, it's possible right now. This means that this ZIP requires new consensus rules, which is not clear from the ZIP.

You say the maximum number of shielded outputs that a transaction could contain while still fitting into a block is around 2100 but that is only theoretical, perhaps achievable on testnet. Real Zcash mining pools do not allow a transaction to fill a block. From my research, all the big pools limit a transaction to half a block. They do this via custom patches, which is dangerous, because it could lead to a weird chain fork. I describe this in more detail in the Sapling Woodchipper CVE (which has still not been addressed): https://saplingwoodchipper.github.io/

Additionally, why do you not have a checksum for the full data that will be reconstructed by the multi-part memo? HushList protocol uses checksums/hashes of the full data to make sure that no data has been corrupted.

Lastly, the word "dummy" has negative connotations and might be misunderstood by non-native English speakers. A transaction with amount=0 is just as valid as any other transaction, and it has full use of the 512 byte memo field. I don't think referring to them as "dummy" outputs is useful nomenclature. Something like "amount=0 outputs" or "zero value outputs" is more clear.

@str4d
Copy link
Collaborator Author

str4d commented Aug 3, 2019

It's expected and inevitable that Hush will diverge from how Zcash will use the memo field

That may be the case, but there is still benefit in reducing the complexity for implementations handling any divergence. Does Hush use ZIP 302 or something similar for handling its binary encodings?

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past. That decision needs to be undone to move forward with this ZIP. Hush allows "duplicate receivers" as I call them.

At a consensus level, there's nothing we can enforce here (by design), so I agree it is important to be clear about the defaults. My own position is that a transaction should not contain more than one value-carrying output per recipient address, but that additional zero-value outputs are fine (as they can be easily pruned by nodes so there is no ongoing witnessing cost). I would appreciate @daira's feedback on this ZIP draft.

You say total length of an encoded multipart memo is restricted to 64kiB, but will that be a consensus rule?
I have created 1MB "multipart" memos on Zcash mainnet, it's possible right now. This means that this ZIP requires new consensus rules, which is not clear from the ZIP.

Yep, I'm fully aware that 1MiB worth of memo to a single recipient can be created in current transactions, as I alluded to in the rationale:

The total length of an encoded multipart memo is restricted to 64kiB in order to bound the
privacy leakage associated with generating transactions containing many shielded outputs.
A 64kiB multipart memo would require 130 shielded outputs, whereas the maximum number of
shielded outputs that a transaction could contain while still fitting into a block is
around 2100.

The privacy leakage rationale above is based on the hypothesis that a Schelling point around multipart memo usage in the low kiB range means that such transactions are less likely to stand out compared to e.g. shielded mining pool payouts or batched exchange payouts. As for the suggestion of a 64kiB limit, it's a reasonable MTU to work within and it fits nicely in a 2-byte length field. A sender following this ZIP could send full-length multipart memos to at most 16 recipients within a single transaction; again this seems like a reasonable Schelling point. It would be very useful to have input from potential users of this specification, to see where their use-cases would sit.

There's no need IMHO for this ZIP to specify any change to the consensus rules, as the contents of the memo fields are inherently not part of the core consensus rules (only standard rules). Any proposals to alter the consensus rules regarding the maximum number of shielded outputs would be better served by a separate ZIP.

The 64kiB length limit is however a strong standard rule, in that anyone working with this ZIP's format cannot create compatible longer multipart memos, because of the fixed-width 2-byte length field in the header. Lower limits could potentially be surpassed by implementations that don't place appropriate bounds checks on the values within the length fields.

You say the maximum number of shielded outputs that a transaction could contain while still fitting into a block is around 2100 but that is only theoretical, perhaps achievable on testnet.

The policies that individual mining pools choose to enforce have no bearing on the validity of that statement, which is derived from the consensus rules.

Additionally, why do you not have a checksum for the full data that will be reconstructed by the multi-part memo? HushList protocol uses checksums/hashes of the full data to make sure that no data has been corrupted.

Checksums are used to detect errors during transmission or storage of data. Zcash uses AEAD ciphertexts which natively provide this functionality, and additionally all ciphertext fields are covered by the transaction signatures. This makes corruption of the encrypted memos in-flight from sender to recipient impossible without invalidating the entire transaction. The only other source of corruption would be bugs in the implementation of the sender or recipient, and I'm not (yet) convinced that a checksum baked into the protocol (which would also be subject to implementation bugs) is more beneficial than e.g. test vectors. But it's a relatively small overhead, so I wouldn't be opposed to including one.

I don't think referring to them as "dummy" outputs is useful nomenclature.

I refer to them as "dummy outputs" in the ZIP because that is the nomenclature used in the protocol specification. If the specification terminology changes, I will update this ZIP to match. Please file an issue for tracking this.

@leto
Copy link

leto commented Aug 4, 2019

@str4d thank you, I really appreciate your detailed feedback.

In an attack scenario, where an attacker sends a 1MB memo data with "faked" 2 byte length field, what are wallets supposed to do? Without consensus rules to protect against this situation, many weird bugs could potentially be tickled.

Does Hush use ZIP 302 or something similar for handling its binary encodings? No, Hush does not plan to use ZIP 302. Since we will allow storing more data, a 2 byte length field is not sufficient and we plan to store various other kinds of metadata as well.

@str4d
Copy link
Collaborator Author

str4d commented Aug 4, 2019

In an attack scenario, where an attacker sends a 1MB memo data with "faked" 2 byte length field, what are wallets supposed to do? Without consensus rules to protect against this situation, many weird bugs could potentially be tickled.

I can't quite tell if this is what you mean by "faked" length field, but I assume you are asking what happens if the length field is set to a shorter value than the actual amount of multipart data that the adversary is trying to convey.

Getting the trivial case out of the way: A well-written recipient implementation would reject this, as the length of the reconstructed data does not match the length field. But let's assume the recipient has a bug and is not enforcing this.

In the ZIP draft as-written, the non-header multipart chunks are identified by memo part number, which is a fixed 1-byte integer. This means that at most 255 multipart chunks could be used in a multipart memo (recalling that the header itself is identified by memo part number zero), for a total maximum reconstructed length of around 126kiB. This is just under double the maximum specified in the ZIP, and 8x smaller than the 1MiB maximum data that could be fitted into shielded outputs.

If an adversary attempted to actually send 1MiB of data to the same recipient address using the multipart format in this ZIP, they would have on average 8 chunks with the same memo part number. Recipient implementations would reject the entire multipart memo in this case (under the no-duplicate-part-numbers rule); maybe they should also reject the entire transaction?

If the implementation had another bug where it was not enforcing unique memo part numbers, then the most likely result is that later-parsed chunks would overwrite earlier-parsed chunks (if pending chunks were stored in slots), or subsequent duplicate chunks would simply be ignored (if the recipient implemented a scanning reconstruction). Whether or not this introduces an exploitable bug (due to potentially-differing interpretations of the memo by correct and buggy recipients) would then depend on how the reconstructed multipart memo is used by the recipient, and thus a different-layer concern (for the ZIP defining the protocol inside the multipart memo).

This ZIP already points out that invalid memos MUST NOT be acted on in the Security and Privacy Considerations, but I'll have a think about how I can make it clearer that ZIPs defining a multipart memo type need to take into account the possibility of bugs in the recipient implementation of this ZIP. Note that a checksum in this ZIP's format would not help at all here, because in this thought experiment the entire transaction is adversarial, so the adversary can ensure that the checksum passes for their target recipient (even if it wouldn't necessarily pass on all recipient implementations). However, checksums in the higher-layer protocol may indeed make sense, where this multipart memo format is itself the transport layer.

Note also that all of the above discussion applies to adversarial multipart memos that are smaller than 64kiB, which can just as easily be created with duplicate part numbers. So a consensus rule affecting the number of shielded outputs doesn't prevent recipient implementation bugs from being a potential problem.

No, Hush does not plan to use ZIP 302. Since we will allow storing more data, a 2 byte length field is not sufficient and we plan to store various other kinds of metadata as well.

The ZIP in this PR is not ZIP 302 (it hasn't had a number allocated by the editors yet). ZIP 302 (PR) standardises how to parse a single memo field when it contains non-UTF-8 data. I'm unaware of any deployed applications using it yet within the Zcash ecosystem; if there were any, I'd have expected to see a ZIP requesting to reserve a type byte, like this ZIP does.

@daira
Copy link
Collaborator

daira commented Feb 13, 2020

@leto wrote:

You say It is possible to send multiple outputs to the same recipient address within a single transaction but that is a contentious feature that has been decided against by Daira in the past.

I don't remember having decided that. As @str4d says, it's clearly not enforceable at the protocol level.

@str4d
Copy link
Collaborator Author

str4d commented Feb 13, 2020

@daira requested that this proposal instead be combined with ZIP 302, and that we use ZIP 302 as the canonical memo field specification with future updates as necessary. Once #105 is merged I'll rebase this PR to instead update ZIP 302.

@adityapk00
Copy link

Zecwallet Lite is implementing multi part memos, but with a simple text-only scheme. It splits the memos as (1/3) part1, (2/3) part2, (3/3) part 3 etc...

The text-only encoding is motivated in part by backwards compatibility with zecwallet lite itself and other wallets that haven't implemented this zip.

I'm happy to add support for this ZIP as well, if we can agree with the format (The current format looks good to me)

Comment on lines +36 to +37
transaction, and it is also possible for some or all of these outputs to have zero value
(dummy outputs). This can be leveraged to convey more information to the recipient than
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
transaction, and it is also possible for some or all of these outputs to have zero value
(dummy outputs). This can be leveraged to convey more information to the recipient than
transaction, and it is also possible for some or all of these shielded outputs to have zero value
(dummy shielded outputs). This can be leveraged to convey more information to the recipient than

Comment on lines +49 to +54
A multipart memo is treated for the purpose of this ZIP as an opaque data blob of length
at most 65536 bytes. It has an associated type that defines the internal encoding. We
define type 0x00 to indicate human-readable text, which should be encoded as UTF-8 with no
trailing zero bytes, and decoded replacing any incorrect UTF-8-encoded byte sequences with
the replacement character U+FFFD. Specifications of other possible encodings are left for
future ZIPs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reinventing the wheel that has been addressed in ZIP 302 (specifically the type field) and therefore they should be merged.


- 0x00
- Total number of parts, including this header (1 byte)
- Type (1-9 bytes, as a 64-bit ULEB)
Copy link
Contributor

@dconnolly dconnolly Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Type (1-9 bytes, as a 64-bit ULEB)
- Type (1-9 bytes, a unsigned variable-length integer, corresponding to a ZIP-302 type [#zip-0302]_)

- Length of data (1-2 bytes, as a 16-bit ULEB)
- data (2-508 bytes):

- Memo part number (1 byte)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Memo part number (1 byte)
- Memo part number (1 byte) (MUST be greater than zero as the header is always part #0)

Comment on lines +118 to +119
Recipients SHOULD reject multipart memos as invalid if any of the following issues are
encountered:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHOULD, not MUST?

Transactions containing invalid multipart memo encodings may pose a privacy threat
depending on how the recipient acts on the transaction. In particular:

- Recipients SHOULD NOT accept any value received alongside a ``MultipartChunk``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this needs clarifying. What about multipart memos that include payment in the header and every chunk?

@gmale
Copy link
Contributor

gmale commented Jan 28, 2021

Standardizing around this proposal is a great idea. In addition to responding to @dconnolly's great suggestions, it seems like the next step may be to move the underlying proposal forward? ZIP 302

Are there any blockers on these two zips?

@nuttycom
Copy link
Contributor

nuttycom commented Apr 9, 2024

This proposal will probably be subsumed by a proposal to decouple memos from outputs in NU6.

@str4d
Copy link
Collaborator Author

str4d commented Apr 23, 2024

Closing because we're going to pursue #627 instead.

@str4d str4d closed this Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants