Research Spike: Mempool support in lightwalletd #169

Closed · 2 tasks
braddmiller opened this issue Jan 13, 2020 · 39 comments

@braddmiller (Contributor) commented Jan 13, 2020

Create a methodology and related tickets to provide mempool access to connected clients. The goal is for clients to be able to see incoming transactions that they had no prior knowledge of.

Criteria:

  • Bandwidth aware
  • Preserves the privacy of the clients requesting information to the same degree as the existing lightwalletd/client contract.

Time Box: 16 hours

@gmale (Contributor) commented Jan 13, 2020

We should consider this from the perspective of both transparent and shielded transactions. It might be the case that we can already see transparent mempool information.

@LarryRuane (Collaborator) commented:

See also #136. Lightwalletd currently does not provide any mempool-related information. Here's a list of all zcashd mempool-related RPCs; it would be relatively easy for lightwalletd to implement gRPCs for any of this information:

  • getaddressmempool -- from the insight explorer (which we have available); given a taddr or list of taddrs, returns information about any transactions in the mempool that reference those addresses.
  • getrawmempool -- takes no arguments; returns a list of txids in the mempool; there is a verbose version (same txid list, plus a little information on each tx).
  • gettxout -- given a txid and an output index, returns detailed information about that output.
  • getmempoolinfo -- not useful to us; returns a summary of the mempool (size and byte counts).

Here's an example of retrieving the entire mempool (mainnet):

$ src/zcash-cli getrawmempool 
[
  "212425aba4c3686b43cc643ca1181d4ddc0aa32abe6bf96a1803748d682a9006",
  "cf2a6fd3a12045c8a1ce5d3e78041a61bc68e3f5c8e50a4a9eef7396e8028456",
  "08f42df9408cce12706c30dd77b310fe2787a700a3b2034d694240715bafe077",
  "5b0fc43fa93ebc3af58e0dc98750f85f76134bfa1325289620506778168341b3"
]
$ 

We should make that a "streaming" gRPC, since the list could be quite large. @defuse raises an important DoS risk in #136. To address that, maybe the wallet should request a bloom filter of the mempool? That's limited in size, and the wallet would be able to determine very efficiently (with very high probability) whether a given txid it cares about is in the mempool. It has perfect privacy properties, since lightwalletd returns (a representation of) the entire mempool.
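
For illustration, here is a minimal Go sketch of the bloom-filter idea (the filter size, hash count, and helper names are hypothetical, not part of lightwalletd): the server would add every mempool txid to the filter and ship it to the wallet, which then tests the txids it cares about locally.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// bloomFilter is a minimal fixed-size Bloom filter over txids.
// Purely illustrative; not lightwalletd code.
type bloomFilter struct {
	bits []byte
	k    int // number of hash functions
}

func newBloomFilter(sizeBytes, k int) *bloomFilter {
	return &bloomFilter{bits: make([]byte, sizeBytes), k: k}
}

// positions derives k bit positions for a txid by hashing it with a salt byte.
func (f *bloomFilter) positions(txid []byte) []uint64 {
	out := make([]uint64, f.k)
	for i := 0; i < f.k; i++ {
		h := sha256.Sum256(append([]byte{byte(i)}, txid...))
		out[i] = binary.LittleEndian.Uint64(h[:8]) % uint64(len(f.bits)*8)
	}
	return out
}

func (f *bloomFilter) add(txid []byte) {
	for _, p := range f.positions(txid) {
		f.bits[p/8] |= 1 << (p % 8)
	}
}

// mayContain returns true if txid is possibly in the set: false positives
// are possible, false negatives are not.
func (f *bloomFilter) mayContain(txid []byte) bool {
	for _, p := range f.positions(txid) {
		if f.bits[p/8]&(1<<(p%8)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Server side: add every mempool txid, then send the filter to the wallet.
	f := newBloomFilter(4096, 4)
	for _, id := range [][]byte{[]byte("txid-A"), []byte("txid-B")} {
		f.add(id)
	}
	// Wallet side: check the txids it cares about against the received filter.
	fmt.Println(f.mayContain([]byte("txid-A"))) // true
	fmt.Println(f.mayContain([]byte("txid-Z"))) // almost certainly false
}
```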

braddmiller changed the title Research Spike: Mempool support in lightwalletd (Incomplete) Research Spike: Mempool support in lightwalletd Feb 11, 2020
lindanlee added this to the wallet sprint 2020:30 milestone Jul 22, 2020
@LarryRuane (Collaborator) commented:

We've had conversations in Slack, and, although @pacu, @gmale, and @defuse haven't agreed to this yet, for now at least I'm proposing the following mempool interface for lightwalletd as it relates to shielded transactions. We will need to discuss further what should be done for transparent transactions (see Kevin's comment above), but it's likely the following proposal can be extended to account for those as well.

The simplest possible way for lightwalletd to provide mempool information to the wallets is to add a gRPC that simply returns the mempool (as a list of compact transactions). But if the wallets are calling this gRPC every few seconds, and if there are hundreds or even thousands of transactions in the mempool, this is very bad from a bandwidth (and CPU, battery) point of view. The first criterion in Brad's original comment above (bandwidth aware) suggests an incremental design. This means there should be a way for the wallets to be incrementally updated with the new entries in the mempool, without re-fetching transactions redundantly. (We're not concerned about lightwalletd being incrementally updated from zcashd, because they're on the same system, at least currently, and communication between them uses localhost, which is very efficient. Plus, they don't have mobile memory, battery, and CPU constraints.)

The proposal is as follows:

  • A new in-memory-only (non-persistent) container data structure, MempoolCache, consisting of compact transactions
    • When needed (on demand, see below), but no more often than every 2 seconds, lightwalletd re-populates this cache (from scratch) using the getrawmempool and getrawtransaction zcashd RPCs.
    • When lightwalletd detects a reorg, it clears MempoolCache (forcing it to be re-populated the next time it's needed).
  • A new gRPC, GetMempoolTx, that takes as its streaming argument a list of transaction IDs. This list tells lightwalletd which transactions the caller (wallet) already has in its mempool cache (from earlier calls to GetMempoolTx) and so are not needed in the reply. This gRPC's handler in lightwalletd will:
    • populate or refresh MempoolCache so that it's no more than 2 seconds out of date
    • respond with a streaming list of compact transactions consisting of all entries in MempoolCache whose txids are not in the argument list (thereby allowing the wallets to be incrementally updated)
    • also include in this response "empty" compact transactions, one for each hash (txid) in the argument list that is no longer in MempoolCache.

That last point needs elaboration. A compact transaction has this format:

message CompactTx {
    uint64 index = 1;   // the index within the full block
    bytes hash = 2;     // the ID (hash) of this transaction, same as in block explorers
    uint32 fee = 3;
    repeated CompactSpend spends = 4;   // inputs
    repeated CompactOutput outputs = 5; // outputs
}

The argument list to GetMempoolTx may contain txids that are no longer in MempoolCache (either because the tx was mined into a block, or it has expired, or for some other reason it was dropped from the mempool). The wallet thinks this tx is still in the mempool (it must have received it via an earlier call to GetMempoolTx and is now saying not to include it in the reply, because that would be redundant). It's helpful to tell the wallet that this tx is no longer in the mempool. The GetMempoolTx reply does this by including a CompactTx with all fields set to zero except the hash (txid). A transaction with no spends and no outputs (both zero-length) is invalid, and so can be recognized by the wallet as being this special case. The wallet should respond by removing the transaction with the specified txid from its local mempool cache (however it implements it).

An example sequence might be:

  1. Transactions A, B, C enter the mempool
  2. GetMempoolTx([]) --> [A, B, C]
  3. Transactions D, E enter the mempool
  4. GetMempoolTx([A, B, C]) --> [D, E]
  5. Transaction F enters the mempool
  6. Transaction B leaves the mempool
  7. GetMempoolTx([A, B, C, D, E]) --> [B*, F] (B* is an "empty" compact transaction)
  8. Transaction G enters the mempool
  9. GetMempoolTx([A, C, D, E, F]) --> [G]

You might wonder why GetMempoolTx can't just return a separate list of txids that are no longer in the mempool. The reason is that a gRPC method can't return both a streaming reply and a second, separate reply (streaming or not). And it really does make sense for the reply to be streaming, because there could be hundreds or even thousands of compact transactions in the response.

An alternative design would be a second gRPC, GetMempoolDeleted, that takes the same argument list (txids in the wallet's mempool) and returns just the txids in that list that are no longer in the mempool; the downside is that the wallet would have to make a separate gRPC call. But this could be done instead of what I'm proposing.

The special-case "empty" compact transactions are fairly small, just 32 bytes for the txid (which is needed in any case), plus 8 bytes for the index, 4 bytes for the fee, and two zero-length arrays (which would have some small overhead). So it's pretty efficient.
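
To make the reply construction concrete, here's a rough Go sketch of the selection logic. It uses a simplified stand-in struct for the proto CompactTx and a plain slice in place of the gRPC stream; names are illustrative, not the actual lightwalletd code.

```go
package main

import "fmt"

// compactTx is a simplified stand-in for the proto CompactTx message.
type compactTx struct {
	Index   uint64
	Hash    string // txid (a short string here for readability)
	Fee     uint32
	Spends  []string
	Outputs []string
}

// mempoolResponse computes what GetMempoolTx would stream back:
//   - every cache entry whose txid is NOT in the caller's exclude list, and
//   - an "empty" compactTx (hash only, no spends/outputs) for each excluded
//     txid that is no longer in the cache, telling the wallet to drop it.
func mempoolResponse(cache map[string]compactTx, exclude []string) []compactTx {
	excluded := make(map[string]bool, len(exclude))
	for _, id := range exclude {
		excluded[id] = true
	}
	var reply []compactTx
	for id, tx := range cache {
		if !excluded[id] {
			reply = append(reply, tx) // new to the wallet
		}
	}
	for _, id := range exclude {
		if _, ok := cache[id]; !ok {
			reply = append(reply, compactTx{Hash: id}) // left the mempool
		}
	}
	return reply
}

func main() {
	// Mirrors step 7 of the example sequence above: the mempool holds
	// A, C, D, E, F and the wallet reports it already has A, B, C, D, E.
	cache := map[string]compactTx{
		"A": {Hash: "A", Outputs: []string{"o1"}},
		"C": {Hash: "C", Outputs: []string{"o2"}},
		"D": {Hash: "D", Outputs: []string{"o3"}},
		"E": {Hash: "E", Outputs: []string{"o4"}},
		"F": {Hash: "F", Outputs: []string{"o5"}},
	}
	for _, tx := range mempoolResponse(cache, []string{"A", "B", "C", "D", "E"}) {
		empty := len(tx.Spends)+len(tx.Outputs) == 0
		fmt.Printf("%s (empty=%v)\n", tx.Hash, empty) // F (false), B (true)
	}
}
```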

@LarryRuane (Collaborator) commented:

A pretty simple bandwidth-efficiency enhancement to what's described in the previous comment is to have the txids in the argument list be truncated txids, just the first 4 or 6 bytes or so, instead of sending all 32 bytes. The only downside is the possibility of a collision -- suppose two transactions, A and B, start with the same 4 (or 6) bytes, and the wallet has received A but not B. It calls GetMempoolTx with the truncated txid; lightwalletd thinks it has sent both, so it sends neither.

One approach to this would be to just ignore this problem, since all it means is there's a 1 in 4 billion chance that the wallet won't be informed of a mempool transaction (it will discover the tx when it's mined into a block). Or we could actually solve the problem by having lightwalletd detect the collision and send both transactions -- it should always be okay for GetMempoolTx to respond with unnecessary transactions.
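
A sketch of that collision-tolerant variant (hypothetical helper, hex-encoded txids assumed): group the mempool txids by their truncated prefix and, whenever an excluded prefix matches more than one transaction, send all of the matches rather than guessing.

```go
package main

import "fmt"

const prefixLen = 4 // bytes of the txid sent by the wallet

// txidsToSend returns the mempool txids the server should stream back, given
// the truncated txids the wallet says it already has. If a truncated prefix
// matches more than one mempool txid (a collision), all matches are sent,
// since sending an unneeded transaction is always safe.
func txidsToSend(mempool []string, havePrefixes []string) []string {
	byPrefix := make(map[string][]string)
	for _, id := range mempool {
		p := id[:prefixLen*2] // hex string: 2 characters per byte
		byPrefix[p] = append(byPrefix[p], id)
	}
	skip := make(map[string]bool)
	for _, p := range havePrefixes {
		if len(byPrefix[p]) == 1 {
			skip[byPrefix[p][0]] = true // unambiguous: wallet already has it
		}
		// collision (more than one match): skip nothing, send all matches
	}
	var out []string
	for _, id := range mempool {
		if !skip[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	mempool := []string{"0a1b2c3d1111aaaa", "0a1b2c3d2222bbbb", "ffee00aa3333cccc"}
	// The wallet has seen one of the two colliding txids plus the third one;
	// both "0a1b2c3d..." txids are sent because the prefix is ambiguous.
	fmt.Println(txidsToSend(mempool, []string{"0a1b2c3d", "ffee00aa"}))
}
```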

@LarryRuane (Collaborator) commented Jul 27, 2020

UPDATE: I no longer think this is a good idea, please ignore it. (I'll leave it here at least for now.)

I just thought of another way to address this problem, but, again, I'm doubtful it even needs to be addressed at all. The client (wallet) could, in its GetMempoolTx request, include an integer from 0 to 28, which lwd interprets as a byte offset into the txid, and returns the 4 bytes beginning at that offset (instead of always returning the first 4 bytes). The wallet could choose this number randomly. That way, if two txids collide for this request, chances are there would be no collision on the next request a few seconds later (the two colliding txids wouldn't collide on a different set of 4 bytes).

@LarryRuane (Collaborator) commented:

A proposed set of lightwalletd tickets, each line is a ticket:

  • Darksidewalletd changes
    • Add darkside gRPC for populating fake mempool
    • Add stub zcashd handler for getrawmempool
    • Add stub zcashd handler for getrawtransaction
  • Add mempool data structure
  • Add GetMempoolTx() gRPC

@holmesworcester commented Aug 13, 2020

@LarryRuane — Did you consider an approach where the zecwallet-light-cli or, better, some persistent process in the SDK itself would connect directly to the network as other nodes do, and pick up all new transactions as they are broadcast to the network, as miners do? What would be the downsides of this approach?

We were considering this approach because:

  1. Latency should be lower. Low latency matters especially for our use case, because we're sending and receiving messages via memo, but it's also good UX for the more typical wallet use case.
  2. It seems more scalable and won't break or slow down if the lightwalletd is overwhelmed.

@gmale (Contributor) commented Aug 13, 2020

This sounds like a good idea. I'm not familiar enough with how nodes work to understand whether this is feasible on mobile. I'm also curious whether this could be done easily with "Zebra" nodes. (Update: checked with the Zebra devs and the answer is not yet.)

Depending on how the networking is done, this could be a challenge on mobile, because you shouldn't keep long-lived connections.

@defuse (Collaborator) commented Aug 13, 2020

Connecting directly to nodes would add a new attack surface to the wallet, so we'd need to weigh the risk vs. reward carefully. If there's a remotely exploitable bug in the lightwalletd<->wallet protocol, it's not too bad, because you'd still need to compromise lightwalletd to exploit it. Bugs in the protocol for talking to nodes wouldn't have that same defense in depth.

@holmesworcester commented Aug 13, 2020

> Connecting directly to nodes would add a new attack surface to the wallet, so we'd need to weigh the risk vs. reward carefully. If there's a remotely exploitable bug in the lightwalletd<->wallet protocol, it's not too bad, because you'd still need to compromise lightwalletd to exploit it. Bugs in the protocol for talking to nodes wouldn't have that same defense in depth.

Yes. On balance I agree that the slightly lower latency isn't worth it. Scalability might be, if lightwalletds get overwhelmed easily by lots of clients connecting to them for the latest transactions, but until that's a problem it's probably better not to add a new attack surface.

@holmesworcester:

Is there agreement on the approach here? Is there a corresponding ticket for adding this functionality to the light-cli?

@gmale (Contributor) commented Aug 21, 2020

Summary

We discussed this, in depth, today. To summarize that discussion:

There are several key phases to implement:

  1. the lightwalletd services
  2. the client interactions with (1)
  3. the privacy and security improvements that can be made in (1) and (2).

Work on (1) will begin immediately, (2) needs more research and (3) won't begin anytime soon but needs to be kept in mind while building (1) and (2).

Details

1. Lightwalletd Services

This component is the most straightforward and can be implemented almost exactly as described in this github issue. This is unblocked, ready for work, and will be picked up by @LarryRuane next week.

2. Light Clients

Consuming the service on light clients has broader implications for the threat model and privacy. To paraphrase @defuse:

> Client actions like sending a transaction or receiving a memo use an identifiable amount of bandwidth, so it's possible to reconstruct an approximate transaction graph merely by intercepting LWD's encrypted internet traffic. In order to fix that, the wallet's traffic needs to be the same regardless of what the user is doing. Polling the mempool provides some convenient cover traffic in which to hide sends/receives.

Even with Tor, the amount of data transmitted is observable. Consequently, the current light client + lightwalletd model exposes weaknesses, including the following, which are documented as counter-intuitive:

An adversary can
- tell that and when the user received a fully-shielded transaction
- tell that and when the user sends a fully-shielded transaction
- learn who the user is sending/receiving funds to/from in fully-shielded transactions
- tell how many transactions the user has sent or received over time
- determine whether or not the user’s wallet owns a particular address
- tell who the user is
- tell where the user is
- tell that the user spends or receives money according to a certain pattern
  (e.g. recurring payments) using fully-shielded transactions
- tell when the user spends shielded funds sent to them by the adversary

Ideally, a light client implementation should avoid these weaknesses, as much as possible. At a minimum, it should not further weaken the privacy properties of the library. Work in this area is considered blocked until further research and discussion is completed.

Additionally, on the Android/iOS side, we discussed effectively "quarantining" mempool data so that it cannot "contaminate" valid on-chain data and be used as an attack vector for tricking users into thinking that things have happened on chain. @gmale will begin working on a proof of concept in the mobile SDKs for decrypting transactions into a separate database from on-chain information.

3. Privacy Improvements

The mempool work could be leveraged to help address some of the primary weaknesses in the current privacy properties of light clients. For example, if the client and server always communicated through regularly timed, constant-sized messages, then most weaknesses related to bandwidth measurement could be eliminated.

One potential way to do this would be to wrap the lightwalletd service response in a fixed-size container, similar to how packets or frames function in networking. This container would hold a small header describing the contents (e.g. length, checksum) and a payload of bytes that is padded, as needed, to achieve a fixed size for the outer container:

.==========================.             
|   Fixed-Size Container   |
.==========================.
|  +---------+----------+  |
|  | Length  | Checksum |  |
|  +--------------------+  |
|  |      Payload       |  |
|  +--------------------+  |
|  |      Padding       |  |
|  +---------+----------+  |
+--------------------------+

Both the request and response could utilize this general approach, sending and receiving data at regular intervals, and this would be fairly trivial to implement via gRPC: the container would be defined in the proto file and used as both the input and output for the mempool service call.
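
As an illustration only (the frame size, header layout, and checksum choice here are arbitrary assumptions, not a spec), packing and unpacking such a container might look like this in Go:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const frameSize = 1024 // fixed outer size; illustrative only
const headerSize = 8   // 4-byte payload length + 4-byte CRC32

// pack wraps a payload into a fixed-size frame: [len][crc32][payload][padding].
func pack(payload []byte) ([]byte, error) {
	if len(payload) > frameSize-headerSize {
		return nil, errors.New("payload too large for one frame")
	}
	frame := make([]byte, frameSize)
	binary.LittleEndian.PutUint32(frame[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(frame[4:8], crc32.ChecksumIEEE(payload))
	copy(frame[headerSize:], payload)
	return frame, nil // the remainder of the frame is zero padding
}

// unpack extracts and verifies the payload from a fixed-size frame.
func unpack(frame []byte) ([]byte, error) {
	if len(frame) != frameSize {
		return nil, errors.New("wrong frame size")
	}
	n := binary.LittleEndian.Uint32(frame[0:4])
	if int(n) > frameSize-headerSize {
		return nil, errors.New("corrupt length field")
	}
	payload := frame[headerSize : headerSize+n]
	if crc32.ChecksumIEEE(payload) != binary.LittleEndian.Uint32(frame[4:8]) {
		return nil, errors.New("checksum mismatch")
	}
	return payload, nil
}

func main() {
	frame, _ := pack([]byte("a small gRPC response body"))
	payload, _ := unpack(frame)
	fmt.Printf("frame=%d bytes, payload=%q\n", len(frame), payload)
}
```

Every message on the wire is then exactly frameSize bytes, regardless of how much real data it carries.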

Since this represents a fundamental change in design for the light client libraries, this work is also blocked. More research would be needed on the implementation details to strike a balance between performance and privacy with minimal impacts to existing code.

@gmale (Contributor) commented Aug 21, 2020

Pinging @holmesworcester and @adityapk00 and @nighthawk24 for any input here, since things are still in the research phase.

@holmesworcester commented Aug 22, 2020 via email

@leto commented Aug 23, 2020

@gmale FYI, I implemented suggestion 3 ("fixed-size containers") for the websocket Wormhole protocol in our fork of ZecWallet, which has almost all of the same privacy concerns and threat models as the lite wallet:

ZcashFoundation/zecwallet#212 (comment)
https://twitter.com/dukeleto/status/1231583675017519113

This issue has been ignored for about 1.5 years by Zcash Foundation and Company, which greatly reduces the privacy of any users who use the Zecwallet Wormhole service. It's a one-line fix; there is absolutely no reason, except malice or stupidity, not to fix this bug.

To ignore this fundamental issue for the lite wallet, and kick it down the street for another few years, is a great disservice to the perceived privacy of ZEC mainnet users. Yes, it's more than a one-line fix here, but it's the only way to correctly solve bandwidth metadata leaks related to mempool support.

@pacu (Contributor) commented Aug 24, 2020

@holmesworcester

> 1. Will you release the functionality in lightwalletd as soon as it's complete? Or will you wait until the client libraries are done?

It's usually developed 'out in the open', but that's more of a question for @LarryRuane.

> 2. Are there ways you're sure or suspect that fetching zero-confirmation transactions will worsen the privacy properties from the existing version? Or is it that you're trying to address both issues at the same time?

If I'm not wrong, SDK clients are already asking LWD for specific tx IDs to fetch memos and existing outgoing transactions, so I don't believe this would make privacy 'worse' compared to what our threat model contemplates. The wallet team debated the inconvenience of mixing 0-confirmation data with confirmed data in a payment app like ours and thought it would be inadequate, so we would take action in our SDK implementation to avoid that. I guess @defuse can bring up specific security and privacy issues, if any.

> 3. When does basic default Tor support come to the light wallet? How does that fit in here?

I checked current issues and we don't have that on the roadmap, but @braddmiller as PO could give you a better answer.

> 4. Will your work on all this become part of zecwallet-light-cli? Or will it just be available to mobile apps?

This is probably an @adityapk00 question.

> (Generally, if we want to be using the official ECC-approved light wallet client libraries in an Electron app that currently uses zecwallet-light-cli, what will be the best way for us to do that?)

Maybe we (the wallet team) could pair up with you guys and talk about that!

@defuse (Collaborator) commented Aug 24, 2020

@holmesworcester, these are great points:

> 1. Are there any other data leaks that are bigger or easier to fix and unrelated? Like the memo-fetching leak to the light wallet server, for example?
>
> Then I think we should talk through personas or threat models for specific users. Or, if we don't feel like we're clear enough on that, talk through the hierarchy of potential attackers. For example, there are many more attackers who can subvert a lightwalletd run by a tiny team, or request connection logs from an ISP, than there are attackers who can link and deanonymize users by monitoring Tor network traffic, no?

What are everyone's thoughts on this? Do we have any perspective from potential/actual users on these questions? To make the contrast clear, here are the two most pressing problems with the current protocol in my opinion:

  • Problem 1: An attacker who compromises lightwalletd can recover the entire, exact graph of who's-paying-who, by watching which wallets send which transactions and which other wallets express interest in those transactions' memos.
  • Problem 2: An attacker who's only observing lightwalletd's (encrypted) traffic can learn some information about the transaction graph, but it's not exact. They only learn the timing of when wallets send and receive, which they'll have to match up with what they're seeing on the blockchain to try to make inferences about particular transactions.

In Problem 1, the attacker is stronger (they've had to break into the lightwalletd server), and the result is a bad breach of privacy for all users. On the other hand, the attacker in Problem 2 is weaker (they just have to intercept traffic), and the outcome of the attack is not as well understood (it could be just as bad, or slightly less bad, or maybe not that much of a problem, depending on the number of simultaneous users and user behavior).

Putting yourself in the shoes of a prospective user, how do you all feel about the relative priority of these problems? Would either (or both) be a showstopper to using the wallet in certain situations?

@defuse (Collaborator) commented Aug 24, 2020

@leto Thanks for thinking about those issues and raising them with patches! It's important to acknowledge and highlight privacy/security shortcomings so they can be prioritized relative to each other and so that users are informed about the software they're using. If there's ever a case where there's a weakness in an ECC mobile wallet that isn't covered in the threat model, please let me know! (The items in bold are what I personally feel are the highest priorities to fix.)

@leto commented Aug 24, 2020

@defuse The Hush team has gone deep into implementing mempool support at the lightwalletd layer, and we are interested to see what ECC comes up with. If you want a comment about weaknesses of your threat model: it does not mention the mempool a single time. If mempool support will be added to lightwalletd, it's time for the threat model (which is quite well written, I may add) to expand to cover the mempool.

It would be great to cover the recent security/hardening changes to the Zcash mempool, such as randomly ejecting things from the mempool when it gets full, in this threat model. As far as I know, only some code comments and tests document these security features.

Your threat model would be more useful if it broke things out into lite vs full node mempool attack scenarios. Lite wallets are vastly inferior and potentially vulnerable to entire classes of attacks that full nodes never have to worry about, such as Covert Channels in zk-snarks, good old chain forks and the inherent privacy issues of trusting a 3rd party server to help make shielded tx's.

Fixed-size packets are essential to avoid leaking metadata such as whether a tx is using zaddrs, how many zaddr recipients are being sent to, which RPC is being used, and many other little metadata tidbits that erode ztx privacy. I highly encourage ECC to enforce this for all Wormhole and lightwalletd clients. If reference servers all require it, then all 3rd-party wallet software will follow along.

To summarize: in the normal way Zcash Wormhole works, the current state of affairs already leaks far more metadata than Problem 2, though not quite as much as Problem 1. I have not reviewed the latest JS Zcash wallets because we do not use that code.

@defuse (Collaborator) commented Aug 24, 2020

One note on the fixed-size container proposal: an attacker monitoring the network traffic will also learn the timing of when traffic gets sent. That can potentially leak information, even if all the requests and responses are the same size. For example, if the wallet sends mempool queries at a fixed interval, like every 10 seconds, the attacker can infer that any request happening outside of that predictable polling is something other than mempool access, like sending a transaction, or something else.

@leto commented Aug 24, 2020

@defuse Agreed, a simple fixed polling interval is not great. This can be fixed by adding random noise to the polling interval and using exponential backoffs, which will make it very hard to differentiate from everything else.

@LarryRuane (Collaborator) commented Aug 24, 2020

Seems like the fixed-size container idea could be implemented as a separate, invisible layer on top of gRPC (or within gRPC itself, but that team likely won't want to dedicate resources to implementing it). Invisible meaning: no change to the gRPC interface. Much like standard network packet switching, it could break up large requests or replies, or merge small ones together, into fixed-size requests and fixed-size responses.

Of course, the latencies would be greater in both directions than today (nothing worth having is free). Requests and replies could follow a Poisson distribution (sent whether there's any data to carry or not -- padded to the fixed size), which I think is what @leto meant by exponential backoff. Initial sync (block download) would be much slower; that may need to be excepted from this mechanism somehow.
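
For illustration, a small Go sketch of Poisson-style timing: the gap before each fixed-size request is drawn from an exponential distribution (the 10-second mean is just an example value), so the request stream looks the same whether or not the wallet has anything real to send.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// nextGap returns an exponentially distributed wait time with the given mean,
// which makes the request times form a Poisson process rather than a fixed
// schedule.
func nextGap(mean time.Duration) time.Duration {
	return time.Duration(rand.ExpFloat64() * float64(mean))
}

func main() {
	mean := 10 * time.Second // arbitrary example mean
	for i := 0; i < 5; i++ {
		gap := nextGap(mean)
		fmt.Printf("send fixed-size request after %v\n", gap.Round(time.Millisecond))
		// In a real client: time.Sleep(gap), then send a padded frame that
		// carries pending wallet traffic if any, or only padding if not.
	}
}
```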

If this is its own separate layer, nothing in the server or client-side would need to change.

@LarryRuane (Collaborator) commented:

Here's an interesting performance-efficiency idea from Bitcoin Core: https://bitcoincore.org/en/2016/06/07/compact-blocks-faq/

If the wallets have a copy of the mempool, then when a new block propagates, they likely already have all or most of the transactions in that block. So compact blocks the lightwalletd sends to the wallets could contain txids instead of compact transactions (and they even have some way of compressing the txids so the compact blocks are even smaller); the wallet could ask for any transactions it doesn't have.

We'd need a new gRPC for this, because currently there's no way to fetch compact transactions (GetTransaction returns full transactions, as stored in the zcashd blockchain), but that's very simple.
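
The wallet-side reconciliation would be straightforward; here's a minimal sketch (using full txids and ignoring the short-id compression that compact blocks actually use):

```go
package main

import "fmt"

// missingFromMempool returns the txids in a newly announced block that the
// wallet does not already hold in its local mempool view, i.e. the only
// transactions it still needs to fetch.
func missingFromMempool(blockTxids []string, mempool map[string]bool) []string {
	var need []string
	for _, id := range blockTxids {
		if !mempool[id] {
			need = append(need, id)
		}
	}
	return need
}

func main() {
	mempool := map[string]bool{"tx1": true, "tx2": true, "tx3": true}
	block := []string{"tx1", "tx2", "tx9"} // tx9 never reached our mempool
	fmt.Println(missingFromMempool(block, mempool)) // [tx9]
}
```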

@leto commented Aug 25, 2020

@LarryRuane to clarify, this is what I mean by "exponential backoff": https://en.wikipedia.org/wiki/Exponential_backoff

For example, instead of polling at a fixed interval, you wait an interval equal to the next Fibonacci number (or some other exponential sequence) until you have waited long enough. With added "noise", there is no way to isolate the mempool requests from any other requests. It's a solved problem, decades ago.

"Seems like the fixed-size container idea could be implemented as a separate, invisible layer on top of gRPC" is a very complex suggestion and not at all what I am suggesting. Lite wallets cannot request individual txids; that leaks metadata. The only thing they should ever do is ask lightwalletd for all mempool data, possibly paginated if it's large. Doing anything else tells the server which txids the client is interested in.

If the mempool is under attack and being spammed, there are some issues with lite clients' bandwidth usage. The best ways for lite clients to handle that, without magnifying the attack like a reflected DDoS, are still open to research.

lindanlee reopened this Aug 26, 2020
leto mentioned this issue Aug 27, 2020 (2 tasks)

@leto commented Aug 27, 2020

@lindanlee Those 2 issues do not replace this one, and they ignore 90% or more of the privacy bugs described in this issue. Is it the stance of ECC to ignore those metadata leakage bugs?

@holmesworcester commented Aug 28, 2020

> Putting yourself in the shoes of a prospective user, how do you all feel about the relative priority of these problems? Would either (or both) be a showstopper to using the wallet in certain situations?

@defuse It seems like both problems #1 and #2 above result from the fact that the light wallet client only fetches memos for transactions it is interested in. My memory of the justification for fetching only the memos the user is interested in is that it results in a 70% decrease in bandwidth consumption. Is that correct? If so, that seems like a very small benefit relative to the privacy lost. It seems like we should start there and change the default behavior.

@holmesworcester:

@pacu Thanks! I'll ask braddmiller about Tor support.

@gmale (Contributor) commented Aug 31, 2020

> My memory of the justification for fetching only memos the user is interested in is that it results in a 70% decrease in bandwidth consumption. Is that correct?

It depends on the wallet's use, but I don't think 70% is the right number. It would be a function of #_of_your_memos vs. #_of_all_memos, which grows as the total universe of Zcash transactions increases. In other words, we are saving 580 bytes for every output that does not belong to our wallet -- the more that aren't yours, the more you save. In concrete terms, not a single message that has been sent on Zbay was for my Android wallet. So I'm saving 0.5K per output for all of those transactions, and most transactions have at least 2 outputs. Aditya mentioned testing one account that had 10K transactions. I don't need to download a single one of those memos on my Android wallet.

Lastly, bandwidth isn't the only thing being consumed. Device storage and battery use are also constrained resources to consider, and both are drained when processing additional memos. Storage can be mitigated by discarding unnecessary memos, but additional battery use is hard to avoid and is, arguably, one of the things the user cares about most.

To be fair, I'm not saying we can't download all memos. I'm just enumerating some of the tradeoffs that factored into the current model.

@holmesworcester commented Sep 1, 2020 via email

@holmesworcester commented Sep 1, 2020

Also, here's a source for the 70% number. Is this out of date?

> The memo field is ~70% of a Zcash block.

https://electriccoin.co/blog/zcash-reference-wallet-light-client-protocol/

@gmale (Contributor) commented Sep 1, 2020

> In the case where I have never been sent a memo, what's the bandwidth increase? Like, what's the difference between the size of the two feeds of everything, the memoless one and the memoful one? (This shouldn't depend on the user's case.)

If we ignore the 70% for a moment and stick with the 580 bytes per output, instead, then that results in the following:

    .====================.        .====================.
    |   Memoless Block   |        |    Memoful Block   |
    .====================.        .====================.
    |        10 T        |        |  10 * (T + 1160B)  |
    +--------------------+        +--------------------+

In other words, if T is the size of a "memoless" transaction in bytes, then a single memoful compactblock with 10 transactions, averaging 2 outputs each, would be 11.6KB larger than a memoless compactblock. That's a single block. Currently, the chain tip is about 540,000 blocks away from Sapling activation. Meaning, a complete "memoful" chain of compact blocks of this size would be 6.264GB larger than the "memoless" one.

And that size difference would only increase over time.
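
For reference, the arithmetic behind those figures (580 bytes per output, 10 transactions of 2 outputs per block, ~540,000 blocks since Sapling activation), spelled out:

```go
package main

import "fmt"

func main() {
	const bytesPerOutput = 580
	const txPerBlock, outputsPerTx = 10, 2
	const blocksSinceSapling = 540000

	perBlock := bytesPerOutput * txPerBlock * outputsPerTx // 11,600 B ≈ 11.6 KB per block
	total := int64(perBlock) * blocksSinceSapling          // ≈ 6.264 GB over the chain
	fmt.Printf("extra per block: %d bytes\n", perBlock)
	fmt.Printf("extra for the whole chain: %.3f GB\n", float64(total)/1e9)
}
```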

I think the real-world numbers will approach this number as a limit, as the ratio of "transactions I care about" to "all transactions" approaches zero with a growing denominator.

I agree with what you're saying here -- basically that the number above is a decent upper limit for how much data we're considering. No matter how you slice it, it is a substantial amount of data for mobile devices, and it is non-trivial for most other use cases as well. For every 100,000 blocks we process, a single extra byte per block adds roughly 100KB of extra download. Therefore, a 580-byte savings PER OUTPUT is very significant.

> Should we assume that the impact on battery life is proportional to the amount of data received over the network?

Yes. It takes a substantial amount of battery to operate a wireless radio. Battery drain is also a function of the amount of processing done (parsing, decryption, storage, etc).

@holmesworcester commented Sep 2, 2020 via email

@gmale (Contributor) commented Sep 4, 2020

> You might be preaching to the choir...

We're in violent agreement that privacy is paramount 😄. Your original question was whether a 70% reduction in bandwidth is the justification for not downloading memos, and I'm attempting to clarify that it is deeper and more nuanced than that, especially on mobile.

  • it's not just a 70% difference, it's ~~580~~ 464 bytes per shielded output per transaction (which in many cases is more than 70%)
    For example: while most txs have 1 or 2 outputs, I just checked 50,000 transactions and 13,000 of them had over 100 outputs. 4658 of them had over 800 outputs!
  • it's not just bandwidth but also device storage, processing time, battery use and server cost
  • the average entire "memoless" compact block is smaller than a single memo
    counting only blocks with shielded outputs, in a sample of 50,000 blocks the average compactblock serialized size was 402 bytes

Unfortunately, the differences are substantial. Fortunately, though, nothing in the lightwalletd protocol prohibits Zbay from downloading all memos for all blocks.

@holmesworcester commented Sep 8, 2020 via email

@gmale (Contributor) commented Sep 8, 2020

Effectively, yes. My numbers are a bit off [corrected to 464, above] but the concept is the same: the savings is per output. Fundamentally, no matter how you slice it, 70% is probably not the right number and it glosses over the other costs associated with the additional bandwidth.

On a related note, I recently came across this explanation, which is helpful for calculating the raw numbers; they're closer to an 80% difference per output:

> For light clients however, there is an additional bandwidth cost: every ciphertext on the block chain must be received from the server (or network node) the light client is connected to. This results in a total of 580 bytes per output that must be streamed to the client.
>
> However, we don't need all of that just to detect payments. The first 52 bytes of the ciphertext contain the contents and opening of the note commitment, which is all of the data needed to spend the note and to verify that the note is spendable. If we ignore the memo and the authentication tag, we're left with a 32-byte ephemeral key, the 32-byte note commitment, and only the first 52 bytes of the ciphertext for each output needed to decrypt, verify, and spend a note. This totals to 116 bytes per output, for an 80% reduction in bandwidth use.

@holmesworcester commented Sep 28, 2020

I think I have a better way to do this.

The problem with achieving the ideal UX of "user A sends to B, user B gets notified" is that creating a transaction takes so long. It takes 15 seconds using zecwallet-light-cli on a modern fast computer, and sometimes longer. And almost all this time is spent creating the proof.

So what if we first sent a summary of the transaction (including amount and encrypted memo) to the network, encrypted to the recipient, but without the zk-proof?

This way, the summary transaction will hit the mempool very fast, and the recipient can be notified.

We'd need to make sure there was no way for the user to turn off their device or close the app before the full transaction was sent. But as long as that's covered I think it achieves the same thing, much faster.

It'd be almost instantaneous, which is great UX, and there's no way you can achieve this UX if you're waiting ~15 seconds just for the transaction to be created on the sender's device.

@gmale thoughts?

@gmale (Contributor) commented Sep 28, 2020

My first impression is that I really like the creativity of this idea. From a technical perspective, I'd be concerned about the risk of things like DoS attacks, or "Big Spender" issues, where an adversary essentially floods a user with transactions that their machine is forced to process, and that could also contain spoofed information, attempting to trick the user into thinking something happened when it didn't.

I pinged a few people for opinions, and the security team mentioned that without a proof, funds can be created out of thin air until the final moment when a proof is required. The Core team pointed out that these transactions would also get rejected from the mempool (Sapling proofs are checked at the end of ContextualCheckTransaction).

Intuitively, the approach makes me nervous but it also gets my creative thoughts flowing. There has to be a better solution for payment notification! It almost feels like this is something that can or should be done out-of-band. Unfortunately, adding another layer gets difficult in terms of potentially leaking private information. I recall a conversation where Tromer mentioned an interesting idea of pre-computing proofs--keeping a cache of tiny notes, ready to send to an intermediate address. In general, introducing a 3rd address that both parties "control" allows for creative payment solutions.

This might be a great topic to discuss in this week's Light Client working group newsletter.

@holmesworcester commented Sep 29, 2020 via email
