
Low memory proving integration #2243

Closed · ebfull wants to merge 8 commits into base: master from ebfull:lowmem

Conversation

@ebfull
Contributor

ebfull commented Apr 7, 2017

The purpose of this PR is to integrate @arielgabizon's work on low memory proving in libsnark.

We start by removing the old `loadVerifyingKey`/`loadProvingKey` API in `3a62fba62`. This unsightly API was the product of the fact that the JoinSplit abstraction *might* have had various combinations of parameters loaded into memory - something we want to abandon with this PR. Now, in order to construct the JoinSplit context, you must have paths to the parameters. The tests now operate over our _actual_ parameters instead of generating their own. I've left `GenerateParams` around... for now.

To be continued...

Closes #2235, Closes #789
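
As a rough illustration of the shape of this change, here is a minimal, hypothetical sketch (the class and names below are illustrative, not the actual zcashd API): the proving context takes the parameter paths at construction time, so a partially loaded JoinSplit can no longer exist.

```cpp
// Hypothetical sketch only - not the zcashd API. It shows the construction
// pattern this PR moves to: parameter paths are required up front, and there
// is no separate loadVerifyingKey()/loadProvingKey() step to forget or misorder.
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

class JoinSplitContext {
public:
    JoinSplitContext(std::string vkPath, std::string pkPath)
        : vkPath_(std::move(vkPath)), pkPath_(std::move(pkPath)) {
        if (vkPath_.empty() || pkPath_.empty())
            throw std::runtime_error("parameter paths are required");
        // Real code would deserialize the verifying key here and stream the
        // proving key from pkPath_ on demand during proving.
    }

    const std::string& provingKeyPath() const { return pkPath_; }

private:
    std::string vkPath_;
    std::string pkPath_;
};

int main() {
    JoinSplitContext ctx("sprout-verifying.key", "sprout-proving.key");
    std::cout << "proving key at: " << ctx.provingKeyPath() << "\n";
}
```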

@ebfull
Contributor

ebfull commented Apr 7, 2017

@zkbot try

@zkbot
Contributor

zkbot commented Apr 7, 2017

⌛️ Trying commit 3a62fba with merge 7b187b2...

zkbot added a commit that referenced this pull request Apr 7, 2017

Auto merge of #2243 - ebfull:lowmem, r=<try>
Low memory proving integration

@zkbot
Contributor

zkbot commented Apr 7, 2017

☀️ Test successful - zcash

[Outdated review thread on src/zcash/JoinSplit.cpp]

@arielgabizon arielgabizon reopened this Apr 7, 2017

@ebfull
Contributor

ebfull commented Apr 7, 2017

Closing the PR even once will break homu, I think.

test_full_api(js);
delete js;
test_full_api(params);

@daira
Contributor

daira Apr 8, 2017

This no longer tests ZCJoinSplit::Generate(...). Is that okay?

@ebfull
Contributor

ebfull Apr 8, 2017

I want to get rid of it entirely; it's only there because GenerateParams needed it. I feel fine removing it, but thought others would consider it "locking the keys in the car."

I'd like your input on that!

@daira
Contributor

daira Jun 8, 2017

I feel uncomfortable about removing param generation. In particular, I'd like it to continue to be possible to sanity-check the MPC code by confirming that it produces the same thing as the standard libsnark parameter generation does when the shards are combined.

Edit: I guess you can still do that by using a release from before this change is merged, and we're not going to be changing the parameters before Sapling, which will use a completely different proving/verifying code base.

@daira

Do we have a regression test for #659 (attempting to load the key before parsing command-line arguments)?

@ebfull
Contributor

ebfull commented Apr 10, 2017

@zkbot try

@zkbot
Contributor

zkbot commented Apr 10, 2017

⌛️ Trying commit f286bc0 with merge 3f29602...

zkbot added a commit that referenced this pull request Apr 10, 2017

Auto merge of #2243 - ebfull:lowmem, r=<try>
Low memory proving integration

@zkbot
Contributor

zkbot commented Apr 10, 2017

☀️ Test successful - zcash

@arielgabizon
Contributor

arielgabizon commented Apr 25, 2017

Are both this and #2252 needed?
Seems this one subsumes that?

@ebfull
Contributor

ebfull commented Apr 25, 2017

This PR (currently WIP) will be based on #2252 and rebased when that is merged into master.

@ebfull
Contributor

ebfull commented Apr 25, 2017

@zkbot try

@zkbot
Contributor

zkbot commented Apr 25, 2017

⌛️ Trying commit 5ddd890 with merge 93771a0...

zkbot added a commit that referenced this pull request Apr 25, 2017

Auto merge of #2243 - ebfull:lowmem, r=<try>
Low memory proving integration

@karel-3d

karel-3d commented Jun 8, 2017

I tried this branch and now I have no zcash :/ Not sure what happened; I have never used zcash shielded txs before, because all my PCs have low memory.

I sent zcash to a transparent address from a web wallet, then re-sent it using z_sendmany on the CLI. The operation finished, and after a while z_getoperationstatus showed it as successful; however, I later called z_getoperationresult by mistake, the operation got removed from memory, and now I see all zeroes in z_gettotalbalance.

Is there any way to debug exactly what happened? z_getoperationresult seems to be destructive; I didn't know that.

I know this is not a support forum, but still, I can't even tell you whether the bug is between chair and keyboard or a bug in this fork.

Edit: oh, it just didn't have enough confirmations; now it seems to work. Sorry!

So yeah, this branch works. (Well, I rebased this branch on top of the latest release tag, to be sure. But that version works for me.)

@arielgabizon
Contributor

arielgabizon commented Jun 8, 2017

Great!! @Runn1ng, thanks a lot for testing the branch! When I saw your message I thought it was a confirmations thing. Glad it worked out. How many GB of memory did the computer you ran it on have?

@karel-3d

karel-3d commented Jun 8, 2017

Well, I wanted to record the signing with asciinema, but instead I got a segfault. And now I get a segfault every time (maybe it's related to confirmations as well?)

https://asciinema.org/a/dynj5p3n02u7k6q9ujd4600ks

Anyway, this is a screenshot from the first signing, which worked out great:

[screenshot zc1]

Interesting thing: after the transaction is sent, zcashd still uses a lot of memory, a lot more than before the transaction started (and if I restart the daemon, it is back at the low level).

[screenshot zc2]

@arielgabizon
Contributor

arielgabizon commented Jun 8, 2017

I'll check to see if I get this segmentation fault.

@karel-3d

karel-3d commented Jun 8, 2017

I get it only with 0-conf transactions.

I feel like I also posted a link here to a successful asciinema recording of the signing, but maybe I never actually sent it :D OK, here it is again: https://asciinema.org/a/2ezspyi6h6f9hvaqiytp9fvot

You can see how the memory increases even after the signing is done, and it goes back down when I restart the daemon. However, I don't really care that much, since I can send/receive fine.


libsnark/algebra/curves/alt_bn128/alt_bn128_pairing.cpp \
libsnark/algebra/curves/alt_bn128/alt_bn128_pp.cpp \
libsnark/common/utils.cpp \
libsnark/common/profiling.cpp

@daira
Contributor

daira Jun 8, 2017

Are these just the sources that are used directly?

[Outdated review thread on src/libsnark/algebra/fields/bigint.tcc]
const Fr<ppT> d1 = Fr<ppT>::random_element(),
d2 = Fr<ppT>::random_element(),
d3 = Fr<ppT>::random_element();

@daira
Contributor

daira Jun 8, 2017

Nit: align d2 and d3 with d1.

enter_block("Compute answer to A-query", false);
{
knowledge_commitment_vector<G1<ppT>, G1<ppT> > A_query;

@daira
Contributor

daira Jun 8, 2017

If large structures are put on the stack, that memory will never be reclaimed on Linux. Well, the physical pages currently backing that part of the stack might be reused by other processes, but the memory will be reported as in use for the rest of the process's lifetime, and I believe it will also influence OOM-killer decisions (increasing the likelihood of the zcashd process being OOM-killed).

I suggest that we merge this, subject to other review comments, and then file another ticket about the memory reclamation issue. (Using mmap directly would be one way to solve it.)
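
As a concrete illustration of the mmap suggestion above (a sketch only, not code from this PR): an anonymous mmap'd buffer can be handed back to the kernel with munmap as soon as proving is done, unlike pages that have backed a deep stack, which Linux keeps accounted to the process for its lifetime.

```cpp
// Illustrative only: allocate a large scratch buffer with mmap so the pages
// can be released back to the kernel after the proving step, instead of
// growing the stack (which Linux never shrinks for the process's lifetime).
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t len = 512 * 1024 * 1024; // e.g. ~512 MiB of proving scratch space

    void* buf = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        std::perror("mmap");
        return 1;
    }

    // ... use buf as working memory for the proving step ...

    // Unlike stack growth, this mapping is fully released here; both RSS and
    // the OOM killer's accounting drop accordingly.
    if (munmap(buf, len) != 0) {
        std::perror("munmap");
        return 1;
    }
    return 0;
}
```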

@daira
Contributor

daira Jun 8, 2017

Confirmed that the OOM killer uses memory estimates that include reserved memory, not just mapped physical pages: https://unix.stackexchange.com/a/153586/82702
(I think this is a ridiculous policy, but we can't change it.)

@karel-3d

karel-3d Jun 9, 2017

That would explain the retained memory.

@daira
Contributor

daira Oct 22, 2017

Filed as #2679.

[Outdated review thread on src/zcash/JoinSplit.cpp]
@karel-3d

karel-3d commented Jun 26, 2017

I finally had time to test it, and it seems the segfault issue is present in master too, and it has nothing to do with this branch. I will make another issue. Other than that, this branch is working for me!

@karel-3d

karel-3d commented Jul 28, 2017

Is this on track? It would really help people with low memory, and even people with high memory :)

@daira
Contributor

daira commented Jul 28, 2017

Proposed for 1.0.12.

@daira daira added this to Proposed in Release planning Aug 3, 2017

@nathan-at-least nathan-at-least moved this from Proposed to Performance Improvements in Release planning Aug 15, 2017

@str4d str4d added this to the 1.0.13 milestone Oct 4, 2017

@nathan-at-least
Contributor

nathan-at-least commented Oct 12, 2017

I haven't reviewed this PR yet, but I have a feature request:

  • make this an opt-in experimental feature, not a default feature change.

[Edit: I've withdrawn the experimental feature request. See below.]

Also, I want to brainstorm the worst possible disaster scenarios that could come from this change, and make sure we convince ourselves they either aren't possible or are extremely well tested. What if:

(a) This changes transaction verification subtly and we don't realize it, leading to a catastrophic network failure?
(b) This new prover somehow leaks data about inputs (thereby violating the whole premise of ZKPs) due to some bug we don't notice until after it's deployed?
(c) This new prover has some bug, and somehow transactions made with this change "tie up" the input coins but can never be mined, rendering the poor user's funds inaccessible?

... Any other disaster scenarios? Put your paranoid tinfoil-hat infosec-geek hats on!

I think (a) is plausible, and I hope source inspection convinces me and everyone else that it's utterly impossible. (b) is a general concern with all ZKP systems, including our pre-PR prover, and is also something to consider during source review.

Importantly, either (a) or (b) could occur via memory management bugs.

I can't think of a way for (c) to perma-wedge notes. First, unless (a) occurs, a locally generated transaction that would be invalid for mining should be rejected immediately from the mempool, right? Secondly, even if the local node accepts it and propagates it, then we're "merely" in "Bitcoin wedged-transaction hell", which resolves after some period of weeks. Still terrible, but not permanent loss of funds.

@str4d
Contributor

str4d commented Oct 12, 2017

I'll rebase this once #2652 is merged.

> I haven't reviewed this PR yet, but I have a feature request:
>
>   • make this an opt-in experimental feature, not a default feature change.

That may not be possible, as this PR required modifications to libsnark, and I don't know whether the old method can work alongside them. But I'll look into it when rebasing.

@daira
Contributor

daira commented Oct 12, 2017

I'm quite skeptical that we should be using the experimental feature mechanism to delay the deployment of fairly straightforward optimizations like this one.

This is a refactoring of the old prover, not a brand new prover. It's doing the same mathematical operations as the old prover in a different order, and that is evident from the PR. Therefore, for this specific change, and given the review we've already done, I don't think a bug in category (b) is plausible.

> (c) This new prover has some bug, and somehow transactions made with this change "tie up" the input coins but can never be mined, rendering the poor user's funds inaccessible?

IMHO the right way to mitigate this possibility is to verify the proof just after producing it, as suggested by @tromer at #2005 (comment) (this also results in a performance improvement, because it replaces an is_satisfied check). In that case, any failure to produce a valid proof would be detected, and we won't treat the inputs as spent locally. (In such a case the inputs wouldn't actually be spent anyway, because the transaction could never be mined in a valid block. Note that producing a valid transaction that never gets mined already ties up inputs locally, regardless of any bugs in the prover. I'm not sure whether this is restricted to transparent inputs.)
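
A minimal, hypothetical sketch of the prove-then-verify mitigation described above; `ProvingContext`, `makeProof` and `verifyProof` are illustrative stand-ins, not the zcashd or libsnark API:

```cpp
// Sketch of the pattern: produce the proof, immediately verify it, and only
// treat the inputs as spendable/spent once verification succeeds.
#include <cstdint>
#include <optional>
#include <vector>

using Proof = std::vector<std::uint8_t>;

struct ProvingContext {
    // Stand-in for the JoinSplit prover; returns a serialized proof.
    Proof makeProof(const std::vector<std::uint8_t>& witness) const {
        return Proof(witness.begin(), witness.end()); // placeholder
    }
    // Stand-in for the verifier over the same public inputs.
    bool verifyProof(const Proof& proof) const {
        return !proof.empty(); // placeholder
    }
};

std::optional<Proof> proveChecked(const ProvingContext& ctx,
                                  const std::vector<std::uint8_t>& witness) {
    Proof proof = ctx.makeProof(witness);
    // Replaces a separate is_satisfied pass: a failed verification means the
    // prover (or its inputs) went wrong, so the caller must not mark notes
    // as spent or broadcast the transaction.
    if (!ctx.verifyProof(proof)) {
        return std::nullopt;
    }
    return proof;
}

int main() {
    ProvingContext ctx;
    if (auto proof = proveChecked(ctx, {1, 2, 3})) {
        // Only now is it safe to broadcast the transaction.
        return 0;
    }
    return 1; // prover self-check failed: leave inputs untouched
}
```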

@nathan-at-least nathan-at-least self-requested a review Oct 12, 2017

@nathan-at-least
Contributor

nathan-at-least commented Oct 12, 2017

I withdraw my request to make this an experimental feature, because that would introduce more code complexity, as well as more UX/configurability complexity.

@daira
Contributor

daira commented Oct 22, 2017

Superseded by #2670 (but note the discussion here about stack memory retention, and about mitigating potential regressions by verifying the proof just after producing it).

@daira daira closed this Oct 22, 2017

@daira daira removed this from 1.0.13: Performance Improvements in Release planning Oct 30, 2017

@daira daira moved this from Work queue to Complete in Performance Nov 11, 2017
