New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't ban peers when loading pre-overwinter blocks #3410

Merged
merged 1 commit into from Jul 30, 2018

Conversation

@LarryRuane
Copy link
Contributor

LarryRuane commented Jul 21, 2018

Addresses #3399. If we see an invalid transaction (from our peers who are up to date, post-overwinter) during initial block download, don't invoke DoS (logging messages, blocking these peers). Instead, simply ignore the transaction.

@LarryRuane

This comment has been minimized.

Copy link
Contributor

LarryRuane commented Jul 21, 2018

To test this, I started a zcashd on mainnet from scratch (moved ~/.zcash out of the way) so it had to download blocks from genesis. The problem reproduced very readily. With these changes, the problem does not occur, at least within the first couple of hours.

I don't know if this is the right way to fix this problem! But I thought I'd offer this PR to help along the discussion and at least help narrow down where the problem is for other more knowledgeable developers.

I haven't written a unit test yet but would be glad to do so if this turns out to be the right fix, or for someone else's (better) fix.

@LarryRuane LarryRuane requested review from bitcartel , str4d and arcalinea Jul 21, 2018

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 21, 2018

Might be better to use the ternary operator as we still want to return an appropriate error to log why the tx was ignored/rejected in the first place. In this example, the ban score is 0: return state.DoS( isInitialBlockDownload() ? 0 : ....

@bitcartel bitcartel added this to the v2.0.0 milestone Jul 21, 2018

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 21, 2018

This change is in ContextualCheckTransaction, which takes the current block height as a parameter. During initial block download, we arguably shouldn't even be calling this function, since we don't know the current block height! If we do need to call it then at least we should make the block height a std::optional.

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 21, 2018

I quite like the approach where if the height is unknown, use a ban score of 0.

It would have to be a boost::optional though, since std::optional is C++17 only!

@ichthuss

This comment has been minimized.

Copy link

ichthuss commented Jul 22, 2018

Actually I don't see any way to address this issue without at least estimating current block height. We either disable "tx from future hard fork" peer check at all, or we should somehow estimate is the tx possibly valid even if our daemon didn't come to the fork yet.

I believe the best option here is to use a code similar to Checkpoints::GuessVerificationProgress to estimate current block height, and use it instead of isSprout condition.

@ichthuss

This comment has been minimized.

Copy link

ichthuss commented Jul 22, 2018

Also, I believe that isInitialBlockDownload() check isn't relevant at all. The bug may come out if daemon was physically disconnected from the network before hard fork, and then eventually connected back. Or if host was hibernated before the fork, and woken after hard fork finally happened. Both may happen after initial block download were successfully completed. I admit that this situation would never happen with fixed daemon - the fork has already happened, and there is no way some installation of zcashd that has this bug fixed may be hibernated before it happens. But I'm sure we need a solution that would also work well for future hard forks.

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 23, 2018

Actually I don't see any way to address this issue without at least estimating current block height. We either disable "tx from future hard fork" peer check at all, or we should somehow estimate is the tx possibly valid even if our daemon didn't come to the fork yet.

@icthuss that's not correct. The indication that the current height is unreliable would affect only the ban score (as @bitcartel said), not whether the tx is rejected.

@ichthuss

This comment has been minimized.

Copy link

ichthuss commented Jul 23, 2018

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 23, 2018

The proposal is to not ban peers that send us a "not yet activated" tx version before initial download has finished. This seems entirely reasonable to me and should not compromise DoS-resistance in general.

@ichthuss

This comment has been minimized.

Copy link

ichthuss commented Jul 23, 2018

The proposal is to not ban peers that send us a "not yet activated" tx version before initial download has finished. This seems entirely reasonable to me and should not compromise DoS-resistance in general.

That's right. But initial blockchain download is not an only option for this bug to appear. Daemon may have been disconnected from network during hard fork. The host may have been hibernated. The VM snapshot that had been made before hard fork may have been restored. "Before initial download has finished" condition doesn't cover all cases.

@LarryRuane LarryRuane force-pushed the LarryRuane:3399-peer-ban branch from afe0f42 to 204787a Jul 23, 2018

@LarryRuane

This comment has been minimized.

Copy link
Contributor

LarryRuane commented Jul 23, 2018

Latest commit (force-pushed) implements Simon's suggestion (call state.DoS() but with a ban score of zero). I tested this the same way as before, I've got a zcashd that's downloading blocks and is at around 184k (out of around 360k current block height), and it does detect transactions it doesn't like as before, but now only logs as follows and does not ban peers:

2018-07-23 18:47:22 ERROR: ContextualCheckTransaction(): overwinter is not active yet
2018-07-23 18:47:22 ERROR: AcceptToMemoryPool: ContextualCheckTransaction failed

Let me know if there's more that should be done (I don't understand @ichthuss 's comment).

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 24, 2018

@LarryRuane Thanks, has it synced past Overwinter successfully?

@daira I think the above, with or without the height parameter as discussed is a stop-gap (which we should still consider merging).

So @ichthuss is right that there are cases missed by the IsInitialBlockDownload() check. Coming out of hibernation, a node may have to sync past one or more upgrades, but since IsInitialBlockDownload() uses a latch, once it is set to false, it can be never be set to true, so even when the node wakes up from its frozen state of say block 300,000 and is told by a peer "Hi, we're now at block 1,000,000" the node can't go into initial block download state, so the node will end up rejecting upgraded transactions and ban the peer.

@str4d str4d added this to Review Backlog in Zcashd Team Jul 25, 2018

@str4d str4d moved this from Review Backlog to In Review in Zcashd Team Jul 25, 2018

@str4d

str4d approved these changes Jul 25, 2018

Copy link
Contributor

str4d left a comment

utACK. I agree that this is only a stop-gap, and that we should estimate the current height to figure out what to do, but this is an effective solution in the short term:

  • Transactions are still rejected, so we don't have to deal with the possibility of having Overwinter transactions in the mempool while syncing Sprout blocks. We would likely need to deal with this in the height-estimation case.
  • The nodes we are no longer banning are by definition Overwinter nodes, and thus exactly the nodes we want to be connected to after we reach Overwinter activation (since this change is going into a codebase that will upgrade to Overwinter, and in fact Sapling). We do still need to think about future upgrades however, and what might happen if this change is present in the software versions that won't upgrade; this is part of why I consider this a stop-gap.
@ebfull

This comment has been minimized.

Copy link
Contributor

ebfull commented Jul 25, 2018

LGTM, but don't know if there are other bans we should change as well.

@ebfull

ebfull approved these changes Jul 25, 2018

@ichthuss

This comment has been minimized.

Copy link

ichthuss commented Jul 25, 2018

I'm not sure if disabling a correct joinsplit check during initial block load is a good idea. The first change in this PR definitely resolves problem, but the second change may create DOS vulnerablilty.

AFAIK joinsplit check is a CPU expensive operation, so banning peer that sends invalid joinsplit signature really makes sense. A malicious node may send transactions that have invalid joinsplit to exhaust CPU resource of peers. It wouldn't affect peers that are in sync, but peers performing an initial block download would be vulnerable.

I think joinsplit check should be either omitted at all for future tx version, or peers that send invalid joinsplit should still be banned. Current code doesn't perform any checks on future tx version, so the second change looks both redundant and dangerous to me.

@str4d

This comment has been minimized.

Copy link
Contributor

str4d commented Jul 26, 2018

AFAIK joinsplit check is a CPU expensive operation, so banning peer that sends invalid joinsplit signature really makes sense. A malicious node may send transactions that have invalid joinsplit to exhaust CPU resource of peers. It wouldn't affect peers that are in sync, but peers performing an initial block download would be vulnerable.

It isn't generally possible to distinguish between an invalid transaction and a wrong-epoch transaction, because the SigHash algorithm changed in Overwinter (and by design its output changes after each upgrade), and so valid Overwinter signatures will look like invalid Sprout signatures. This is another reason I consider this a stop-gap fix; we should instead be validating (when receiving a transaction for the mempool) against the consensus rules we expect to be in place using an estimated height (with the usual banning), and then probably just dropping the transaction if we still haven't reached Overwinter.

I think joinsplit check should be either omitted at all for future tx version, or peers that send invalid joinsplit should still be banned. Current code doesn't perform any checks on future tx version, so the second change looks both redundant and dangerous to me.

In this particular case, we could make the zero-ban conditional on the transaction version (because pre-Overwinter transaction versions and Overwinter transaction versions explicitly do not appear in the same epochs). However, I strongly advocate against making other checks conditional in that way, because these same code checks are used when accepting transactions into the mempool, and when enforcing consensus on the block chain. Extending the scope of this stop-gap fix increases the risk of an unintentional consensus-rule change. The fix I outlined above instead only changes AcceptToMemoryPool's usage of these functions, which is not a consensus-critical operation.

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 26, 2018

Apart from disagreeing that we should be estimating the height for the eventual fix (which is an entirely separable issue that we should discuss later), I endorse @str4d's reasoning here.

@daira

daira approved these changes Jul 26, 2018

Copy link
Contributor

daira left a comment

utACK. A test would be nice but that's not a blocker.

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 26, 2018

@zkbot r+

@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 26, 2018

📌 Commit 204787a has been approved by daira

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 26, 2018

Reviewing the changes to the tests now.

Comment: since the tests are receiving 0 ban score, that means the test environment is in the initial block download state. Hmm...

@LarryRuane

This comment has been minimized.

Copy link
Contributor

LarryRuane commented Jul 27, 2018

@bitcartel asked:

has it synced past Overwinter successfully?

(sorry for the delay in answering) Yes, no problems, as expected the error messages did not appear after syncing block height 347500.

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Jul 27, 2018

Please file a ticket to adjust the tests, so that they check the ban score in both cases (when the bad tx is received both before and after initial block download).

@bitcartel
Copy link
Contributor

bitcartel left a comment

ACK. IsInitialBlockDownload() returns true when chainActive.tip == NULL, which is the setup for the tests.

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 29, 2018

@zkbot r+

@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 29, 2018

📌 Commit 772f87a has been approved by bitcartel

@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 29, 2018

⌛️ Testing commit 772f87a with merge 1e052a4...

zkbot added a commit that referenced this pull request Jul 29, 2018

Auto merge of #3410 - LarryRuane:3399-peer-ban, r=bitcartel
don't ban peers when loading pre-overwinter blocks

Addresses #3399. If we see an invalid transaction (from our peers who are up to date, post-overwinter) during initial block download, don't invoke DoS (logging messages, blocking these peers). Instead, simply ignore the transaction.
@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 29, 2018

💥 Test timed out

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 29, 2018

@zkbot retry

@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 29, 2018

⌛️ Testing commit 772f87a with merge fc0dcf3...

zkbot added a commit that referenced this pull request Jul 29, 2018

Auto merge of #3410 - LarryRuane:3399-peer-ban, r=bitcartel
don't ban peers when loading pre-overwinter blocks

Addresses #3399. If we see an invalid transaction (from our peers who are up to date, post-overwinter) during initial block download, don't invoke DoS (logging messages, blocking these peers). Instead, simply ignore the transaction.
@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 30, 2018

💔 Test failed - pr-merge

@bitcartel

This comment has been minimized.

Copy link
Contributor

bitcartel commented Jul 30, 2018

Looks like a transient failure with the Mac builder.
@zkbot retry

zkbot added a commit that referenced this pull request Jul 30, 2018

Auto merge of #3410 - LarryRuane:3399-peer-ban, r=bitcartel
don't ban peers when loading pre-overwinter blocks

Addresses #3399. If we see an invalid transaction (from our peers who are up to date, post-overwinter) during initial block download, don't invoke DoS (logging messages, blocking these peers). Instead, simply ignore the transaction.
@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 30, 2018

⌛️ Testing commit 772f87a with merge 6b9c962...

@zkbot

This comment has been minimized.

Copy link
Collaborator

zkbot commented Jul 30, 2018

☀️ Test successful - pr-merge
Approved by: bitcartel
Pushing 6b9c962 to master...

@zkbot zkbot merged commit 772f87a into zcash:master Jul 30, 2018

1 check passed

homu Test successful
Details

Zcashd Team automation moved this from In Review to Released (Merged in Master) Jul 30, 2018

@LarryRuane

This comment has been minimized.

Copy link
Contributor

LarryRuane commented Jul 31, 2018

@daira wrote (above):

Please file a ticket to adjust the tests, so that they check the ban score in both cases (when the bad tx is received both before and after initial block download).

I haven't created a ticket yet, but thought I'd just take a quick look at what it would take to do this, and I ran into a problem. I was able to modify one of the tests (I arbitrarily chose ContextualCheckBlock.BlockSproutRulesRejectSaplingTx()) to not be in initial download when it submits the Sapling tx, so that the test can verify that the ban score is 100 (instead of 0) (I know we want to add a new test, so we test both ways, but I was just experimenting). I did this by mining a block, so that chainActive.tip is non-null (see bitcartel's comment above).

The problem is that once IsInitialBlockDownload() returns false, that return value is latched (the function returns false from then on; this comes from upstream btw). This state affects later tests: even though they don't mine a block (and the first test tore down the mined block), IsInitialBlockDownload() returns false so they see ban score 100. Of course, I could make these later test expect ban score to be 100, but then these tests can't be run independently (using --gtest_filter) and that's completely unacceptable.

In general, latching state like this makes testing difficult, because you can't really make each test start with known, fixed state.

The only thing I can think of is to add a test-only function that resets the latch. I just implemented that, and it actually is not bad, was very easy to do. (I could show you the code but it's obvious.) Do you like this idea? If so, should I create a new issue and PR (since this one has been merged)?

@daira

This comment has been minimized.

Copy link
Contributor

daira commented Aug 1, 2018

Down with the state! Ⓐ

The Right Thing is for the initial-download flag to be associated with the chain state created for each test, not global.

@LarryRuane

This comment has been minimized.

Copy link
Contributor

LarryRuane commented Aug 1, 2018

I love that idea! But, unfortunately, the chain state (chainActive and mapBlockIndex, maybe some other stuff) is declared globally, and constructed only once, not per test. See lines 59 and 60 in src/main.cpp. Not a good design, upstreamers! Unless you're referring to some other chain state (per-test)? If so, I don't see it.

LarryRuane added a commit to LarryRuane/zcash that referenced this pull request Aug 8, 2018

LarryRuane added a commit to LarryRuane/zcash that referenced this pull request Aug 10, 2018

zkbot added a commit that referenced this pull request Sep 19, 2018

Auto merge of #3452 - LarryRuane:3399-peer-ban-test, r=bitcartel
Test peer banning logic in both pre- and post-initial block download states

The DoS ban scores are different for each, and it's nice to test both.

Follow-on from #3410.

@LarryRuane LarryRuane removed the request for review from arcalinea Nov 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment