Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evidence: ignore when a peer sends committed evidence #5574

Merged
merged 6 commits into from
Oct 27, 2020

Conversation

cmwaters
Copy link
Contributor

@cmwaters cmwaters commented Oct 26, 2020

Description

What I am observing seems to only be occurring with full nodes who are persistent peers with a validator. If the validator witnesses duplicate votes it adds it to the evidence pool and broadcasts it. When the evidence is committed the evidence pool deletes the evidence both in the store and in the clist, However a couple of heights later the validator broadcasts the evidence that should no longer exist to the full node that it is persistent peers with. This seems to only occur once.

I am not familiar enough with the concurrent lists to be able to decipher what exactly is going on (i.e. it might be that the element is being incorrectly detached) so I have stopped returning an error when a peer sends committed evidence. This seems sensible in any case because nodes notion of whether evidence is committed or not is updated at different times. This will also stop the node from disconnecting with the peer.

UPDATE: I believe I have identified the problem here

Closes: #5560

@cmwaters cmwaters added T:bug Type Bug (Confirmed) C:evidence Component: Evidence labels Oct 26, 2020
@codecov
Copy link

codecov bot commented Oct 26, 2020

Codecov Report

Merging #5574 into master will decrease coverage by 0.07%.
The diff coverage is 58.82%.

@@            Coverage Diff             @@
##           master    #5574      +/-   ##
==========================================
- Coverage   61.08%   61.01%   -0.08%     
==========================================
  Files         263      263              
  Lines       23751    23754       +3     
==========================================
- Hits        14508    14493      -15     
- Misses       7761     7784      +23     
+ Partials     1482     1477       -5     
Impacted Files Coverage Δ
evidence/verify.go 81.37% <ø> (+2.46%) ⬆️
evidence/reactor.go 59.59% <55.55%> (+4.59%) ⬆️
evidence/pool.go 69.75% <62.50%> (-0.44%) ⬇️
consensus/reactor.go 74.08% <0.00%> (-3.28%) ⬇️
blockchain/v0/pool.go 76.01% <0.00%> (-2.22%) ⬇️
blockchain/v2/reactor.go 33.58% <0.00%> (-1.50%) ⬇️
statesync/syncer.go 78.48% <0.00%> (-0.85%) ⬇️
p2p/pex/pex_reactor.go 78.27% <0.00%> (-0.60%) ⬇️
consensus/state.go 68.21% <0.00%> (-0.19%) ⬇️
blockchain/v0/reactor.go 62.56% <0.00%> (+0.98%) ⬆️
... and 7 more

evidence/pool.go Outdated Show resolved Hide resolved
evidence/pool.go Outdated Show resolved Hide resolved
evidence/reactor.go Outdated Show resolved Hide resolved
@erikgrinaker
Copy link
Contributor

Would be nice with unit tests for these cases as well.

@cmwaters
Copy link
Contributor Author

So I believe I have managed to fix the problem. My hypothesis for how it originate stems from basically having all reactors start up at once. This means that a peer who is fast syncing is also running the evidence reactor.

In a situation where an attack occurred and evidence was formed, validators tried to send the evidence to all it's peers. However at the time, a full node was fast syncing and so was behind the height of the attack. Hence the validators were being put in a for loop where every 100 ms they would ping the behind node to see if it had caught up and could receive evidence. In this for loop they were examining whether the evidence had already expired but not if it had been committed .

Therefore by the time the full node was at the height of the attack and could receive the evidence the validators were five or six heights ahead and the evidence already committed hence all these validators were sending committed evidence to the full node which was then trying to gossip the evidence because it also hadn't noticed that it was committed hence a lot of peers stopped connections with one another.

evidence/reactor.go Outdated Show resolved Hide resolved
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great reasoning! This makes sense to me, and I think the proposed fix looks fine.

if evpool.isCommitted(ev) {
return &types.ErrInvalidEvidence{Evidence: ev, Reason: errors.New("evidence was already committed")}
}

evInfo, err := evpool.verify(ev)
if err != nil {
return &types.ErrInvalidEvidence{Evidence: ev, Reason: err}
}

if err := evpool.addPendingEvidence(evInfo); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange to me that a "check" function would schedule the evidence as well. Shouldn't it just check it without any side-effects?

I know this is unrelated to your changes, just an observation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are correct. It doesn't mutate the evidence it just saves it if it hasn't seen it before. This makes it quicker later on when we want to create abci evidence and send it to the application because the evidence pool has already gone and worked out who the validators involved were

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:evidence Component: Evidence T:bug Type Bug (Confirmed)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

evidence: proposer adds already committed evidence to new block
2 participants