-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
evidence: ignore when a peer sends committed evidence #5574
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5574 +/- ##
==========================================
- Coverage 61.08% 61.01% -0.08%
==========================================
Files 263 263
Lines 23751 23754 +3
==========================================
- Hits 14508 14493 -15
- Misses 7761 7784 +23
+ Partials 1482 1477 -5
|
Would be nice with unit tests for these cases as well. |
So I believe I have managed to fix the problem. My hypothesis for how it originate stems from basically having all reactors start up at once. This means that a peer who is fast syncing is also running the evidence reactor. In a situation where an attack occurred and evidence was formed, validators tried to send the evidence to all it's peers. However at the time, a full node was fast syncing and so was behind the height of the attack. Hence the validators were being put in a for loop where every 100 ms they would ping the behind node to see if it had caught up and could receive evidence. In this for loop they were examining whether the evidence had already expired but not if it had been committed . Therefore by the time the full node was at the height of the attack and could receive the evidence the validators were five or six heights ahead and the evidence already committed hence all these validators were sending committed evidence to the full node which was then trying to gossip the evidence because it also hadn't noticed that it was committed hence a lot of peers stopped connections with one another. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great reasoning! This makes sense to me, and I think the proposed fix looks fine.
if evpool.isCommitted(ev) { | ||
return &types.ErrInvalidEvidence{Evidence: ev, Reason: errors.New("evidence was already committed")} | ||
} | ||
|
||
evInfo, err := evpool.verify(ev) | ||
if err != nil { | ||
return &types.ErrInvalidEvidence{Evidence: ev, Reason: err} | ||
} | ||
|
||
if err := evpool.addPendingEvidence(evInfo); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit strange to me that a "check" function would schedule the evidence as well. Shouldn't it just check it without any side-effects?
I know this is unrelated to your changes, just an observation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are correct. It doesn't mutate the evidence it just saves it if it hasn't seen it before. This makes it quicker later on when we want to create abci evidence and send it to the application because the evidence pool has already gone and worked out who the validators involved were
…hat is already committed (#5574)
…hat is already committed (#5574)
Description
What I am observing seems to only be occurring with full nodes who are persistent peers with a validator. If the validator witnesses duplicate votes it adds it to the evidence pool and broadcasts it. When the evidence is committed the evidence pool deletes the evidence both in the store and in the clist, However a couple of heights later the validator broadcasts the evidence that should no longer exist to the full node that it is persistent peers with. This seems to only occur once.
I am not familiar enough with the concurrent lists to be able to decipher what exactly is going on (i.e. it might be that the element is being incorrectly detached) so I have stopped returning an error when a peer sends committed evidence. This seems sensible in any case because nodes notion of whether evidence is committed or not is updated at different times. This will also stop the node from disconnecting with the peer.
UPDATE: I believe I have identified the problem here
Closes: #5560