Fix race conditions causing problems identified in #417 #157
@@ -7,3 +7,4 @@ TODO
.tern-port
.vscode
yarn.lock
*.swp
FYI: I added this because I use vim and it leaves these temporary files.
Hi @kn, thanks for taking a crack at this! I'm afraid this patch doesn't entirely fix the problem we've been experiencing. To see this in action try setting the option The Another way to test is to run ganache-cli in forking mode with your fix in place and attempt to debug a transaction from a forked contract.
@kn I'm going to close this PR because the problem needs to be solved by having a robust underlying data structure (the merkle-patricia-tree) rather than by carefully sidestepping problems in that data structure's current implementation. I strongly encourage you to keep trying, though!
@kn +1 on the above comment from @benjamincburns, happy to pay out additional funds as well for that work.
Thanks for the feedback! Happy to look into a better solution. One question about the past attempt to solve this issue: from the comment you made on the issue page, it sounds like someone tried to solve this by updating `checkpoint()` to take a callback and using a semaphore lock to make sure there are no race conditions during a call to `checkpoint()`. I think this doesn't solve the issue entirely, since the example problem I described above can still happen, i.e. we have no control over which checkpoint gets committed or reverted when multiple async functions create checkpoints for isolated contexts. If this is true, we probably need the Merkle tree to lock checkpoint mode until exit (meaning all checkpoints are committed or reverted), so that other async functions cannot create a checkpoint on the assumption that they are entering a checkpoint mode of their own. Does this align with your understanding of the issue?
Here are the changes that demonstrate the idea above: Run openzeppelin-solidity tests with Unfortunately, I haven't managed to reproduce flaky test with openzeppelin-solidity so I'll investigate more when I have time.
@kn - I'll try to run the
And thanks for sticking with this!
Hey @kn just wanted to check in to see how things are going on this front. Seconding @benjamincburns, appreciate you sticking with this!
Thanks for checking in! I still haven't been able to repro the issue @benjamincburns described. I'm going to be traveling for two months soon, so I probably won't have time to work on this for a while. I'll release the bounty for now so that other people can claim it.
This PR fixes the problem described here.

Problem:
The root cause of the problem is that we use `blockchain.vm.stateManager.[checkpoint|commit|revert]()` assuming that no other function is performing transactions on `stateTrie` (meaning db transactions, not blockchain transactions :)). Doing this without a lock can cause race conditions. For example, the following sequence of events causes one:
1. `checkpoint()` is called by `processCall()`
2. `checkpoint()` is called by `processNextBlock()`
3. `revert()` is called by `processCall()` <- this reverts the checkpoint created by `processNextBlock()`
4. `commit()` is called by `processNextBlock()` <- this commits the checkpoint created by `processCall()`
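
The four events above can be replayed against a toy model to see why the wrong checkpoint gets reverted and committed. This is only a sketch: `ToyTrie` and `replayRace` are hypothetical names, and the class models checkpoints as a plain stack of snapshots rather than using the real merkle-patricia-tree API. The assumption is that `processCall()` wants to discard its writes while `processNextBlock()` wants to keep its own.

```javascript
// Toy stand-in for a checkpointing trie. This is NOT the merkle-patricia-tree
// API; it only models checkpoints as a stack of state snapshots.
class ToyTrie {
  constructor() {
    this.state = {};
    this.checkpoints = [];
  }
  put(key, value) {
    this.state[key] = value;
  }
  checkpoint() {
    // Save a copy of the current state on top of the stack.
    this.checkpoints.push({ ...this.state });
  }
  commit() {
    // Drop the most recent snapshot and keep the current state.
    this.checkpoints.pop();
  }
  revert() {
    // Restore the most recent snapshot, whoever created it.
    this.state = this.checkpoints.pop();
  }
}

function replayRace() {
  const trie = new ToyTrie();

  // 1. processCall() opens a checkpoint and writes data it intends to discard.
  trie.checkpoint();
  trie.put("call", "temporary");

  // 2. processNextBlock() opens its own checkpoint and writes data it intends to keep.
  trie.checkpoint();
  trie.put("block", "mined");

  // 3. processCall() reverts, but the top of the stack belongs to processNextBlock().
  trie.revert();

  // 4. processNextBlock() commits, but the remaining checkpoint belongs to processCall().
  trie.commit();

  return trie.state;
}

console.log(replayRace()); // { call: 'temporary' }
```

In this model the block's write is lost and the call's temporary write survives, which is the opposite of what either caller intended.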
Solution:
This PR solves this problem by introducing a semaphore lock on the `stateTrie` transactions to prevent functions from unintentionally committing or reverting a checkpoint created by another function.
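
A minimal sketch of the locking idea, under stated assumptions: `Mutex`, `ToyTrie`, `tick`, and `runBoth` are hypothetical names for illustration, and the lock here is a simple promise queue (a semaphore with a single permit), not necessarily the implementation used in this PR. Each caller holds the lock for its whole checkpoint/commit-or-revert cycle, so the two cycles can no longer interleave.

```javascript
// Hypothetical helper: a promise-based mutex (a semaphore with one permit).
class Mutex {
  constructor() {
    this.tail = Promise.resolve();
  }
  // Runs `task` only after every previously queued task has settled.
  runExclusive(task) {
    const result = this.tail.then(() => task());
    // Swallow failures on the queue itself so later tasks still run;
    // the caller still sees the rejection through `result`.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Toy checkpoint stack, NOT the merkle-patricia-tree API.
class ToyTrie {
  constructor() {
    this.state = {};
    this.checkpoints = [];
  }
  put(key, value) { this.state[key] = value; }
  checkpoint() { this.checkpoints.push({ ...this.state }); }
  commit() { this.checkpoints.pop(); }
  revert() { this.state = this.checkpoints.pop(); }
}

// Yields to the event loop, standing in for real async work.
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

// Stand-in for processCall(): writes, then discards its changes.
function processCall(trie, lock) {
  return lock.runExclusive(async () => {
    trie.checkpoint();
    trie.put("call", "temporary");
    await tick();
    trie.revert();
  });
}

// Stand-in for processNextBlock(): writes, then keeps its changes.
function processNextBlock(trie, lock) {
  return lock.runExclusive(async () => {
    trie.checkpoint();
    trie.put("block", "mined");
    await tick();
    trie.commit();
  });
}

async function runBoth() {
  const trie = new ToyTrie();
  const lock = new Mutex();
  // Both are started concurrently, but the lock serializes their checkpoint cycles.
  await Promise.all([processCall(trie, lock), processNextBlock(trie, lock)]);
  return trie.state;
}

runBoth().then((state) => console.log(state)); // { block: 'mined' }
```

The trade-off is that the lock serializes all work that touches the trie, and, as the review comments note, it works around the checkpoint behaviour of the underlying data structure rather than fixing it there.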