Skip to content
This repository has been archived by the owner on Feb 26, 2024. It is now read-only.

Fix race conditions causing problems identified in #417 #157

Closed
wants to merge 2 commits into from
Closed

Conversation

kn
Copy link

@kn kn commented Aug 24, 2018

This PR fixes the problem described in here.

Problem:
The root cause of the problem is that we use blockchain.vm.stateManager.[checkpoint|commit|revert]() assuming there is no other functions performing transactions (meaning db transaction instead of blockchain transaction :)) on stateTrie. Doing this without a lock can cause race conditions. For example, the following events can cause race conditions:

  1. checkpoint() is called by processCall()
  2. checkpoint() is called by processNextBlock()
  3. revert() is called by processCall() <- this reverts the checkpoint created by processNextBlock()
  4. commit() is called by processNextBlock() <- this commits the checkpoint created by processCall()

Solution:
This PR solves this problem by introducing a semaphore lock on the stateTrie transactions to prevent functions unintentionally committing or reverting a checkpoint created by other functions.

@@ -7,3 +7,4 @@ TODO
.tern-port
.vscode
yarn.lock
*.swp
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I added this because I use vim and it leaves these temporary files.

@benjamincburns
Copy link
Contributor

Hi @kn, thanks for taking a crack at this!

I'm afraid this patch doesn't entirely fix the problem we've been experiencing. To see this in action try setting the option asyncRequestProcessing to true in ganache-core and then the zeppelin-solidity test suite against it. If they don't fail the first time, they'll fail shortly after.

The asyncRequestProcessing option is purposefully undocumented and it defaults to false as a work around to the race conditions in the merkle-patricia-tree.

Another way to test is to run ganache-cli in forking mode with your fix in place and attempt to debug a transaction from a forked contract.

@benjamincburns
Copy link
Contributor

benjamincburns commented Aug 24, 2018

@kn I'm going to close this PR because the problem needs to be solved by having a robust underlying data structure (the merkle-patricia-tree) rather than by carefully sidestepping problems in that data structure's current implementation.

I strongly encourage you to keep trying, though!

@spm32
Copy link

spm32 commented Aug 25, 2018

@kn +1 on the above comment from @benjamincburns, happy to pay out additional funds as well for that work.

@kn
Copy link
Author

kn commented Aug 25, 2018

Thanks for the feedback! Happy to look into a better solution.

One question on the past attempt to solve this issue. From the comment you made in the issue page, it sounds like someone attempted to solve this by updating ‘checkpoint()’ to take callback and make sure there won’t be race conditions during a call to the ‘checkpoint()’ by using a semaphore lock.

I think this doesn’t solve the issue entirely since the example problem I described above still can happen i.e. we have no control over which checkpoint to commit or revert when there are multiple async functions creating checkpoints for isolated contexts.

If this is true, we probably need the merkle tree to lock checkpoint mode until exit, meaning all checkpoints are committed or reverted, to prevent other async functions from creating checkpoint assuming they are entering into checkpoint mode of their own.

Does this statement align with your understanding of the issue?

@kn
Copy link
Author

kn commented Aug 27, 2018

Here are the changes that demonstrate the idea above:
ethereumjs/merkle-patricia-tree@master...kn:i417
ethereumjs/ethereumjs-monorepo@master...kn:i417
develop...kn:i417_2

Run openzeppelin-solidity tests with asyncRequestProcessing option set to true a few times and they always pass also, except for this one test failing consistently called Contract: Bount against broken contract can claim reward which fails even without the changes and asyncRequestProcessing option set to false.

Unfortunately, I haven't managed to reproduce flaky test with openzeppelin-solidity so I'll investigate more when I have time.

@benjamincburns
Copy link
Contributor

@kn - I'll try to run the zeppelin-solidity tests again this afternoon. I didn't look too carefully when I ran last time -- it's possible I saw the same test fail and mistook it for the bug still hanging around. Will reopen if I reproduce your results.

@benjamincburns
Copy link
Contributor

And thanks for sticking with this!

@spm32
Copy link

spm32 commented Sep 13, 2018

Hey @kn just wanted to check in to see how things are going on this front. Seconding @benjamincburns, appreciate you sticking with this!

@kn
Copy link
Author

kn commented Sep 14, 2018

Thanks for checking in!

I still haven't been able to repro the issue @benjamincburns described. I'm going to be traveling for two months soon so I'll probably won't have time to work on this for a while. I'll release the bounty for now so that other people can claim it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants