Fix race conditions causing problems identified in #417 #157

kn · 2018-08-24T03:25:14Z

This PR fixes the problem described in #417.

Problem:
The root cause is that we call blockchain.vm.stateManager.[checkpoint|commit|revert]() on the assumption that no other function is performing transactions (meaning db transactions rather than blockchain transactions :)) on stateTrie. Doing this without a lock can cause race conditions. For example, the following sequence of events causes one:

checkpoint() is called by processCall()
checkpoint() is called by processNextBlock()
revert() is called by processCall() <- this reverts the checkpoint created by processNextBlock()
commit() is called by processNextBlock() <- this commits the checkpoint created by processCall()
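
A minimal sketch of this interleaving, assuming a toy in-memory trie with stacked checkpoints. ToyTrie and its methods are hypothetical stand-ins for illustration, not the merkle-patricia-tree API:

```javascript
// Toy stand-in for a trie with stacked checkpoints (hypothetical, not the real API).
class ToyTrie {
  constructor() {
    this.state = {};
    this.checkpoints = []; // stack of saved snapshots
  }
  put(key, value) { this.state[key] = value; }
  checkpoint() { this.checkpoints.push({ ...this.state }); }
  commit() { this.checkpoints.pop(); }               // keep the current state
  revert() { this.state = this.checkpoints.pop(); }  // restore the latest snapshot
}

const trie = new ToyTrie();

trie.checkpoint();           // processCall() enters
trie.put("call", "temp");    // processCall()'s scratch write
trie.checkpoint();           // processNextBlock() enters
trie.put("block", "mined");  // processNextBlock()'s write
trie.revert();               // processCall() reverts: pops processNextBlock()'s checkpoint
trie.commit();               // processNextBlock() commits: pops processCall()'s checkpoint

console.log(trie.state);     // prints { call: 'temp' }
```

The block's write is lost and the call's scratch write survives, which is the opposite of what either caller intended.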

Solution:
This PR solves the problem by introducing a semaphore lock on stateTrie transactions, which prevents a function from unintentionally committing or reverting a checkpoint created by another function.
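
A rough sketch of the locking idea, not the code in this PR. It assumes a toy trie and a small promise-based lock; ToyTrie, Mutex, runExclusive, and tick are hypothetical names introduced only for this example:

```javascript
// Toy trie with stacked checkpoints (hypothetical, not the real API).
class ToyTrie {
  constructor() { this.state = {}; this.checkpoints = []; }
  put(key, value) { this.state[key] = value; }
  checkpoint() { this.checkpoints.push({ ...this.state }); }
  commit() { this.checkpoints.pop(); }
  revert() { this.state = this.checkpoints.pop(); }
}

// Minimal promise-based lock (hypothetical helper, not the `semaphore` package API).
class Mutex {
  constructor() { this.tail = Promise.resolve(); }
  // Runs `task` only after every previously queued task has settled.
  runExclusive(task) {
    const result = this.tail.then(() => task());
    this.tail = result.catch(() => {}); // keep the queue usable after a failure
    return result;
  }
}

// Yields to the event loop so other async work gets a chance to interleave.
const tick = () => new Promise((resolve) => setImmediate(resolve));

async function processCall(trie) {
  trie.checkpoint();
  trie.put("call", "temp");
  await tick();
  trie.revert(); // discards only this function's own checkpoint
}

async function processNextBlock(trie) {
  trie.checkpoint();
  trie.put("block", "mined");
  await tick();
  trie.commit(); // commits only this function's own checkpoint
}

async function main() {
  const trie = new ToyTrie();
  const lock = new Mutex();
  // Each checkpoint...commit/revert section holds the lock from start to finish.
  await Promise.all([
    lock.runExclusive(() => processCall(trie)),
    lock.runExclusive(() => processNextBlock(trie)),
  ]);
  return trie.state;
}

const result = main();
result.then((state) => console.log(state)); // prints { block: 'mined' }
```

Without the lock, the two functions interleave at the await and each one pops the other's checkpoint.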

#417

kn · 2018-08-24T15:07:19Z

.gitignore

@@ -7,3 +7,4 @@ TODO
 .tern-port
 .vscode
 yarn.lock
+*.swp


FYI: I added this because I use vim and it leaves these temporary files.

benjamincburns · 2018-08-24T22:25:18Z

Hi @kn, thanks for taking a crack at this!

I'm afraid this patch doesn't entirely fix the problem we've been experiencing. To see this in action, try setting the option asyncRequestProcessing to true in ganache-core and then running the zeppelin-solidity test suite against it. If the tests don't fail the first time, they'll fail shortly after.

The asyncRequestProcessing option is purposefully undocumented, and it defaults to false as a workaround for the race conditions in the merkle-patricia-tree.

Another way to test is to run ganache-cli in forking mode with your fix in place and attempt to debug a transaction from a forked contract.

benjamincburns · 2018-08-24T23:12:35Z

@kn I'm going to close this PR because the problem needs to be solved by having a robust underlying data structure (the merkle-patricia-tree) rather than by carefully sidestepping problems in that data structure's current implementation.

I strongly encourage you to keep trying, though!

spm32 · 2018-08-25T02:19:26Z

@kn +1 on the above comment from @benjamincburns, happy to pay out additional funds as well for that work.

kn · 2018-08-25T19:17:23Z

Thanks for the feedback! Happy to look into a better solution.

One question about the past attempt to solve this issue. From the comment you made on the issue page, it sounds like someone attempted to solve this by updating checkpoint() to take a callback and using a semaphore lock to make sure there are no race conditions during a call to checkpoint().

I think this doesn't solve the issue entirely, since the example problem I described above can still happen, i.e. we have no control over which checkpoint gets committed or reverted when multiple async functions create checkpoints for isolated contexts.

If this is true, we probably need the merkle tree to lock checkpoint mode until exit (that is, until all checkpoints are committed or reverted), so that other async functions cannot create checkpoints on the assumption that they are entering a checkpoint mode of their own.
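
A rough sketch of what I mean, under my own assumptions. The class name, the owner token passed to each method, and the tick helper are all hypothetical and not part of merkle-patricia-tree:

```javascript
// Toy trie that stays locked to one owner while any checkpoint is open (hypothetical).
class ExclusiveCheckpointTrie {
  constructor() {
    this.state = {};
    this.snapshots = [];
    this.owner = null;  // who is currently in checkpoint mode, if anyone
    this.waiters = [];  // resolve callbacks of callers waiting to enter
  }
  put(key, value) { this.state[key] = value; }
  async checkpoint(owner) {
    // A different owner waits until the current owner has left checkpoint mode.
    while (this.owner !== null && this.owner !== owner) {
      await new Promise((resolve) => this.waiters.push(resolve));
    }
    this.owner = owner;
    this.snapshots.push({ ...this.state });
  }
  commit(owner) {
    this.assertOwner(owner);
    this.snapshots.pop();
    this.maybeExit();
  }
  revert(owner) {
    this.assertOwner(owner);
    this.state = this.snapshots.pop();
    this.maybeExit();
  }
  assertOwner(owner) {
    if (this.owner !== owner) throw new Error("caller is not in checkpoint mode");
  }
  // Checkpoint mode ends only when every checkpoint is committed or reverted.
  maybeExit() {
    if (this.snapshots.length > 0) return;
    this.owner = null;
    const waiting = this.waiters;
    this.waiters = [];
    waiting.forEach((resolve) => resolve()); // each waiter re-checks the owner
  }
}

const tick = () => new Promise((resolve) => setImmediate(resolve));

async function processCall(trie) {
  const me = Symbol("processCall");
  await trie.checkpoint(me);
  trie.put("call", "temp");
  await tick();
  trie.revert(me);
}

async function processNextBlock(trie) {
  const me = Symbol("processNextBlock");
  await trie.checkpoint(me);
  trie.put("block", "mined");
  await trie.checkpoint(me); // nested checkpoint by the same owner does not block
  trie.put("tx", "ok");
  await tick();
  trie.commit(me);
  trie.commit(me);
}

async function main() {
  const trie = new ExclusiveCheckpointTrie();
  await Promise.all([processCall(trie), processNextBlock(trie)]);
  return trie.state;
}

const finalState = main();
finalState.then((state) => console.log(state)); // prints { block: 'mined', tx: 'ok' }
```

This sketch only guards checkpoint(), commit(), and revert(). Writes through put() by a caller outside checkpoint mode are not blocked here.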

Does this statement align with your understanding of the issue?

kn · 2018-08-27T04:07:52Z

Here are the changes that demonstrate the idea above:
ethereumjs/merkle-patricia-tree@master...kn:i417
ethereumjs/ethereumjs-monorepo@master...kn:i417
develop...kn:i417_2

With these changes, the openzeppelin-solidity tests pass every time when run a few times with the asyncRequestProcessing option set to true, except for one test that fails consistently: Contract: Bounty against broken contract can claim reward. That test fails even without the changes and with asyncRequestProcessing set to false.

Unfortunately, I haven't managed to reproduce the flaky test with openzeppelin-solidity, so I'll investigate more when I have time.

benjamincburns · 2018-08-27T16:15:41Z

@kn - I'll try to run the zeppelin-solidity tests again this afternoon. I didn't look too carefully when I ran last time -- it's possible I saw the same test fail and mistook it for the bug still hanging around. Will reopen if I reproduce your results.

benjamincburns · 2018-08-27T16:15:59Z

And thanks for sticking with this!

spm32 · 2018-09-13T16:51:45Z

Hey @kn just wanted to check in to see how things are going on this front. Seconding @benjamincburns, appreciate you sticking with this!

kn · 2018-09-14T03:24:03Z

Thanks for checking in!

I still haven't been able to repro the issue @benjamincburns described. I'm going to be traveling for two months soon, so I probably won't have time to work on this for a while. I'll release the bounty for now so that other people can claim it.

kn added 2 commits August 23, 2018 20:17

Update test cases for issue #417 to demonstrate the fix

b1540a5

Use semaphore lock to prevent race conditions on stateTrie described in #417

6dde88f

gitcoinbot mentioned this pull request Aug 24, 2018

Bounty: Fix race conditions causing problems identified in #417 trufflesuite/ganache-cli-archive#453

Closed

mikeseese requested review from benjamincburns and davidmurdoch August 24, 2018 14:57


benjamincburns closed this Aug 24, 2018
