Tests randomly crashing at ProviderError.ExtendableError on Ubuntu (Linux) #729
Comments
This happens to me as well:
Repo: https://github.com/daostack/daostack I am running Windows 10 so this doesn't seem related to OS For me it happens around 50% of the time locally and also happens during Travis build. |
This issue has been open for a while and looks a lot like the non-deterministic ganache bug which @benjamincburns fixed recently. Downloading the latest ganache-cli (read the release notes!) should resolve this problem. Closing but if anyone continues to see this error please re-open or comment. Thanks for reporting. |
I'm getting this error message when connecting to Rinkeby with Infura, truffle-wallet-provider, and ethereumjs-wallet. I'm not convinced this is Ganache, but maybe my error is caused by something else. |
@nickjm Could you provide context (what are you doing in your code) or a reproduction path? A stacktrace might also be helpful. |
I have this issue around 10% of my builds, with 128 tests active, randomly. |
Same problem on Windows 10:
|
@cgewecke: |
@barakman No, everyone's seeing it intermittently, AFAIK it predates recent work to stabilize the test client and it's possible that it's related to issue 453 at Unfortunately this looks like (as @benjamincburns would say) a heisenbug. If anyone finds a consistent way of reproducing it they will be greeted with delight. |
@cgewecke: In my case, I see this occurring at random indeed, but - and this is actually very deterministic - in only 1 out of 23 scripts which Thanks |
@barakman Ahhh!! That's pretty good isolation of the problem. It could be something in the script you identified. Could also be related to the test execution that precedes it. If there's any possibility of having another set of eyes look at the codebase, we'd definitely be interested. |
@cgewecke: |
@barakman Ok great, thank you. One thing about |
@cgewecke: MyContract.sol:
MyContractUnitTest.js:
My setup (as mentioned in a previous comment on this thread) is:
Thanks |
@barakman Great!! Thank you. Going to open a companion issue over at |
@cgewecke: Also happens on |
@cgewecke : I am circulating around the conclusion that this problem stems from improper usage of the Mocha framework. More precisely, improper usage of the The typical error messages, although poorly phrased, imply this conjecture as well:
You can read the relevant information here, namely:
So using these hooks on the root-level might be a bad idea in this case, since due to the nature of the tested system (communication with a TestRPC or Ganache process), they typically execute asynchronous code. |
@barakman Agree it seems like disconnections happen at the 'seams' of the suites where the hooks are. The code at |
@cgewecke: By the way, in the code that you linked, there doesn't seem to be support for the |
@barakman Hmmm....that's a nice observation about |
@cgewecke:
If all tests pass without disconnections, then I'm pretty sure that we can stamp this as the cause of the problem. And even if not, I think that it could still be related to hooks which are added implicitly by the Mocha framework. |
I take back my previous conjecture of this error occurring on a given test as a result of something which has executed on a previous test. In addition to that, I have recently tested the new I have posted my findings on a similar GIT thread which is closed by now, but I am hoping will reopen. Thanks |
I have conducted more extensive research, by modifying file I started off by checking which path leads to the
As expected, this error occurs only in the asynchronous path. Second, I added some logging in this path, just before invoking
Here is the consistency that I have observed:
I am hopeful that the above information will provide some clues towards the source of this problem. Thanks |
Update to the above: I later realized that:
I therefore changed the logging as follows:
The new logging has improved my previous observation from this:
To this:
All of that, during normal execution of course. What happens right before the error can be described as follows:
Is it possible that somewhere in ganache-cli code, the That could most certainly be classified as an "Invalid JSON RPC response" (which you guys can easily resolve). Thanks |
Some more observations: When the last two
When only the last
|
@barakman Thank you. . . this analysis is really helpful.
@benjamincburns Does anything jump out at you as a possibility in the preceding three comments? |
Unfortunately it looks like case 1 is true. XHR2 is used in the latest web3 0.x as well as web3 1.0. Have also tried running your reproduction case using web3 1.0 over websockets without luck. . . This issue raises questions about whether web3 / truffle / ganache are really suited to running simulations with tens of thousands of calls. There might be significant value in building a tool that ran tests directly on top of ethereumjs-vm, or perhaps inside ganache, avoiding http overhead and other constraints. |
I did a little reading, and it seems that connections are closed by default in HTTP 1.0 and kept alive by default in HTTP 1.1. And I'm guessing that With regard to the second part of your comment, please note that I have experienced the same problem when using For now, I have added the following workaround on my system:
Thanks. |
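The keep-alive defaults mentioned in the comment above can be illustrated with a small sketch. This is not the workaround from that comment (which is not shown here); it only encodes the protocol defaults, and `isPersistent` is a hypothetical helper name, not part of Node, web3 or Truffle.

```javascript
// Sketch: decide whether a connection is expected to stay open after a
// response, based on the HTTP version and the Connection header.
// `isPersistent` is a hypothetical helper used for illustration only.
function isPersistent(httpVersion, connectionHeader) {
  // The Connection header is a comma-separated, case-insensitive token list.
  const tokens = (connectionHeader || '')
    .toLowerCase()
    .split(',')
    .map((token) => token.trim());

  if (httpVersion === '1.1') {
    // HTTP/1.1: persistent unless the peer explicitly asks to close.
    return !tokens.includes('close');
  }
  // HTTP/1.0: closed unless the peer explicitly asks for keep-alive.
  return tokens.includes('keep-alive');
}
```

For example, `isPersistent('1.0')` is false while `isPersistent('1.1')` is true, which matches the defaults described in the comment.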
@cgewecke - just to finalize this issue (also for future readers):
This problem seems to be of the following nature:
I believe that a possible fix for this problem is in the
Perhaps there's a missing handler for this request, for its socket, for its response or for its response's socket. In either case, I have not been able to resolve it. The fact that a "massive" test completes successfully, but only when it takes place, does the next test emit this error (immediately when it begins) should give some hints, but I'm not sure what. It seems that the "massive" test does not release the socket when it is held for a long period (cutting this test shorter resolves the problem). A simple workaround for this problem is to execute Unfortunately, this workaround is insufficient for If someone can find a way to apply this ("close and reopen after every test file") in Truffle source code itself, then it might be a good solution. I tried that too - in the |
I have managed to fix (or if you will, find a workaround for) the As mentioned before, this A deeper investigation has shown that it always happens as a result of a request consisting of A glimpse at Ganache source code reveals that Though I don't have any real evidence to support this, I think that it is possibly because an By the way, the status of this response is 0. I previously bumped into some GitHub thread referring to why you've decided not to ignore status 0 in Truffle (the reason being that a test might fail silently, if I remember correctly). I can't find this thread now, but you were in it, so you might find the remainder of this comment relevant. In any case, in order to work around the Since Here is the extended workaround (for both problems), for any future readers:
Thanks UPDATE: It seems that even if a We can slightly extend the workaround above to handle both cases, by changing this:
To this:
As However, generally speaking, I get the feeling that while Ganache takes a very long time to complete these requests in some cases (more specifically, after a massive test is conducted), the connection is simply (and abruptly) terminated. I am not very "happy" with the workaround proposed above, and I believe that a better approach would be to:
UPDATE 2: For safety, extend this:
To this:
Or even to this:
|
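The general idea discussed in the comment above (retrying a request that was dropped instead of failing immediately) can be sketched generically. This is not the workaround posted in the thread, whose code is not shown here; `withRetry`, `maxAttempts` and `delayMs` are hypothetical names chosen for illustration, and the sketch assumes the operation is safe to repeat.

```javascript
// Sketch: retry an async operation a bounded number of times.
// `withRetry` is a hypothetical helper, not part of Truffle, web3 or Ganache.
async function withRetry(operation, maxAttempts = 3, delayMs = 100) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Return the first successful result.
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait briefly before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error instead of hiding it.
  throw lastError;
}
```

Because the last error is rethrown once the attempts are exhausted, a persistent failure still fails the test rather than passing silently, which is the concern raised about ignoring status 0.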
@barakman Thanks so much. The workaround you've proposed seems reasonable to me. There might be some kind of connection timeout at the HTTP layer - I've also seen this disconnection when running long solidity loops that validate bytecode in a @barakman Out of curiosity, would making |
Thank you. That said, since it's optional, I guess that there's no harm done (i.e., Truffle users can choose that at their own risk). That said #2, I've already added an npm-post-install script to fix Truffle source code, so I'm not in any dire need for this feature (though, I suppose I'll have to do some maintenance work on that script every time I update Truffle version, so perhaps it WILL help me in the future). It would help for sure if you could check with Ganache developers what might cause the execution of Thank you for your help. |
I will. In your current suite, approximately how many blocks are being snapshotted / reverted? |
@cgewecke: |
Apologies @barakman - yes you could do that or estimate the number of transactions that occur in the suite, since ganache executes a single tx per block. I'd just like to give the ganache engineers some guidance about what magnitude of tests triggers this. |
@cgewecke:
I could give you more accurate figures by getting the block number before and after, but that would take me a while (each one of them runs for about 15-20 minutes or so). Thanks |
That's perfect, thanks @barakman. |
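The "block number before and after" measurement mentioned above can be sketched as follows. `countBlocksMined` is a hypothetical helper, and `getBlockNumber` is assumed to be any async function that returns the current block number; nothing here is taken from the reporter's suite.

```javascript
// Sketch: measure how far the chain advanced while `run` executed.
// `countBlocksMined` is a hypothetical helper used for illustration only.
async function countBlocksMined(getBlockNumber, run) {
  const before = await getBlockNumber(); // block height before the suite
  await run();                           // execute the tests or transactions
  const after = await getBlockNumber();  // block height after the suite
  return after - before;
}
```

With web3 1.x, something like `() => web3.eth.getBlockNumber()` could be passed as `getBlockNumber`. Since snapshots that are later reverted roll the chain back, the difference would presumably understate the total number of blocks executed in a suite that reverts often.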
Would there be a universal fix available any time soon? I have random In May everything was still okay, today - it's not :( |
@vicnaum Could you provide more detail about your suite or a link to project? At the moment we think this error is limited to very large suites. The principal reporter above has a battery of 50,000 tests. Do the same 3 tests fail each time? |
@cgewecke it's always different tests. Can be only one test failing, but can be at most five. Usually near three. I'm using Windows 10. The sources are here: https://github.com/vicnaum/hourlyPay |
@cgewecke : The error specified by vicnaum (connection error) does not seem to have any relation whatsoever with the issue described in this thread, which appears to be the result of limited resources (more precisely, the system runs out of HTTP connections). |
@vicnaum I think @barakman is correct - I looked through the
|
@cgewecke & @barakman & others having this issue: I haven't dug into this too deeply, but my guess is that either Truffle or the tests in question are creating new instances of Optimal resource management would be to take advantage of HTTP keep alive by reusing provider instances between tests rather than recreating them. I can say from experience that sending |
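The suggestion above (reuse provider instances so that keep-alive connections are shared rather than recreated) can be sketched like this. It is a minimal illustration, not Truffle's or web3's actual code: `getProvider` and `createProvider` are hypothetical names, and the factory is supplied by the caller.

```javascript
const http = require('http');

// One keep-alive agent shared by all providers, so sockets can be reused
// between requests instead of being opened anew each time.
const sharedAgent = new http.Agent({ keepAlive: true });

// Cache of provider instances, keyed by endpoint URL.
const providerCache = new Map();

// `createProvider` is a hypothetical factory passed in by the caller; it
// receives the URL and the shared agent and returns a provider object.
function getProvider(url, createProvider) {
  if (!providerCache.has(url)) {
    providerCache.set(url, createProvider(url, sharedAgent));
  }
  // Every later call for the same URL returns the same instance.
  return providerCache.get(url);
}
```

Each test file would then call `getProvider` instead of constructing its own provider, so only one instance (and one connection pool) exists per endpoint.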
@benjamincburns Yes, it turns out this originates at web3 and they're fixing it in (It was keep-alive - the change). |
Closing this since it seems to have been addressed as a duplicate of the issue above. Let us know if it's still a problem. Thank you! |
@gnidan: In Truffle 5.x this is possibly fixed, since this part of the code has changed, though I haven't verified that, as it requires a bit of work on both my contracts and my tests. To my understanding, you have released 4.1.15 specifically for this reason (i.e., for those who aren't rushing to upgrade their Solc and Web3 major versions). So you might want to keep this issue open until fixed in the Truffle 4 branch (or at least leave a note somewhere to mention that this problem is as viable as ever). Thanks |
hey fyi I'm having this issue w/ Truffle v5.3.3 (core: 5.3.3) web3 in my project is ^1.2.6, seems like it might be connected to assert manager function recently added.. this has been the first time that tests have been expected to perform any logic past the return to this (await ofc) function in the middle of the test functions.
anyone aware of anything recently changed that could cause this to re-appear? |
Issue
On an Ubuntu Linux environment (Trusty), tests randomly fail with this ExtendableError:
Specifically, I have a Travis-CI (continuous integration) setup and this is where the tests are failing. My local Mac OSX environment passes these tests with no problem. Every once in a while, they will fail with the same error, but I just run the tests again and they pass.
I'd say it happens like 10-15% of the time on Mac OSX, but it happens like 60-80% of the time on the Travis-CI linux env.
It feels like it used to have this error less on earlier Truffle versions. I just updated to 4.0.4 and it seems way more often now.
Steps to Reproduce
Expected Behavior
Tests should pass like they do on Mac OSX env.
Actual Results
I test this on my local machine (mac osx), when all tests pass which they do, I push up to Github. Then it fires off a Travis-CI test on the linux env and fails pretty much every time.
Environment
Travis-CI Env (fails)
Mac OSX Env (passes)
$ gcc --version: