mpirun -np 2 python -Wd setup.py test --trilinos hanging on sandbox under buildbot
#305
Comments
|
This is so weird. I can no longer reproduce the hang on sandbox, even when setting PYTHONPATH. It clearly broke after I set PYTHONPATH at the command line, and that is the only time it hung at the command line, so it is probably just intermittent. I changed the PYTHONPATH in the crontab, relaunched the slave, and restarted the buildbot build. It didn't hang: http://build.cmi.kent.edu:8010/builders/Ubuntu_x86_64~trunk~full/builds/12/steps/trial_2/logs/stdio Trac comment by wd15 on 01-20-2012 at 14:42 |
|
Killed the buildslave and removed the PYTHONPATH from the crontab, restoring the original configuration. A new buildslave has started up. Let's see what happens. Trac comment by wd15 on 01-20-2012 at 15:02 |
|
It fails on Mac OS X 10.6 with python 2.7 even if PYTHONPATH is unset completely. Could it be that the two (or more) processes are getting different PYTHONPATHs? A separate question is what is setting PYTHONPATH in the first place? As seen in http://build.cmi.kent.edu:8010/builders/Mac_OS_X%7Etrunk%7Efull/builds/8/steps/trial_2/logs/stdio, Trac comment by guyer on 01-20-2012 at 18:20 |
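One way to check whether the processes really see different PYTHONPATHs is to have each one report its own import environment. The following is a minimal sketch, not something taken from this ticket: the helper name `describe_environment` is hypothetical, and the rank lookup assumes the launcher exports `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH-style launchers); if neither is set, the rank is shown as `?`.

```python
import os
import sys


def describe_environment():
    """Return a one-line summary of this process's import environment."""
    # Assumption: the MPI launcher exports one of these rank variables.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK",
                          os.environ.get("PMI_RANK", "?"))
    # None means PYTHONPATH is unset for this process.
    pythonpath = os.environ.get("PYTHONPATH")
    return "rank=%s pid=%d PYTHONPATH=%r sys.path[:3]=%r" % (
        rank, os.getpid(), pythonpath, sys.path[:3])


if __name__ == "__main__":
    print(describe_environment())
```

Saved as a script and launched with `mpirun -np 2 python`, it prints one line per process, so any mismatch between ranks shows up directly.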
|
With the following debugging patch in trilinosMatrix.py, it seems to always fail in the rowMap construction. The tests actually run fairly rapidly with these debug statements; the slowness when using the PRINT statements is caused by the sleep command. What to do next? Trac comment by wd15 on 01-25-2012 at 16:05 |
|
Going further, it stops hanging when the domainMap is set equal to the I just wanted to see what would happen if the domainMap was equal to Trac comment by wd15 on 01-25-2012 at 16:52 |
|
As of 5:45, the tests on sandbox and on Mac OS X seem to have got through the critical chemotaxis tests, which bodes well. Trac comment by wd15 on 01-25-2012 at 17:45 |
|
It seems to have passed all the parallel tests. Jon needs to test this on his laptop. Trac comment by wd15 on 01-25-2012 at 17:57 |
|
The change was r5117. Unfortunately it looks like Ubuntu i686 may be hanging :-(. Trac comment by wd15 on 01-25-2012 at 18:03 |
|
Replying to wd15:
It all seems to have worked. We have to think about the following:
Trac comment by wd15 on 01-26-2012 at 10:37 |
|
Replying to wd15:
On my Mac OS X laptop, it stalls before the rowMap construction, but only if I used 7 processors. After dumping stack traces, I ended up modifying which seems to prevent the stall in chemotaxis, but then it stalls later in Trac comment by guyer on 01-26-2012 at 11:14 |
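For anyone wanting to dump stack traces from a stalled process, one option is Python's `faulthandler` module (in the standard library since Python 3.3). This is only a sketch of one possible approach, not necessarily how the traces above were obtained; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout, file=sys.stderr):
    """Dump the tracebacks of all threads to `file` if still running after `timeout` seconds."""
    # exit=False: only report, do not kill the process when the timeout fires.
    faulthandler.dump_traceback_later(timeout, repeat=False, file=file, exit=False)


def disarm_hang_watchdog():
    """Cancel a pending dump, e.g. once the suspect section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog(60)` just before the suspect test and `disarm_hang_watchdog()` just after it means that each process that stalls writes out where it is stuck, which helps when only some ranks hang.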
|
The following changes to trunk@5117 still hang on chemotaxis with Trac comment by wd15 on 01-26-2012 at 12:27 |
|
This intermittent freeze has been documented in r5149, so I am removing it from milestone:3.0 Trac comment by guyer on 01-31-2012 at 17:54 |
|
As of #524, there appear to be no parallel hangs in the tests |
Hangs at the chemotaxis tests.
The problem seems to be that when PYTHONPATH is explicitly set to "." it hangs; otherwise it runs through without issue. The crontab on sandbox sets PYTHONPATH. I'm worried about simply removing that, as python could pick up the system version of fipy during buildbot runs. Maybe use a virtualenv.
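To check the worry about picking up the system version of fipy, the location a module would be imported from can be looked up without importing it. This is a minimal sketch using the standard library's `importlib.util.find_spec` (Python 3.4+); the helper name `module_origin` is hypothetical, and it is meant for top-level module names only.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Shows whether the checkout or a system-wide install would win.
    print("fipy resolves to:", module_origin("fipy"))
```

Running it once with PYTHONPATH set to "." and once with it unset shows which copy of fipy each configuration would use.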
Also reported in issue #264
Imported from trac ticket #425, created by wd15 on 01-20-2012 at 13:30, last modified: 01-30-2013 at 13:42