Buggy OpenMP in _LikelihoodFunction::ComputeBlock() #5

Closed
nlhepler opened this issue Sep 13, 2011 · 6 comments

Comments

@nlhepler
Contributor

Whilst running the 454 UDS pipeline, we get this crash with the MP2 build, but not with the DEBUG build. Interesting, eh?

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00000009045f5740
[Switching to process 25604 thread 0x1903]
0x00000001002c4cb1 in _LikelihoodFunction::ComputeBlock ()
(gdb) bt
#0 0x00000001002c4cb1 in _LikelihoodFunction::ComputeBlock ()
#1 0x0000000100002da3 in gomp_thread_start ()
#2 0x00007fff8ca318bf in _pthread_start ()
#3 0x00007fff8ca34b75 in thread_start ()

@spond
Member

spond commented Sep 13, 2011

Can you recompile MP2 with the -g flag so that the traceback references the actual source lines? Thread issues are a bitch to debug. Sigh.

@nlhepler
Contributor Author

Whatever you did in likefunc.cpp seems to have fixed the issue.

L


@nlhepler
Contributor Author

Guess not, actually. Here's a full backtrace.

#0 0x00007fff919d8bca in __psynch_cvwait ()
#1 0x00007fff8ca35274 in _pthread_cond_wait ()
#2 0x0000000100003c9b in gomp_sem_wait ()
#3 0x0000000100003d7c in gomp_barrier_wait_end ()
#4 0x0000000100003a9c in gomp_team_end ()
#5 0x0000000100099a2e in _LikelihoodFunction::ComputeBlock (this=0x7fff5fbf2738, index=0, siteRes=0x7fff5fbf2800, currentRateClass=3, branchIndex=140734799751168, branchValues=0x7fff5fbf2800) at likefunc.cpp:7689
#6 0x000000010008d811 in _LikelihoodFunction::Compute (this=0x7fff5fbf28f0) at likefunc.cpp:1889
#7 0x00000001000a5fd5 in _LikelihoodFunction::SetParametersAndCompute (this=0x100ff7800, index=768, value=6.9532229731612758e-310, baseLine=0x300, direction=0x0) at likefunc.cpp:5050
#8 0x00000001000a6de8 in _LikelihoodFunction::Bracket (this=0x7fff5fbf3010, index=768, left=@0x7fff5fbf3010, middle=@0x7fff5fbf30e8, right=@0x7fff5fbf3010, leftValue=@0x7fff5fbf3010, middleValue=@0x7fff5fbf30d8, rightValue=@0x7fff5fbf30d0, initialStep=@0x7fff5fbf30c8, gradient=0x0) at likefunc.cpp:5229
#9 0x00000001000a7609 in _LikelihoodFunction::LocateTheBump (this=0x100ff7800, index=140734799754288, gPrecision=6.9532229732940806e-310, maxSoFar=@0x7fff5fbf3430, bestVal=@0x7fff5fbf3430, bracketSetting=6.9532229732940806e-310) at likefunc.cpp:6459
#10 0x0000000100091019 in _LikelihoodFunction::Optimize (this=0x7fff5fbf47d0) at likefunc.cpp:4690
#11 0x000000010002d10a in _ElementaryCommand::Execute (this=0x7fff5fbf5e00, chain=@0x7fff5fbf5e00) at batchlan.cpp:6671
#12 0x00000001000317d8 in _ExecutionList::Execute (this=0x7fff5fbf5f10) at batchlan.cpp:1223
#13 0x0000000100030fb0 in _ElementaryCommand::ExecuteCase39 (this=0x7fff5fbf67c0, chain=@0x7fff5fbf67c0) at batchlan.cpp:3335
#14 0x000000010002e2d7 in _ElementaryCommand::Execute (this=0x7fff5fbf7df0, chain=@0x7fff5fbf7df0) at batchlan.cpp:6963
#15 0x00000001000317d8 in _ExecutionList::Execute (this=0x7fff5fbf7f00) at batchlan.cpp:1223
#16 0x0000000100030fb0 in _ElementaryCommand::ExecuteCase39 (this=0x7fff5fbf87b0, chain=@0x7fff5fbf87b0) at batchlan.cpp:3335
#17 0x000000010002e2d7 in _ElementaryCommand::Execute (this=0x7fff5fbf9de0, chain=@0x7fff5fbf9de0) at batchlan.cpp:6963
#18 0x00000001000317d8 in _ExecutionList::Execute (this=0x7fff5fbf9ef0) at batchlan.cpp:1223
#19 0x0000000100030fb0 in _ElementaryCommand::ExecuteCase39 (this=0x7fff5fbfa7a0, chain=@0x7fff5fbfa7a0) at batchlan.cpp:3335
#20 0x000000010002e2d7 in _ElementaryCommand::Execute (this=0x7fff5fbfbdd0, chain=@0x7fff5fbfbdd0) at batchlan.cpp:6963
#21 0x00000001000317d8 in _ExecutionList::Execute (this=0x7fff5fbfbee0) at batchlan.cpp:1223
#22 0x0000000100030fb0 in _ElementaryCommand::ExecuteCase39 (this=0x7fff5fbfc790, chain=@0x7fff5fbfc790) at batchlan.cpp:3335
#23 0x000000010002e2d7 in _ElementaryCommand::Execute (this=0x7fff5fbfddc0, chain=@0x7fff5fbfddc0) at batchlan.cpp:6963
#24 0x00000001000317d8 in _ExecutionList::Execute (this=0x7fff5fbfe200) at batchlan.cpp:1223
#25 0x00000001001da2d0 in main (argc=1606410672, argv=0x7fff5fbffb80) at unix.cpp:683

@spond
Member

spond commented Sep 13, 2011

Yep, this is a data race or race condition of some sort. We need helgrind...
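
For readers unfamiliar with this failure class, here is a minimal sketch of an OpenMP data race and one common fix. It is not the code in likefunc.cpp: the function names and the per-site summation are hypothetical, chosen only to show how an unsynchronized shared write inside a parallel loop can misbehave in a threaded build while a serial build looks fine.

```cpp
#include <vector>

// Hypothetical stand-in for accumulating per-site values; NOT HyPhy code.

// RACY: every thread does an unsynchronized read-modify-write on `total`.
// When compiled with OpenMP enabled, the result is undefined.
double sum_sites_racy(const std::vector<double>& site_values) {
    double total = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(site_values.size()); ++i) {
        total += site_values[i];
    }
    return total;
}

// One common fix: give each thread a private partial sum and let the
// OpenMP runtime combine them when the loop finishes.
double sum_sites_reduction(const std::vector<double>& site_values) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < static_cast<long>(site_values.size()); ++i) {
        total += site_values[i];
    }
    return total;
}
```

Without `-fopenmp` the pragmas are ignored and both functions run serially, which is one reason a race of this kind may never show up in a non-threaded build.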

@nlhepler
Contributor Author

Do we really need to recompile gcc, given that we're on Mac OS X rather than Linux with its futexes? Also, there's no (?:(?:hel|val)grind|drd) on 10.7 yet. Le sigh, le sigh.

@nlhepler
Contributor Author

Since we can't debug the issue, and other projects (notably Blender) are just disabling OpenMP in Lion until something is fixed, I am going to disable OpenMP for Lion builds.
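
One way such an opt-out can be expressed in source is sketched below. This is an assumption about the general approach, not the change that was actually committed: `DISABLE_OPENMP_SKETCH`, `USE_OPENMP_SKETCH`, and both functions are hypothetical names. Only `_OPENMP` (defined by the compiler when OpenMP is enabled) and `omp_get_max_threads()` are standard.

```cpp
#include <vector>

// Hypothetical switch: the build system would pass -DDISABLE_OPENMP_SKETCH
// on platforms where the OpenMP runtime is known to misbehave.
#if defined(_OPENMP) && !defined(DISABLE_OPENMP_SKETCH)
  #include <omp.h>
  #define USE_OPENMP_SKETCH 1
#else
  #define USE_OPENMP_SKETCH 0
#endif

// Number of threads the parallel sections may use; 1 when OpenMP is off.
int max_worker_threads() {
#if USE_OPENMP_SKETCH
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Scales every element in place and returns the sum of the scaled values.
// The loop is parallelized only when the switch above allows it.
double scale_and_sum(std::vector<double>& values, double factor) {
    double total = 0.0;
#if USE_OPENMP_SKETCH
    #pragma omp parallel for reduction(+:total)
#endif
    for (long i = 0; i < static_cast<long>(values.size()); ++i) {
        values[i] *= factor;
        total += values[i];
    }
    return total;
}
```

Simply omitting `-fopenmp` from the affected platform's compiler flags has the same effect on the pragmas. The explicit macro is mainly useful when the code also calls `omp_*` functions, which would otherwise fail to compile or link.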

spond added a commit that referenced this issue Nov 8, 2011