Assert error in Lck_CondWait(), cache/cache_lck.c line 203 (on Mac OS X) #1853
Comments
@mjastrzebowski are you the original reporter of the (archived) ticket? |
The way I read the MacOS manpage, this should not be happening, also the panic trace provided shows that |
Backport review: nothing committed. |
I got access to an OSX instance and had to satisfy my curiosity. On OSX, pthread_cond_wait() can return non-zero: the man page documents EINVAL as a valid return code. It makes no mention of errno being used, so I think @nigoroll misread the man page on that point. I discussed this a bit with Martin. Since this happens in VBT_Wait(), it could be an issue with our kqueue waiter and/or TCP pools implementation. Following up would take some effort, and since the original reporter has gone quiet and OSX is a nice-to-have platform, I think Nils did the right thing by closing it for now. |
I'm consistently getting this error on a number of OSX machines. Is there any way I can help debug / get to the root of the issue?
|
Reopened to get some fresh eyes on this. I'm not running Varnish on macOS these days. |
Having this issue as well on several machines but with 4.1.9. Similar error as @yousefcisco.
|
@yousefcisco I just noticed we are on the same macOS version as well. Maybe the newest High Sierra is not playing nice with Varnish. |
Thanks for the new panic infos. The original panic info did not have |
I've downgraded my version of varnish and I'm still having the error. @nigoroll do you think this is a different issue? |
@nigoroll @yousefcisco I upgraded my version of varnish and I'm still getting the issue. 90% of my requests result in a 502 Gateway error through nginx |
@yousefcisco I am still having major issues @nigoroll @fgsch @bsdphk |
Please note that OSX is not one of the supported platforms. See https://varnish-cache.org/docs/6.0/phk/platforms.html for details. |
Yeah, we're still having this issue as well. Some members of my team are thankfully on Sierra and don't have it; everyone on High Sierra has it constantly. @winstonjz I reduced the frequency of this issue by getting nginx to serve static files itself instead of going through Varnish, which worked for the most part. The other thing in common between all three panic errors (I think we need more) is the @fgsch @nigoroll can you shed some light on what |
|
@nigoroll My company is already with Fastly, would their professional services team be able to resolve this issue? FYI the |
@yousefcisco I cannot answer your question; I am not Fastly. |
@nigoroll I was just going to try and see if this issue happens without the |
Somebody needs to reproduce this and get at the core dump with a debugger to find out what is going on. Alternatively, they can stuff a bunch of VSL/printfs into their copy of the source code. The fact that we only see it on OS/X does not mean it's not a generic bug; any bug which can cause this would be very sensitive to both the implementation and the scheduling of threads. |
I am getting this error on Varnish 6. It does not seem to happen with Varnish v4.1.10. |
Timing this one out |
This is still an issue on macOS Mojave 10.14.5, with git master:
The following test cases are failing:
|
I have added an assert on the tv_nsec field to make sure it is valid. I'm pretty sure we're not guilty of this one: https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87151984 but that is probably my best guess for now. |
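As an illustration of what such a validity check guards against, here is a small sketch. It assumes the deadline starts out as a non-negative floating-point number of seconds since the epoch; that representation and the helper names `timespec_is_valid` and `deadline_to_timespec` are assumptions made for the example, not taken from the Varnish source.

```c
#include <assert.h>
#include <time.h>

/* A timespec is usable as a deadline only if 0 <= tv_nsec < 1e9. */
static int
timespec_is_valid(const struct timespec *ts)
{
	return (ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
}

/*
 * Hypothetical helper: split a non-negative deadline, given as seconds
 * since the epoch in a double, into a struct timespec.
 */
static struct timespec
deadline_to_timespec(double when)
{
	struct timespec ts;

	assert(when >= 0.0);
	ts.tv_sec = (time_t)when;	/* truncates towards zero */
	ts.tv_nsec = (long)((when - (double)ts.tv_sec) * 1e9);
	if (ts.tv_nsec >= 1000000000L) {	/* guard against rounding up */
		ts.tv_sec++;
		ts.tv_nsec -= 1000000000L;
	}
	assert(timespec_is_valid(&ts));
	return (ts);
}
```

An out-of-range tv_nsec is one documented way to get EINVAL from pthread_cond_timedwait(), so an assert like this helps rule that cause in or out.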
Hi. Re-ran with latest master now, I'm afraid it is the same assert that is triggering:
b00048.vtc didn't fail this time. Pasting backtrace from c00004.vtc, in case you/someone can spot something different this time around:
|
Re-running with current master on macOS Mojave 10.14.5, the problem is still there. There may be value in checking this on macOS Catalina 10.15.0, but that release is still breaking lots of apps, so I haven't upgraded. Tests that fail are: b00013 b00035 b00048 c00008 c00027 d00002 e00026 e00031 g00007 r00476 r00612 r00984 r01038 r01123 r01442 r01637 r01764 r02372 s00000 Sample from failing c00008.vtc:
|
pow-wow: will have another look |
FYI: https://opensource.apple.com/source/libpthread/libpthread-330.250.2/src/pthread_cond.c.auto.html (still need to check with enough time) |
can anyone confirm that the issue is in fact fixed? |
is the travis setup helpful here? |
travis is helpful, but I checked the builds before and after e5e545f and saw no difference. |
like the sign is wrong, we should never get EINVAL if the deadline is in the future. Fix the test to make sense, and hope for the best (or an M1 :-) Spotted by: Rasmus Villemoes
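One possible reading of the comment above, sketched as a pure function: EINVAL from a timed wait is tolerated only when the deadline has already passed, and is treated as unexpected when the deadline is still in the future. This is an assumption-laden illustration, not the actual commit; `classify_wait`, `ts_before`, and the `wait_outcome` enum are hypothetical names.

```c
#include <errno.h>
#include <time.h>

enum wait_outcome { WAIT_SIGNALLED, WAIT_TIMED_OUT, WAIT_UNEXPECTED };

/* Returns non-zero if a is strictly earlier than b. */
static int
ts_before(const struct timespec *a, const struct timespec *b)
{
	if (a->tv_sec != b->tv_sec)
		return (a->tv_sec < b->tv_sec);
	return (a->tv_nsec < b->tv_nsec);
}

/* Classify the return value of a timed condvar wait (hypothetical helper). */
static enum wait_outcome
classify_wait(int ret, const struct timespec *deadline,
    const struct timespec *now)
{
	if (ret == 0)
		return (WAIT_SIGNALLED);
	if (ret == ETIMEDOUT)
		return (WAIT_TIMED_OUT);
	/*
	 * Assumption: EINVAL is acceptable only if the deadline is not in
	 * the future. Getting the direction of this comparison backwards
	 * is the kind of sign error the comment describes.
	 */
	if (ret == EINVAL && !ts_before(now, deadline))
		return (WAIT_TIMED_OUT);
	return (WAIT_UNEXPECTED);
}
```

A caller would assert that the outcome is not WAIT_UNEXPECTED, which keeps a genuinely invalid argument from being silently swallowed.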
Old ticket imported from Trac
See archived copy here: https://varnish-cache.org/trac/ticket/1853