WS_Assert(), cache/cache_ws.c line 59 #1834
@lkarsten close? |
I don't think we've squashed this one. I asked the reporter to check if it still happens with 4.1HEAD, timeout in 10d if nothing heard. |
Just had it happen on 4.1.2 |
We do see this panic on our custom 4.1 branch (some internal functionality on top of upstream/4.1). As suggested in the old ticket (Trac), we're checking whether 96 KB for both client and backend workspaces helps.
|
@rnsanchez and @porcospino - Can you please try the 4.1.3-beta1 version and see if that helps? See https://www.varnish-cache.org/releases/index.html#releases . There isn't a specific fix for this in there, but it may be that one of the many other fixed bugs is provoking this. |
@lkarsten I need to review the commits up to 4.1.3-beta1 before rolling out to this server, but I will keep you posted. |
@rnsanchez Is this still current? Did you rebase your branch to 4.1.3? Do you still see it with the raised workspace sizes you describe in #2008? |
timing this out. |
backport review: Timed out, nothing to backport |
We have a theory and a likely culprit for this problem. Basically, if planets^Wpointers align, session initialization [1,2] may overwrite the end-of-workspace marker with a NUL char in 4.1. This is also true in current master [3] and possibly even more unsafe for H2. This is basically because we disregard the result of the workspace reservation. [1] https://github.com/varnishcache/varnish-cache/blob/9dc2ec5/bin/varnishd/cache/cache_session.c#L154-L157 |
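To make the theory above concrete, here is a minimal sketch of how an unchecked reservation can clobber an end-of-workspace marker. It is not Varnish code: `struct toy_ws`, `TOY_CANARY`, and every `toy_*` function are hypothetical names invented for illustration, and the workspace layout is simplified to a single buffer with one marker byte at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_WS_SIZE 16
#define TOY_CANARY  0x15  /* marker value chosen for this sketch only */

/* Hypothetical, simplified workspace: usable bytes followed by one canary. */
struct toy_ws {
    char buf[TOY_WS_SIZE + 1];
    char *f;  /* first free byte */
    char *e;  /* end of usable space; *e holds the canary */
};

static void toy_ws_init(struct toy_ws *ws)
{
    memset(ws->buf, 0, sizeof ws->buf);
    ws->f = ws->buf;
    ws->e = ws->buf + TOY_WS_SIZE;
    *ws->e = TOY_CANARY;
}

static int toy_ws_intact(const struct toy_ws *ws)
{
    return *ws->e == TOY_CANARY;
}

/* Consume n bytes; the caller must not ask for more than is left. */
static void toy_ws_alloc(struct toy_ws *ws, size_t n)
{
    assert(n <= (size_t)(ws->e - ws->f));
    ws->f += n;
}

/* Reserve whatever is left and report how many bytes that is. */
static size_t toy_ws_reserve_all(const struct toy_ws *ws)
{
    return (size_t)(ws->e - ws->f);
}

/* Buggy init: ignores the reservation result and always writes a NUL. */
static void toy_rx_init_buggy(struct toy_ws *ws)
{
    (void)toy_ws_reserve_all(ws);
    *ws->f = '\0';  /* with zero bytes left, ws->f == ws->e: canary gone */
}

/* Checked init: refuses to write when nothing could be reserved. */
static int toy_rx_init_checked(struct toy_ws *ws)
{
    if (toy_ws_reserve_all(ws) == 0)
        return -1;
    *ws->f = '\0';
    return 0;
}
```

When the workspace is filled exactly to the end, the buggy variant turns the canary into a NUL byte, which a later integrity check would then report. This also matches why the problem only shows up when the remaining space is exactly zero at that point.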
FWIW, the attached test case triggers the assertion on my machine. I am not suggesting this test case for inclusion, as the sizes used are way too fragile to be useful as a regression test. The size of the bogus allocation needs to be exactly large enough to leave zero WS at the right point in order to trigger the bug. I ended up on 208 bytes by trial and success ;-) |
HTC_RxInit could write a single '\0' NUL character outside of the workspace when it is called with zero bytes left in the workspace. This would trip the workspace canary and cause the subsequent assertion. Fix by relieving HTC_RxInit of adding the '\0' character. HTC_RxStuff now returns HTC_S_Overflow early if the available buffer space is zero. The '\0' character is inserted just before calling the completion check function. Also fix an off-by-one error in the http_{req|resp}_size calculations, where the maximum number of bytes accepted was one less than the parameter indicated. c00039.vtc and c00040.vtc have been edited to reflect that and to be more expressive about the sizes they generate. Fixes: varnishcache#1834
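The off-by-one mentioned in the commit message can be illustrated with a small sketch. The commit message only states the symptom (one byte fewer accepted than the parameter indicates); the cause shown here, counting the trailing '\0' against the limit, is an assumption made for illustration, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed buggy variant: the byte needed for the trailing '\0' is counted
 * against the configured limit, so a message of exactly max_size bytes
 * is rejected.
 */
static int toy_accept_buggy(size_t msg_len, size_t max_size)
{
    return msg_len + 1 <= max_size;
}

/*
 * Corrected variant: the limit counts message bytes only; room for the
 * '\0' has to be provided separately by whoever sizes the buffer.
 */
static int toy_accept_fixed(size_t msg_len, size_t max_size)
{
    return msg_len <= max_size;
}
```

With a limit of 256, the first variant accepts at most 255 bytes while the second accepts exactly 256, which is the behavior the parameter name suggests.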
HTC_RxInit and HTC_RxReInit could write a single '\0' NUL character outside of the workspace when called with zero bytes left in the workspace. This would trip the workspace canary and cause the subsequent assertion. Fix by relieving HTC_RxInit and HTC_RxReInit of adding the '\0' character. HTC_RxStuff and V1F_FetchRespHdr return HTC_S_OVERFLOW if the available buffer space is zero. Both make sure to insert the '\0' character just before calling the completion check function. Note that this fix does not change the fact that we have exhausted the workspace and are unable to continue. Varnishd will panic nonetheless, but at least we have not stepped out of our boundaries. Ref: #1834
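The shape of the fix described in these commit messages can be sketched roughly as follows. This is a simplified stand-in, not the actual Varnish implementation: `struct toy_rx`, `enum toy_status`, and the `toy_*` functions are hypothetical, data is passed in directly instead of being read from a socket, and the completion check simply looks for an empty line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_status { TOY_S_MORE, TOY_S_COMPLETE, TOY_S_OVERFLOW };

/* Hypothetical receive state over a caller-supplied buffer. */
struct toy_rx {
    char *b;    /* start of buffer */
    char *e;    /* end of data received so far */
    char *lim;  /* one past the last usable byte */
};

/* Init no longer writes a '\0'; it only sets up the pointers. */
static void toy_rx_init(struct toy_rx *rx, char *buf, size_t len)
{
    rx->b = buf;
    rx->e = buf;
    rx->lim = buf + len;
}

/* Completion check: needs a NUL-terminated buffer. */
static int toy_complete(const char *s)
{
    return strstr(s, "\r\n\r\n") != NULL;
}

static enum toy_status
toy_rx_stuff(struct toy_rx *rx, const char *data, size_t n)
{
    size_t room;

    /* Zero bytes available: report overflow before touching memory. */
    if (rx->e >= rx->lim)
        return TOY_S_OVERFLOW;

    /* Keep one byte back for the terminating '\0'. */
    room = (size_t)(rx->lim - rx->e) - 1;
    if (n > room)
        return TOY_S_OVERFLOW;

    memcpy(rx->e, data, n);
    rx->e += n;

    /* Terminate only now, right before the completion check. */
    *rx->e = '\0';
    return toy_complete(rx->b) ? TOY_S_COMPLETE : TOY_S_MORE;
}
```

The point of the ordering is that the NUL write happens only after the code has established that at least one byte is available, so a zero-length buffer leads to an overflow status rather than a write past the end.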
FYI: The final commit, c1fa324 is a backport to 4.1. Thanks again @mbgrydeland! |
Old ticket imported from Trac
See archived copy here: https://varnish-cache.org/trac/ticket/1834