VSL fails to get hold of SHM #2241

Closed
bsdphk opened this Issue Mar 2, 2017 · 1 comment

Comments

Projects
None yet
4 participants
@bsdphk
Contributor

bsdphk commented Mar 2, 2017

Sometimes the VSL thread in varnishtest simply doesn't get hold of anything.

I have tried to postpone startup of the VSL thread until I am sure there is something in the VSL, but it still happens.

To provoke the bug add this patch:

diff --git a/bin/varnishtest/vtc_varnish.c b/bin/varnishtest/vtc_varnish.c
index 544a63c40..1a466ba4e 100644
--- a/bin/varnishtest/vtc_varnish.c
+++ b/bin/varnishtest/vtc_varnish.c
@@ -86,6 +86,7 @@ struct varnish {
    unsigned                vsl_tag_count[256];
 
    volatile int            vsl_idle;
+   volatile int            vsl_rec;
 };
 
 #define NONSENSE   "%XJEIFLH|)Xspa8P"
@@ -233,6 +234,7 @@ varnishlog_thread(void *priv)
                            break;
 
                    v->vsl_idle = 0;
+                   v->vsl_rec = 1;
 
                    tag = VSL_TAG(c->rec.ptr);
                    vxid = VSL_ID(c->rec.ptr);
@@ -571,7 +573,7 @@ varnish_start(struct varnish *v)
    macro_def(v->vl, v->name, "port", "%s", p);
    macro_def(v->vl, v->name, "sock", "%s %s", h, p);
    /* Wait for vsl logging to get underway */
-   while (v->vsl_idle == 0)
+   while (v->vsl_rec == 0)
            VTIM_sleep(.1);
 }

Create this VTC:

varnishtest "foo"

server s1 {
    rxreq
    txresp
} -start

varnish v1 -vcl+backend {} -start

client c1 {
    txreq
    rxresp
} -run

varnish v1 -stop

And run varnishtest -i -n 1000 -j 10 foo.vtc

hermunn added a commit that referenced this issue Apr 4, 2017

Fix race in the VSL segment counter initialization
Fix a stupid misstake where the segment counter in the VSL segment was
set just before the entire struct was cleared during
initialization. This didn't cause problems in the common case because
it would be set correctly on the very first log line produced. But it
did open a race where an unfortunate early log client would assume it
was lagging indefinetely behind varnishd on the log reading and report
overflow error. This was most commonly experienced with varnishtest.

Fixes: #2241
@hermunn

This comment has been minimized.

Show comment
Hide comment
@hermunn

hermunn Apr 4, 2017

Contributor

Backport review: The fix acd2198 has been backported as 3e19f33.

Contributor

hermunn commented Apr 4, 2017

Backport review: The fix acd2198 has been backported as 3e19f33.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment