Skip to content
Permalink
Browse files

Retire the kill(SIGNULL) test in VSM

A bug was uncovered in the VSM code that checks if kill(SIGNULL) would
work as a test of liveness of the master and worker processes. If the user
running the utility has the required permissions to send signal to the
worker process, but not the management process, the code would wrongly
assume it could do kill on both. It would then end up in a connect loop
never succeeding.

Because kill() can not always be successfully run (a common scenario when
the user running varnishlog is not the same UID as the cache processes),
there is a fall back to using fstat to check for Varnish restarts. Since
this fallback is active for most use cases anyways, it was decided to
retire the kill() mechanism rather than to fix it. This way the behaviour
does not change depending on what user the utility is run as.

This change was OK'd by PHK after discussing it on IRC.
  • Loading branch information...
mbgrydeland committed Sep 10, 2019
1 parent 904ceab commit b845c2789cae4f165e30a3d0f3af1bfbbeca98ee
Showing with 3 additions and 16 deletions.
  1. +0 −4 bin/varnishtest/tests/u00008.vtc
  2. +3 −12 lib/libvarnishapi/vsm.c
@@ -37,10 +37,6 @@ process p1 -expect-text 0 0 "VBE.vcl1.s1.req"
process p1 -expect-text 0 0 "DIAG"
process p1 -screen_dump

varnish v1 -stop
process p1 -expect-text 2 1 "Uptime child: Not Running"
process p1 -screen_dump

process p1 -write {dek}
process p1 -expect-text 0 1 "Concurrent connections to backend:"
process p1 -screen_dump
@@ -150,8 +150,6 @@ struct vsm {

int attached;
double patience;

int couldkill;
};

/*--------------------------------------------------------------------*/
@@ -357,8 +355,6 @@ VSM_New(void)
vd->child->vsm = vd;
vd->wdfd = -1;
vd->patience = 5;
if (getenv("VSM_NOPID") != NULL)
vd->couldkill = -1;
return (vd);
}

@@ -485,17 +481,13 @@ vsm_vlu_hash(struct vsm *vd, struct vsm_set *vs, const char *line)
int i;
uintmax_t id1, id2;

(void)vd;

i = sscanf(line, "# %ju %ju", &id1, &id2);
if (i != 2) {
vs->retval |= VSM_MGT_RESTARTED | VSM_MGT_CHANGED;
return (0);
}
if (vd->couldkill >= 0 && !kill(id1, 0)) {
vd->couldkill = 1;
} else if (vd->couldkill > 0 && errno == ESRCH) {
vs->retval |= VSM_MGT_RESTARTED | VSM_MGT_CHANGED;
return (0);
}
vs->retval |= VSM_MGT_RUNNING;
if (id1 != vs->id1 || id2 != vs->id2) {
vs->retval |= VSM_MGT_RESTARTED | VSM_MGT_CHANGED;
@@ -694,8 +686,7 @@ vsm_refresh_set(struct vsm *vd, struct vsm_set *vs)

vs->fst.st_size = lseek(vs->fd, 0L, SEEK_CUR);

if (vd->couldkill < 1 || !kill(vs->id1, 0))
vs->retval |= vs->flag_running;
vs->retval |= vs->flag_running;
return (vs->retval);
}

0 comments on commit b845c27

Please sign in to comment.
You can’t perform that action at this time.