Fix bug in probe heap comparison function #3115

mbgrydeland · 2019-11-05T16:11:07Z

A case has been observed where the probe scheduling stopped. The system had some probes with very long interval (24h), and some with short intervals. It is suspected that the cause was the bug that this PR fixes, where the heap bubbled the wrong probe to the top giving a very long (24h) wait time before the next probe scheduling.

An attempt was made at creating a test case for this, but it proved very unreliable and timing sensitive so it was abandoned.

Fix the probe scheduler heap comparison function to be consistent with regard to different running state of the two arguments. With this fix, probes that are not running will always bubble to the top before those that are already running.

rezan · 2019-11-05T16:55:49Z

#2976 (comment)

nigoroll · 2019-11-05T17:31:22Z

I am pretty sure that at one point in time I had more or less the same code (IIUC more like if (aa->running != bb->running) return (!aa->running)), but somehow convinced myself that the existing code was correct.

Can you explain why it is not?

hermunn · 2019-11-05T20:13:17Z

Can you explain why it is not?

In my not so humble opinion, I think it should be the other way around, why would the current code work? Let me explain.

If a function has the name compare and is used as a compare function in some kind of algorithm, then it should be a compare function.

In the existing code, if one probe is running and one is not, then it is possible that both vbp_cmp(a,b) and vbp_cmp(b,a) return zero, which normally would indicate that they are equal or equivalent. Here, the problem might be that all of vbp_cmp(a,b), vbp_cmp(b,c) and vbp_cmp(c,a) can return all 0, while vbp_cmp(b,a) evaluates to 1, and this can happen both when exactly one or two of the three are running.

The code in the proposed fix is clear, and it clearly a true compare function. If one is running and the other is not, use that. If they are the same, use the due time.

It is possible that two different probes become equal in priority, but they will form equivalence classes (they all share running == 0 and the due time).

i think it should be uncontroversial to just merge this (but I am sharing office with @mbgrydeland, so not my call).

nigoroll · 2019-11-06T11:46:16Z

Just to illustrate, the resulting truth table change is (with aa and bb denoting the running state and due denoting the comparison of the due values)

before:

aa      bb      return
 0      0       due
 0      1       due
 1      0       0
 1      1       due

after:

aa      bb      return
 0      0       due
 0      1       1
 1      0       0
 1      1       due

Now I cannot understand any more why I ever thought the existing code was correct.

Correct the probe heap comparison function

784390a

Fix the probe scheduler heap comparison function to be consistent with regard to different running state of the two arguments. With this fix, probes that are not running will always bubble to the top before those that are already running.

nigoroll approved these changes Nov 6, 2019

View reviewed changes

bsdphk self-requested a review November 11, 2019 07:36

bsdphk approved these changes Nov 11, 2019

View reviewed changes

mbgrydeland merged commit f402e5f into varnishcache:master Nov 11, 2019

mbgrydeland deleted the master-probefix branch November 11, 2019 10:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bug in probe heap comparison function #3115

Fix bug in probe heap comparison function #3115

mbgrydeland commented Nov 5, 2019

rezan commented Nov 5, 2019

nigoroll commented Nov 5, 2019

hermunn commented Nov 5, 2019

nigoroll commented Nov 6, 2019 •

edited

Fix bug in probe heap comparison function #3115

Fix bug in probe heap comparison function #3115

Conversation

mbgrydeland commented Nov 5, 2019

rezan commented Nov 5, 2019

nigoroll commented Nov 5, 2019

hermunn commented Nov 5, 2019

nigoroll commented Nov 6, 2019 • edited

nigoroll commented Nov 6, 2019 •

edited