Backport bug fixes for a v1.1.1 release #2160

dominiklohmann · 2022-03-22T11:11:12Z

In large-scale deployments we observed the disk monitor's erase request returning a timeout error, which cancelled the ongoing request. This is a two-fold bug fix. First, we must not use a timeout for process-internal actor communication, especially not for complicated nested loops that leave the actor in an undefined state when they are partially executed and never resumed. Second, we must treat erase requests with high priority because they should never be held up by queued requests on the read or write path.

If this does not fix the bug, then a release with these changes will at the very least help us track down the actual source of the issue because the request timeout error no longer masks it.
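
To illustrate the second part of the fix, here is a minimal, standard-library-only sketch of a mailbox that serves high-priority requests before normal ones while keeping arrival order within each priority level. It is not VAST's or CAF's implementation; the names `mailbox`, `request`, `priority`, and `drain` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical priority levels; illustrative only, not VAST's actual types.
enum class priority : std::uint8_t { normal = 0, high = 1 };

// A queued request. `seq` records arrival order so that requests of equal
// priority are still handled first-in, first-out.
struct request {
  std::string name;
  priority prio;
  std::uint64_t seq;
};

// Orders requests for std::priority_queue, which pops the "largest" element:
// higher priority wins, and within one priority the earlier arrival wins.
struct by_priority_then_arrival {
  bool operator()(const request& lhs, const request& rhs) const {
    if (lhs.prio != rhs.prio)
      return lhs.prio < rhs.prio;
    return lhs.seq > rhs.seq;
  }
};

// A toy mailbox (assumption: single-threaded use, no synchronization).
class mailbox {
public:
  void push(std::string name, priority prio) {
    queue_.push(request{std::move(name), prio, next_seq_++});
  }

  bool empty() const {
    return queue_.empty();
  }

  request pop() {
    assert(!queue_.empty());
    request next = queue_.top();
    queue_.pop();
    return next;
  }

private:
  std::priority_queue<request, std::vector<request>, by_priority_then_arrival>
    queue_;
  std::uint64_t next_seq_ = 0;
};

// Returns the names of all queued requests in the order they would be handled.
std::vector<std::string> drain(mailbox& mb) {
  std::vector<std::string> order;
  while (!mb.empty())
    order.push_back(mb.pop().name);
  return order;
}
```

With this ordering, pushing `query`, `import`, and then a high-priority `erase` drains as `erase`, `query`, `import`, so the erase request is not stuck behind the queued read and write requests.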

📝 Checklist

All user-facing changes have changelog entries.
The changes are reflected on docs.tenzir.com/vast, if necessary.
The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

Run on our testbed.

In large-scale deployments we observed the disk monitor's erase request returning a timeout error, which cancelled the ongoing request. This is a two-fold bug fix. First, we must not use a timeout for process-internal actor communication, especially not for complicated nested loops that leave the actor in an undefined state when they are partially executed and never resumed. Second, we must treat erase requests with high priority because they should never be held up by queued requests on the read or write path.

Co-authored-by: Benno Evers <benno.evers@tenzir.com>

lava

Changes look good; there are no unit tests, but we verified manually that they work and improve disk monitor behavior on the testbed.

dominiklohmann added the bug Incorrect behavior label Mar 22, 2022

dominiklohmann requested a review from lava March 22, 2022 11:11

dominiklohmann changed the base branch from master to v1.1.x March 22, 2022 11:14

dominiklohmann force-pushed the topic/disk-monitor-prio branch from 8269268 to 0b6e5ce Compare March 22, 2022 12:03

dominiklohmann force-pushed the topic/disk-monitor-prio branch from 0b6e5ce to 7b40ec8 Compare March 22, 2022 12:28

dominiklohmann added 2 commits March 22, 2022 14:30

Log when query supervisors returned to the index

c556eea

Continue in disk monitor after a failed erasure

a87dd22

lava mentioned this pull request Mar 23, 2022

Add blacklist to disk monitor #2161

Merged

3 tasks

dominiklohmann force-pushed the topic/disk-monitor-prio branch from 554fc01 to d7d84c6 Compare March 23, 2022 13:50

Keep track of busy query workers separately

05b6c6e

dominiklohmann force-pushed the topic/disk-monitor-prio branch from d7d84c6 to 05b6c6e Compare March 23, 2022 14:03

lava marked this pull request as ready for review March 23, 2022 16:15

Allow multiple queries in query supervisor

22c3101

Co-authored-by: Benno Evers <benno.evers@tenzir.com>

dominiklohmann force-pushed the topic/disk-monitor-prio branch from 16b66c0 to 22c3101 Compare March 24, 2022 08:54

Document bug fixes

42c97ae

lava approved these changes Mar 24, 2022

Fix compilation with {fmt} 7 + gcc 10

5bf8f33

dominiklohmann force-pushed the topic/disk-monitor-prio branch from ad33b0d to 5bf8f33 Compare March 24, 2022 10:00

dominiklohmann added 3 commits March 24, 2022 16:39

Cancel backlogged queries from terminated queries

2f7e172

Add third changelog entry

a5dd401

Demote new log messages to debug level

332a898

dominiklohmann changed the title ~~Give erase requests a high priority~~ Backport bug fixes for a v1.1.1 release Mar 25, 2022

Prepare VAST v1.1.1 release

a9a1120

dominiklohmann force-pushed the topic/disk-monitor-prio branch from 7e28a00 to a9a1120 Compare March 25, 2022 16:04

dominiklohmann merged commit 7b99e63 into v1.1.x Mar 25, 2022

dominiklohmann deleted the topic/disk-monitor-prio branch March 25, 2022 16:45
