Vacuum compressed chunks missing stats #1804
pkg/vacuum/vacuum.go
Outdated
// next, we look for compressed chunks that have never been vacuumed and seem to be missing
// statistics. autovacuum will ignore these until they hit vacuum_freeze_min_age, so let's
// catch these early
for {
This for loop, I believe, can starve out the loop above, and vice versa. This should probably be one for loop that alternates between the two queries. Or maybe the loop above is more important and should starve out this loop?
agreed. I think the first loop is more important. I will limit the second for loop to 3 loops. If it finishes its work, the run ends. If it doesn't finish in 3 iterations, it will give the first loop another turn before continuing.
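A minimal sketch of the bounded-turn idea described above, under stated assumptions: the `workload` type, its `pending` counter, and `run` are hypothetical names for illustration and are not taken from pkg/vacuum/vacuum.go. The first loop is drained completely; the second gets at most 3 iterations before the first loop gets another turn, and the run ends once the second loop finds nothing left.

```go
package main

// maxSecondaryIterations caps how many batches the lower-priority loop may
// process before yielding. The value 3 comes from the comment above.
const maxSecondaryIterations = 3

// workload is a hypothetical stand-in for one of the two vacuum queries.
type workload struct {
	name    string
	pending int // batches still to process (illustrative only)
}

// step processes one batch and reports whether any work was done.
func (w *workload) step() bool {
	if w.pending == 0 {
		return false
	}
	w.pending--
	return true
}

// run drains primary fully, then gives secondary at most
// maxSecondaryIterations batches before returning to primary.
// It returns the order in which batches were processed.
func run(primary, secondary *workload) []string {
	var order []string
	for {
		// The first loop is treated as more important: drain it completely.
		for primary.step() {
			order = append(order, primary.name)
		}
		// The second loop gets a bounded turn.
		finished := false
		for i := 0; i < maxSecondaryIterations; i++ {
			if !secondary.step() {
				finished = true
				break
			}
			order = append(order, secondary.name)
		}
		if finished {
			return order
		}
	}
}
```

With 2 batches in the first workload and 4 in the second, `run` handles both freeze batches, then three stats batches, re-checks the first workload, and only then finishes the last stats batch.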
take another look? @cevian
🔥 ❤️ 🔥
pkg/vacuum/vacuum.go
Outdated
// if this workload already finished, don't continue working it, work the other
if w.finished {
	continue
doesn't this mean chunksMissingStats can indefinitely starve out chunksToFreeze after chunksToFreeze.finished becomes true?
yes. dangit.
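One possible way to avoid that starvation, sketched under assumptions: instead of latching a `finished` flag and skipping the workload for the rest of the run, every workload is visited once per round, and the run ends only after a round in which none of them had anything to do. The `workload` type with a `work` callback, `runRounds`, and `maxRounds` are hypothetical names; this is not necessarily the fix the PR adopted.

```go
package main

// workload is a hypothetical stand-in for chunksToFreeze or
// chunksMissingStats.
type workload struct {
	name string
	// work processes at most one batch and returns how many chunks it handled.
	work func() int
}

// runRounds visits every workload once per round and stops only after a
// round in which no workload handled anything (or after maxRounds, as a
// safety bound). No workload is skipped permanently, so one that went idle
// is picked up again if work reappears in a later round.
func runRounds(workloads []workload, maxRounds int) map[string]int {
	handled := make(map[string]int)
	for round := 0; round < maxRounds; round++ {
		idle := true
		for _, w := range workloads {
			if n := w.work(); n > 0 {
				handled[w.name] += n
				idle = false
			}
		}
		if idle {
			break
		}
	}
	return handled
}
```

The cost of this design is one extra listing query per workload per round, even for a workload that is currently empty.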
We have seen instances in which compressed chunks are missing statistics. Autovacuum ignores tables that are missing statistics. Analyzing these tables does not help since we rarely modify chunks after they are compressed. Therefore, these chunks are ignored until they pass autovacuum_freeze_max_age. This can be bad for performance. So, the vacuum engine looks for these chunks and vacuums them. We only use one worker for these.
I think the metrics could be improved, but I won't hold up the PR for that.
return
log.Error("msg", fmt.Sprintf("failed to list %s", w.name), "error", err)
w.stop = true
continue
}
tablesNeedingVacuum.Set(float64(len(chunks)))
shouldn't we add separate metrics as well to count the work for each type?
yes. I'll do a follow-up PR
Description
We have seen instances in which compressed chunks are missing statistics. Autovacuum ignores tables that are missing statistics. Analyzing these tables does not help since we rarely modify chunks after they are compressed. Therefore, these chunks are ignored until they pass autovacuum_freeze_max_age. This can be bad for performance. So, the vacuum engine makes a second pass looking for these chunks and vacuums them. We only use one worker for these.
See also: timescale/promscale_extension#595
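A rough sketch of the two-pass shape described above, with assumptions called out: `vacuumAll`, `runPasses`, the `parallelism` parameter, and the `vacuum` callback are hypothetical names standing in for the real engine, and the chunk lists would come from the actual listing queries. The first pass uses the configured number of workers; the second pass, for chunks missing statistics, uses a single worker.

```go
package main

import "sync"

// vacuumAll feeds chunks to the given number of worker goroutines and
// returns how many chunks were submitted. vacuum is a hypothetical callback
// standing in for the real VACUUM call.
func vacuumAll(chunks []string, workers int, vacuum func(chunk string)) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range jobs {
				vacuum(c)
			}
		}()
	}
	for _, c := range chunks {
		jobs <- c
	}
	close(jobs)
	wg.Wait() // all submitted chunks have been vacuumed once this returns
	return len(chunks)
}

// runPasses vacuums the regular workload with the configured parallelism,
// then makes a second pass over chunks missing statistics with one worker.
func runPasses(toFreeze, missingStats []string, parallelism int, vacuum func(string)) (int, int) {
	first := vacuumAll(toFreeze, parallelism, vacuum)
	second := vacuumAll(missingStats, 1, vacuum)
	return first, second
}
```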
Merge requirements
Please take into account the following non-code changes that you may need to make with your PR: