-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compact/main: verify compacted and downsampled blocks #194
Conversation
for _, g := range sy.Groups() { | ||
if _, err := g.Compact(ctx, comp); err != nil { | ||
level.Error(logger).Log("msg", "compaction failed", "err", err) | ||
// The HaltError type signals that we hit a critical bug and should block | ||
// for investigation. | ||
if compact.IsHaltError(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally see the reason here, but I feel like this need to be automated a little bit.. I can see cases when people will start it and don't have (or remember to have) alert for halted compactor.. And from orchestration point of view it will be seen as healthy..
Not sure what is the best way to handle it, maybe halting that as an option? (configured by flag) or maybe just return error and fail everything. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a debugging point of view this is the best course of action I think. Remember that it should[tm] not happen to begin with. But if it did and we just exit with an error, we'll get rescheduled and debugging gets harder.
A less severe alternative would be marking ourselves as unhealthy. But in that case k8s will also just kill us eventually and reschedule.
I get that it's a bit of a dirty approach. But we do log an error and we are setting a respective metric.
Once we are fully confident that the issue is resolved we can maybe just exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I understand it, but let's hide it behind some debug flag then.
Added a hidden flag to enable this behavior. |
This runs the verification procedure whenever we encounter a block that is not valid. The compactor will block indefinitely and the downsample run will terminate with an error.
} | ||
// Ensure all input blocks are valid. | ||
if err := block.VerifyIndex(filepath.Join(pdir, "index")); err != nil { | ||
return id, errors.Wrapf(halt(err), "invalid plan block %s", pdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This runs the verification procedure before and after we create new blocks from previous blocks. The compactor will block indefinitely and the downsample
run will terminate with an error.