Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact: skip compaction for blocks with no samples #904

Merged

Conversation

mjd95
Copy link
Contributor

@mjd95 mjd95 commented Mar 11, 2019

Changes

During the Thanos group compactor, in the case when the Prometheus compactor has found that the resulting block would have no data, delete the source blocks (after checking they are indeed empty) and exit early. Introduce a sentinel error to mark this case, to force the caller (bucket compactor) to continue compacting the rest of that group.

It is necessary to handle the empty compaction case separately because the later parts of the group compactor result in an error if we attempt to continue.

Note that the Prometheus compactor uses the empty ULID for the "block with no samples" case:

    // When resulting Block has 0 samples
    //  * No block is written.
    //  * The source dirs are marked Deletable.
    //  * Returns empty ulid.ULID{}.
    Compact(dest string, dirs []string, open []*Block) (ulid.ULID, error)

Verification

Ran the compactor against a bucket where some of the compacted blocks would end up empty.

devnev
devnev previously requested changes Mar 11, 2019
Copy link
Contributor

@devnev devnev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the case we were observing was essentially a corrupted block, where we expect data based on the block but see none in the chunks. As we were discussing, this probably shouldn't be silenced entirely, rather the error reporting should not affect the processing of independent blocks. @bwplotka ?

@bwplotka
Copy link
Member

bwplotka commented Mar 11, 2019

Wow, ok I think we have 2-3 separate cases here.

  1. When resulting Block has 0 samples case for compactor - we probably don't handle it well on Thanos. We should.
  2. We use empty ID as "sentinel" of something totally separate - "nothing more to compact in the group", rather than compaction resulted in 0 samples block.
  3. @devnev mention about corrupted block. How this is relevant to this issue? Because compaction of your corrupted block results in 0 samples block, right? If yes we should think.. if compactor should do that and treat them in same way as valid 0 sample blocks. I am not entirely when to check that then. The only validation we do is index validation, but we never check if chunks are already there. I really expect compaction to fail for such block, so maybe your block is not corrupted.. it is just actually empty? meta.json.. how many samples it reports?

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally OK, just some suggestions.

pkg/compact/compact.go Outdated Show resolved Hide resolved
pkg/compact/compact.go Outdated Show resolved Hide resolved
@@ -34,6 +34,7 @@ const (
)

var blockTooFreshSentinelError = errors.New("Block too fresh")
var emptyBlockSentinelULID = ulid.MustNew(123, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentinel is for noMoreCompactionForGroup I guess? can we name it like this or something similar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so I think there is misformation here , I would prefer something opposite like noMoreCompactionForGroup instead of this? What do you think? OR even bool flag return argument to make it more explicit (:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a separate bool flag as I think that's the clearest what's happening. The naming was a little awkward but hopefully it makes sense, let me know what you think :)

@@ -513,23 +513,23 @@ func (cg *Group) Resolution() int64 {

// Compact plans and runs a single compaction against the group. The compacted result
// is uploaded into the bucket the blocks were retrieved from.
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) {
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need ulid.ULID now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally tried to remove the ulid.ULID but had to add it back in as we need it for compact_e2e_test.go (as written right now). In the case when the compactor actually does something, the test asserts that the new block was actually uploaded, and it needs the id for that.

I thought about changing the test, but it seemed worthwhile keeping the coverage for the upload stage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, let's not end in rabbit whole here.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can stick to (bool, error) return args now, right @mjd95 ?

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

One small nit, but we can fix it later on, not a blocker

@@ -513,23 +513,23 @@ func (cg *Group) Resolution() int64 {

// Compact plans and runs a single compaction against the group. The compacted result
// is uploaded into the bucket the blocks were retrieved from.
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (ulid.ULID, error) {
func (cg *Group) Compact(ctx context.Context, dir string, comp tsdb.Compactor) (bool, ulid.ULID, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, let's not end in rabbit whole here.

}

compID, err := cg.compact(ctx, subDir, comp)
shouldRerun, compID, err := cg.compact(ctx, subDir, comp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this as this indicate that something wrong happen and we need to retry.. non empty ulid was actually more sensible maybe? Or at least we should change var name... Not a blocker.

@bwplotka bwplotka dismissed devnev’s stale review March 12, 2019 16:57

Initial review

@bwplotka bwplotka merged commit bc3aaab into thanos-io:master Mar 12, 2019
earthdiaosi pushed a commit to earthdiaosi/thanos that referenced this pull request Mar 14, 2019
* skip compaction for blocks with no samples

* update to actually delete the empty input blocks, and to correctly handle from bucket compactor

* warn on error deleting empty block

* use ULID instead of error for emptyBlockSentinel

* don't use a global variable

* full stop at end of comment

* use boolean to indicate whether there is more compaction work

* rename variables

* fix test
@mjd95 mjd95 deleted the skip-compaction-for-blocks-with-no-samples branch April 12, 2019 10:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants