Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify compactor and respect resolutions #195

Merged
merged 4 commits into from
Mar 9, 2018
Merged

Conversation

fabxc
Copy link
Collaborator

@fabxc fabxc commented Jan 30, 2018

This extends the compactor to consider downsample resolutions to the compaction grouping. With this it is safe to run the compactor against a bucket with data of different resolutions.

Furthermore #183 is likely to be cased by semantical races in the compactor. The PR drastically simplifies the logic and gets rid of any disk caching of meta.json files and just rebuilds groupings on demand. It is very unlikely that this has any noticable performance impact.

@fabxc
Copy link
Collaborator Author

fabxc commented Jan 30, 2018

Note that this does not yet create downsamplings on the fly during compaction. But running sequences of thanos downsample and thanos compact will now work properly.

@fabxc fabxc changed the base branch from verifyhalt to master March 1, 2018 13:14
@fabxc
Copy link
Collaborator Author

fabxc commented Mar 1, 2018

Rebased, PTAL

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Maybe two minor questions and some "eureka" moments (: LGTM.

return errors.Wrap(err, "build compaction groups")
}
for _, g := range groups {
os.RemoveAll(dataDir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that here? (: We don't need any cache anymore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... ahh are we caching only meta json in memory and download them + blocks on compaction only.

That makes sense, cool 👍

// groupKey returns a unique identifier for the group the block belongs to. It considers
// the downsampling resolution and the block's labels.
func groupKey(meta *block.Meta) string {
return fmt.Sprintf("%d@%s", meta.Thanos.Downsample.Resolution, labels.FromMap(meta.Thanos.Labels))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhm.. you are not hashing it. Are we sure order will not be different every call to labels.FromMap ? (: Maybe hash is more safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, checked that method. Output is sorted... OK 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that's the general contract for all constructors from the labels package – and really for anywhere we deal with labels.

}
return res
sort.Slice(res, func(i, j int) bool {
return res[i].Key() < res[j].Key()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want this order?

1m@block1
1m@block2
1m@block3
5m@block1
5m@block2
...

Maybe we should put labels in the front of groupKey? Then we group by labels first.

I don't know which one is better ;p Just checking if there are any rationales behind that order (:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh, no strict rationale I guess. But it likely makes sense to treat lower resolution blocks with higher priority since the most recent ones are queried the most.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

return g, nil
}

// Key returns an identifier for the group.
func (cg *Group) Key() string {
return fmt.Sprintf("%s@%d", cg.labels, cg.resolution)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, cannot we use the same function/form as in groupKey function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched form but left it as 2 functions. Just wasn't practical to consolidate.

@fabxc fabxc force-pushed the compactdownsample branch 2 times, most recently from 8357d16 to 4af3535 Compare March 5, 2018 13:34
@fabxc fabxc merged commit 0384cd8 into master Mar 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants