Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed partial delete issues on compactor; Added Upload/Delete tests. #1525

Merged
merged 1 commit into from
Sep 16, 2019

Conversation

bwplotka
Copy link
Member

@bwplotka bwplotka commented Sep 16, 2019

Fixes #1331

Problem that we are fixing is explained in the linked issue.

Also the problem will be fully fixed once this is also done: #1394 This is because although major parts of Thanos for syncing blocks is resilient to partial uploads, there are some points when we don't handle it properly (crash e.g compactor)

cc @lx223 @metalmatze

Signed-off-by: Bartek Plotka bwplotka@gmail.com

// only if they don't have meta.json. If meta.json is present Thanos assumes valid block.
// * This avoids deleting empty dir (whole bucket) by mistake.
func Delete(ctx context.Context, logger log.Logger, bkt objstore.Bucket, id ulid.ULID) error {
if err := bkt.Delete(ctx, path.Join(id.String(), MetaFilename)); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense to put path.Join(id.String(), MetaFilename) into a variable, to avoid mismatches in the future with the second time below that you use this path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Fixed

Copy link
Contributor

@metalmatze metalmatze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small improvement/nit. Otherwise this looks good! Nice fix!

Fixes #1331

Problem that we are fixing is explained in the linked issue.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
@SuperQ
Copy link
Contributor

SuperQ commented Sep 16, 2019

Is there any race between the sidecar uploading (no meta.json yet) and the compactor deleting?

@bwplotka
Copy link
Member Author

bwplotka commented Sep 16, 2019

Is there any race between the sidecar uploading (no meta.json yet) and the compactor deleting?

@SuperQ Yea, detecting if the block is partially uploaded (malformed) vs upload is in progress is in current model quite tricky. We do that currently by just waiting for some time called consistencyDelay (configurable on compactor:

consistencyDelay := modelDuration(cmd.Flag("consistency-delay", fmt.Sprintf("Minimum age of fresh (non-compacted) blocks before they are being processed. Malformed blocks older than the maximum of consistency-delay and %s will be removed.", compact.MinimumAgeForRemoval)).
). For now the default is 30m.

This has proven to be ok for all users (as per: no known issue because of that). Once upload takes more than 30 minutes, you might want to increase such consistency delay. This delay is also useful for object storages that are eventually consistent.

Good question is: can we warn/document these details about uploads longer than 30 minutes ):

@SuperQ
Copy link
Contributor

SuperQ commented Sep 16, 2019

What about a simple locking method of renaming meta.json to meta.json.deleted, then cleanup the index/chunks, then finally delete the meta.json.deleted file.

This would also allow for deletions to continue in the event of a restart.

@bwplotka bwplotka merged commit bfb5645 into master Sep 16, 2019
@bwplotka
Copy link
Member Author

I merged already, but happy to discuss alternative ideas @SuperQ

What about a simple locking method of renaming meta.json to meta.json.deleted, then cleanup the index/chunks, then finally delete the meta.json.deleted file.

Do you mean for deleting process? Not sure how is that better (: We want to get rid meta.json ASAP. Once we remove meta.json, block is nicely handled as malformed even if deletion would be disrupted at any point (e.g by pod crash/restart). In fact we would be fine by just deleting meta.json and allow compactor to assume as malformed block in next iteration.

@bwplotka bwplotka deleted the issue1331-upload-fix branch September 16, 2019 16:29
@bwplotka
Copy link
Member Author

We also discussed some alternatives here: https://thanos.io/proposals/201901-read-write-operations-bucket.md/

@bwplotka
Copy link
Member Author

There is still some slight chance to restart compactor in a certain moment that will cause overlaps (in the worst case). For that to work we need to implement action items for design doc: https://thanos.io/proposals/201901-read-write-operations-bucket.md/

wbh1 pushed a commit to wbh1/thanos that referenced this pull request Sep 17, 2019
…hanos-io#1525)

Fixes thanos-io#1331

Problem that we are fixing is explained in the linked issue.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3/GCS: Upload/Delete inconsistency (missing chunk file)
3 participants