
compact: Get rid of syncDelay and handle eventual consistency & partial upload differently. #377

Closed
bwplotka opened this issue Jun 15, 2018 · 4 comments


@bwplotka
Member

syncDelay in the compactor was used to mitigate two problems:

  • partial upload
  • eventual consistency across uploads

In theory syncDelay was fine, because we simply assumed that new blocks should be delayed, but it causes lots of block overlaps: whether a block is "new" is based on its ULID (creation time), which does not necessarily mean a new time range. An example failure scenario:

  • The compactor compacts blocks for some time range [t-100, t-50) into a block 'A'. Block 'A' is new, so syncDelay keeps it invisible to the compactor. Some fresh block 'B' (already past syncDelay) appears in the compactor for time range [t-20, t-10). The compactor can then assume that A's time range is simply missing and irreversibly compacts the surrounding blocks without A. Once A becomes visible again after syncDelay, we get an overlap (see the sketch below).
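
To make this concrete, here is a minimal sketch (not the actual Thanos code; the `Meta` struct and helper names are illustrative) of why a visibility filter keyed on ULID creation time says nothing about the block's time range:

```go
package sketch

import (
	"time"

	"github.com/oklog/ulid"
)

// Meta is an illustrative, simplified block metadata struct.
type Meta struct {
	ULID             ulid.ULID
	MinTime, MaxTime int64 // time range covered by the data (ms); unrelated to the ULID
}

// visibleAfterSyncDelay mimics the old rule: hide any block whose ULID
// (creation time) is younger than syncDelay.
func visibleAfterSyncDelay(m Meta, syncDelay time.Duration) bool {
	created := ulid.Time(m.ULID.Time()) // the ULID encodes when the block was written, not what it covers
	return time.Since(created) > syncDelay
}

// In the scenario above, compacted block A ([t-100, t-50)) has a brand-new
// ULID, so visibleAfterSyncDelay(A) == false, while raw block B ([t-20, t-10))
// is already past the delay, so visibleAfterSyncDelay(B) == true. The planner
// therefore compacts around A's range, and an overlap appears once A shows up.
```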

So we fixed it with an additional condition: a block is considered once it is older than syncDelay || its compactionLevel > 1. This is inconsistent, because if the object store is NOT strongly consistent we can still get overlaps(!). A sketch of this check follows.
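
For illustration, the amended check could look roughly like this (extending the hypothetical `Meta` sketch above; the `compactionLevel` parameter stands in for the block's compaction level from its meta.json):

```go
// consider reports whether a block should be visible to the compaction
// planner under the amended rule: old enough OR already compacted once.
// Under an eventually consistent object store a level-1 block can still
// show up late and be missed, which is the inconsistency described above.
func consider(m Meta, compactionLevel int, syncDelay time.Duration) bool {
	oldEnough := time.Since(ulid.Time(m.ULID.Time())) > syncDelay
	return oldEnough || compactionLevel > 1
}
```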

With the new PR for auto repairs #375, we can have compactionLevel == 1 and still get broken compaction, because a newly repaired block needs to bypass syncDelay, which violates the "eventual consistency handling". We support that by adding a Source field to ThanosMeta.
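
For illustration only, the extra metadata could look roughly like this; the exact field and value names in Thanos may differ:

```go
// ThanosMeta is a simplified sketch of the Thanos-specific section of a
// block's meta.json. Source records which component produced the block,
// so the compactor can special-case e.g. repaired blocks instead of
// relying on syncDelay or the compaction level alone.
type ThanosMeta struct {
	Labels     map[string]string `json:"labels"`
	Downsample struct {
		Resolution int64 `json:"resolution"`
	} `json:"downsample"`
	Source string `json:"source"` // e.g. "sidecar", "compactor", "bucket.repair"
}
```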

There are various options for how to fix this and REMOVE the syncDelay workaround entirely. I will edit this issue to describe them.

@asbjxrn

asbjxrn commented Jun 27, 2018

A related scenario is when an upload is in progress, so the sidecar has only uploaded the chunks and possibly the index file of a block.

Currently when the compactor comes across a partially uploaded block, the whole compaction process is aborted.

One example scenario for this is that a server crashes during the upload of the index file. In this case no compaction or downsampling can be done until the server is back up and the upload is reattempted and completed, or the block directory is manually cleaned up.

@bwplotka
Member Author

bwplotka commented Jun 27, 2018

Yes indeed.

What do you think the sane logic should do in the case of a partially uploaded block?

I think we need to divide this problem (and conquer it):

A) First of all we need to detect what state the block is in. Is it partially uploaded because it is being uploaded right now (a), or is it partially uploaded because some disaster happened (b)?

B) The second question is what to do after we detect what is going on:
For the first case (a) we should just wait.
For the latter (b) the source of the block should detect the problem and heal the block (upload the correct one, or continue what was not uploaded). If the source (sidecar or ruler) does not have the data anymore, well - log, increment a metric about missing data and remove the block. A rough sketch of this state detection is below.
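
Purely as a sketch of how A) and B) could fit together (the `Bucket` interface, thresholds and helpers below are assumptions, not existing Thanos APIs):

```go
package sketch

import (
	"context"
	"time"
)

// Bucket is a hypothetical, minimal object storage interface.
type Bucket interface {
	Exists(ctx context.Context, name string) (bool, error)
	LastModified(ctx context.Context, prefix string) (time.Time, error)
}

// BlockState classifies a (possibly partial) block in the bucket.
type BlockState int

const (
	Complete         BlockState = iota // meta.json present, safe to use
	UploadInFlight                     // partial but recently modified: case (a), just wait
	AbandonedPartial                   // partial and stale: case (b), the source must heal it or it gets removed
)

// classify guesses a block's state from the presence of meta.json and the
// age of its newest object. maxUploadAge is an assumed upper bound on how
// long a legitimate upload may take.
func classify(ctx context.Context, bkt Bucket, blockDir string, maxUploadAge time.Duration) (BlockState, error) {
	ok, err := bkt.Exists(ctx, blockDir+"/meta.json")
	if err != nil {
		return AbandonedPartial, err
	}
	if ok {
		return Complete, nil
	}
	newest, err := bkt.LastModified(ctx, blockDir)
	if err != nil {
		return AbandonedPartial, err
	}
	if time.Since(newest) < maxUploadAge {
		return UploadInFlight, nil
	}
	return AbandonedPartial, nil
}
```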

There is also a C) issue here:
The Compactor cannot just ignore a broken or missing block. In most cases this leads to broken compaction, because vanilla TSDB compaction is a bit fragile: it requires a continuous, in-order stream of blocks. I am hoping for prometheus-junkyard/tsdb#90 to happen, which might make the Compactor more resilient. I think at some point I will just jump on it if no one else does.

@asbjxrn

asbjxrn commented Jun 28, 2018

Aside: I think this relates to #318 as it's about how to handle download errors.

Without knowing anything about the TSDB format or how compaction works, I'm probably oversimplifying here but:

I think the sane logic for the compactor/store is to treat the partial block as if it were not there at all. On a filesystem I would have wanted to put all data into a temporary directory and move that directory into place in one atomic operation once everything is there. I don't think cloud storage supports this, but it does guarantee that partial file uploads don't happen(?). So to mimic the move, treat any missing file as an indication that "the move" has not happened yet (see the sketch below).
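
A minimal sketch of that rule, assuming meta.json is uploaded last and reusing the hypothetical `Bucket` interface sketched earlier in this thread:

```go
// isBlockReady mimics the atomic "directory move": chunks and the index may
// already be in the bucket, but the block only becomes visible once its
// meta.json (uploaded last) exists. Until then the compactor/store simply
// acts as if the block is not there.
func isBlockReady(ctx context.Context, bkt Bucket, blockDir string) (bool, error) {
	return bkt.Exists(ctx, blockDir+"/meta.json")
}
```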

A) I think this is only possible to know for the sidecar that was uploading it? And that sidecar may be down for maintenance and unable to provide any answer. I guess the cluster could look at the external labels in the chunks and see if any members of the cluster have the same labels, and try to detect it that way, but that seems fragile.

B) I would argue that the response to both situations is the same for the compactor/store: wait until the next run of the compactor and pretend the partial block is not there. The sidecar/ruler should be responsible for fixing partial uploads. It should not have marked the block as uploaded in thanos.shipper.json, since that only happens after all the files are successfully uploaded, so this should happen automatically. To clean up after servers that are gone for good, a rule could be added to the "bucket validate" command that deletes partial blocks older than (insert a time range too long to still be useful), as sketched below.
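
A sketch of such a cleanup rule (again purely hypothetical, not part of the real bucket validate command; it assumes the hypothetical Bucket above also exposes a Delete(ctx, prefix) method):

```go
// deleteStalePartials removes partial blocks (no meta.json) whose newest
// object is older than maxAge, i.e. uploads that will never be completed
// because their source is gone for good.
func deleteStalePartials(ctx context.Context, bkt Bucket, blockDirs []string, maxAge time.Duration) error {
	for _, dir := range blockDirs {
		ok, err := bkt.Exists(ctx, dir+"/meta.json")
		if err != nil {
			return err
		}
		if ok {
			continue // complete block, leave it alone
		}
		newest, err := bkt.LastModified(ctx, dir)
		if err != nil {
			return err
		}
		if time.Since(newest) > maxAge {
			// Delete is a hypothetical helper; real object stores need
			// per-object deletes under this prefix.
			if err := bkt.Delete(ctx, dir); err != nil {
				return err
			}
		}
	}
	return nil
}
```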

C) Does the handling of a partially uploaded block need to differ from the handling of blocks whose upload has not started at all? Consider the case where the Thanos sidecar has been down for maintenance. If it is turned on again just before compaction happens we get a partial upload; if it is turned on again 5 minutes later, no data has been uploaded at all yet.

@bwplotka
Member Author

Dup of #298
