
compact: Get rid of syncDelay and handle eventual consistency & partial upload differently. #377

Closed
bwplotka opened this issue Jun 15, 2018 · 4 comments


@bwplotka
Member

syncDelay in the compactor was used to mitigate two problems:

  • partial upload
  • eventual consistency across uploads

In theory syncDelay was fine, because we simply assumed that new blocks should be delayed, but it causes lots of block overlaps: whether a block is "new" is based on its ULID (creation time), which does not necessarily mean a new time range. An example failure scenario:

  • The compactor compacts blocks for some time range [t-100, t-50) into a block 'A'. Block 'A' is new, so syncDelay keeps it invisible to the compactor. Some fresh block 'B' (already past syncDelay) appears in the compactor for time range [t-20, t-10). The compactor can then assume that A's time range is simply missing and irreversibly compacts the surrounding blocks without A. Once A becomes visible again after syncDelay, we get an overlap (see the sketch below).
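
To make this concrete, here is a minimal sketch (not the actual Thanos code; the `Meta` struct and helper names are illustrative) of why a visibility filter keyed on ULID creation time says nothing about the block's time range:

```go
package sketch

import (
	"time"

	"github.com/oklog/ulid"
)

// Meta is an illustrative, simplified block metadata struct.
type Meta struct {
	ULID             ulid.ULID
	MinTime, MaxTime int64 // time range covered by the data (ms); unrelated to the ULID
}

// visibleAfterSyncDelay mimics the old rule: hide any block whose ULID
// (creation time) is younger than syncDelay.
func visibleAfterSyncDelay(m Meta, syncDelay time.Duration) bool {
	created := ulid.Time(m.ULID.Time()) // the ULID encodes when the block was written, not what it covers
	return time.Since(created) > syncDelay
}

// In the scenario above, compacted block A ([t-100, t-50)) has a brand-new
// ULID, so visibleAfterSyncDelay(A) == false, while raw block B ([t-20, t-10))
// is already past the delay, so visibleAfterSyncDelay(B) == true. The planner
// therefore compacts around A's range, and an overlap appears once A shows up.
```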

So we fixed it with an additional condition: a block is considered once it is older than syncDelay || its compactionLevel > 1. This is inconsistent, because if the object store is NOT strongly consistent we can still get overlaps(!). A sketch of this check follows.
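
For illustration, the amended check could look roughly like this (extending the hypothetical `Meta` sketch above; the `compactionLevel` parameter stands in for the block's compaction level from its meta.json):

```go
// consider reports whether a block should be visible to the compaction
// planner under the amended rule: old enough OR already compacted once.
// Under an eventually consistent object store a level-1 block can still
// show up late and be missed, which is the inconsistency described above.
func consider(m Meta, compactionLevel int, syncDelay time.Duration) bool {
	oldEnough := time.Since(ulid.Time(m.ULID.Time())) > syncDelay
	return oldEnough || compactionLevel > 1
}
```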

With the new PR for auto repairs #375, we can have compactionLevel == 1 and still get broken compaction, because a newly repaired block needs to bypass syncDelay, which violates the "eventual consistency handling". We support that by adding a Source field to ThanosMeta.
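
For illustration only, the extra metadata could look roughly like this; the exact field and value names in Thanos may differ:

```go
// ThanosMeta is a simplified sketch of the Thanos-specific section of a
// block's meta.json. Source records which component produced the block,
// so the compactor can special-case e.g. repaired blocks instead of
// relying on syncDelay or the compaction level alone.
type ThanosMeta struct {
	Labels     map[string]string `json:"labels"`
	Downsample struct {
		Resolution int64 `json:"resolution"`
	} `json:"downsample"`
	Source string `json:"source"` // e.g. "sidecar", "compactor", "bucket.repair"
}
```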

There are various options for how to fix this and REMOVE the syncDelay workaround entirely. I will edit this issue to describe them.

@asbjxrn

asbjxrn commented Jun 27, 2018

A related scenario is when an upload is in progress, so the sidecar has only uploaded the chunks and possibly the index file of a block.

Currently when the compactor comes across a partially uploaded block, the whole compaction process is aborted.

One example scenario for this is that a server crashes during the upload of the index file. In this case no compaction or downsampling can be done until the server is back up and the upload is reattempted and completed, or the block directory is manually cleaned up.

@bwplotka
Member Author

bwplotka commented Jun 27, 2018

Yes indeed.

What do you think the sane logic should do in the case of a partially uploaded block?

I think we need to divide this problem (and conquer it):

A) First of all we need to detect what state the block is in. Is it partially uploaded because it is being uploaded right now (a), or is it partially uploaded because some disaster happened (b)?

B) The second question is what to do after we detect what is going on:
For the first case (a) we should just wait.
For the latter (b) the source of the block should detect the problem and heal the block (upload the correct one, or continue what was not uploaded). If the source (sidecar or ruler) does not have the data anymore, well - log, increment a metric about missing data and remove the block. A rough sketch of this state detection is below.
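
Purely as a sketch of how A) and B) could fit together (the `Bucket` interface, thresholds and helpers below are assumptions, not existing Thanos APIs):

```go
package sketch

import (
	"context"
	"time"
)

// Bucket is a hypothetical, minimal object storage interface.
type Bucket interface {
	Exists(ctx context.Context, name string) (bool, error)
	LastModified(ctx context.Context, prefix string) (time.Time, error)
}

// BlockState classifies a (possibly partial) block in the bucket.
type BlockState int

const (
	Complete         BlockState = iota // meta.json present, safe to use
	UploadInFlight                     // partial but recently modified: case (a), just wait
	AbandonedPartial                   // partial and stale: case (b), the source must heal it or it gets removed
)

// classify guesses a block's state from the presence of meta.json and the
// age of its newest object. maxUploadAge is an assumed upper bound on how
// long a legitimate upload may take.
func classify(ctx context.Context, bkt Bucket, blockDir string, maxUploadAge time.Duration) (BlockState, error) {
	ok, err := bkt.Exists(ctx, blockDir+"/meta.json")
	if err != nil {
		return AbandonedPartial, err
	}
	if ok {
		return Complete, nil
	}
	newest, err := bkt.LastModified(ctx, blockDir)
	if err != nil {
		return AbandonedPartial, err
	}
	if time.Since(newest) < maxUploadAge {
		return UploadInFlight, nil
	}
	return AbandonedPartial, nil
}
```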

There is also a C) issue here:
The Compactor cannot just ignore a broken or missing block. In most cases this leads to broken compaction, because vanilla TSDB compaction is a bit fragile: it requires a continuous, in-order stream of blocks. I am hoping for prometheus-junkyard/tsdb#90 to happen, which might make the Compactor more resilient. I think at some point I will just jump on it if no one else does.

@asbjxrn

asbjxrn commented Jun 28, 2018

Aside: I think this relates to #318 as it's about how to handle download errors.

Without knowing anything about the TSDB format or how compaction works, I'm probably oversimplifying here but:

I think the sane logic for the compactor/store is to treat the partial block as if it were not there at all. On a filesystem I would have wanted to put all data into a temporary directory and move that directory into place in one atomic operation once everything is there. I don't think cloud storage supports this, but it does guarantee that partial file uploads don't happen(?). So to mimic the move, treat any missing file as an indication that "the move" has not happened yet (see the sketch below).
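
A minimal sketch of that rule, assuming meta.json is uploaded last and reusing the hypothetical `Bucket` interface sketched earlier in this thread:

```go
// isBlockReady mimics the atomic "directory move": chunks and the index may
// already be in the bucket, but the block only becomes visible once its
// meta.json (uploaded last) exists. Until then the compactor/store simply
// acts as if the block is not there.
func isBlockReady(ctx context.Context, bkt Bucket, blockDir string) (bool, error) {
	return bkt.Exists(ctx, blockDir+"/meta.json")
}
```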

A) I think this is only possible to know for the sidecar that was uploading it? And that sidecar may be down for maintenance and unable to provide any answer. I guess the cluster could look at the external labels in the chunks and see if any members of the cluster have the same labels, and try to detect it that way, but that seems fragile.

B) I would argue that the response to both situations is the same for the compactor/store: wait until the next run of the compactor and pretend the partial block is not there. The sidecar/ruler should be responsible for fixing partial uploads. It should not have marked the block as uploaded in thanos.shipper.json, since that only happens after all the files are successfully uploaded, so this should happen automatically. To clean up after servers that are gone for good, a rule could be added to the "bucket validate" command that deletes partial blocks older than (insert a time range too long to still be useful), as sketched below.
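
A sketch of such a cleanup rule (again purely hypothetical, not part of the real bucket validate command; it assumes the hypothetical Bucket above also exposes a Delete(ctx, prefix) method):

```go
// deleteStalePartials removes partial blocks (no meta.json) whose newest
// object is older than maxAge, i.e. uploads that will never be completed
// because their source is gone for good.
func deleteStalePartials(ctx context.Context, bkt Bucket, blockDirs []string, maxAge time.Duration) error {
	for _, dir := range blockDirs {
		ok, err := bkt.Exists(ctx, dir+"/meta.json")
		if err != nil {
			return err
		}
		if ok {
			continue // complete block, leave it alone
		}
		newest, err := bkt.LastModified(ctx, dir)
		if err != nil {
			return err
		}
		if time.Since(newest) > maxAge {
			// Delete is a hypothetical helper; real object stores need
			// per-object deletes under this prefix.
			if err := bkt.Delete(ctx, dir); err != nil {
				return err
			}
		}
	}
	return nil
}
```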

C) Does the handling of a partially uploaded block need to differ from the handling of blocks whose upload has not started at all? Consider the case where the Thanos sidecar has been down for maintenance. If it is turned on again just before compaction happens we get a partial upload; if it is turned on again 5 minutes later, no data has been uploaded at all yet.

@bwplotka
Member Author

Dup of #298
