Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle (ignore) partial uploads on downsampling and apply retention cycles #1335

Closed
bwplotka opened this issue Jul 18, 2019 · 9 comments · Fixed by #1937
Closed

Handle (ignore) partial uploads on downsampling and apply retention cycles #1335

bwplotka opened this issue Jul 18, 2019 · 9 comments · Fixed by #1937

Comments

@bwplotka
Copy link
Member

AC:

  • Meta files sync should ignore not-complete blocks for downsampling and retention cycle.
  • Removal of malformed block if any is ONLY on compaction cycle.

We have nice concurrent logic here: https://github.com/improbable-eng/thanos/blob/10f84a7d7f06813fda97e17d609a9969fcda037f/pkg/compact/compact.go#L169

We did not apply it here: https://github.com/improbable-eng/thanos/blob/10f84a7d7f06813fda97e17d609a9969fcda037f/cmd/thanos/downsample.go#L145

And not on retention: https://github.com/improbable-eng/thanos/blob/1d582af96b2cd412ade46dcfa07e6d40ffb3c176/pkg/compact/retention.go#L16

@bwplotka
Copy link
Member Author

It fixes compactor crashes like

level=error ts=2019-07-15T21:09:56.108249354Z caller=main.go:182 msg="running command failed" err="error executing compaction: first pass of downsampling failed: retrieve bucket block metas: get meta for block 01DFVPYR64AGDW009PF99N2YFY: The specified key does not exist."

This is because when doing sync and sidecar uploading block we might hit block with no meta.json.

@lx223
Copy link
Contributor

lx223 commented Jul 19, 2019

Hey @bwplotka I can help with this.

@bwplotka
Copy link
Member Author

bwplotka commented Jul 21, 2019

oh yes please @lx223

Essentially we have 3 4 different sync implementation. I think we should have nice one meta syncer everyone can reuse: compactor for compating/downsampling/retention but also store gateway.

@bwplotka
Copy link
Member Author

bwplotka commented Aug 5, 2019

Any progress @lx223 ?

@lx223
Copy link
Contributor

lx223 commented Aug 8, 2019

sorry for the delay; have been busy with work. should have something up for review soon.

@lx223
Copy link
Contributor

lx223 commented Aug 11, 2019

PR: #1394 PTAL

@bwplotka
Copy link
Member Author

@bwplotka
Copy link
Member Author

It's getting quite important to get this in, another point of failure:

@bwplotka
Copy link
Member Author

#1394 was our first attempt and can be used as a base (: Will get into this in free time, unless there are other volunteers (:

bwplotka added a commit that referenced this issue Jan 3, 2020
This replaces man 4 inconsistent meta.json syncs places in other components.

Fixes: #1335
Fixes: #1919
Fixes: #1300

* One place for sync logic for both compactor and store
* Corrupted disk cache for meta.json is handled gracefully.
* Blocks without meta.json are handled properly for all compactor phases.
* Synchronize was not taking into account deletion by removing meta.json.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Better observability for syncronize process.
* More logs for store startup process.
* Remove Compactor Syncer.
* Added metric for partialUploadAttempt deletions.
* More tests.

TODO in separate PR:
* More observability for index-cache loading / adding time.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
This replaces man 4 inconsistent meta.json syncs places in other components.

Fixes: #1335
Fixes: #1919
Fixes: #1300

* One place for sync logic for both compactor and store
* Corrupted disk cache for meta.json is handled gracefully.
* Blocks without meta.json are handled properly for all compactor phases.
* Synchronize was not taking into account deletion by removing meta.json.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Better observability for syncronize process.
* More logs for store startup process.
* Remove Compactor Syncer.
* Added metric for partialUploadAttempt deletions.
* More tests.

TODO in separate PR:
* More observability for index-cache loading / adding time.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 3, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Jan 6, 2020
Fixes: #1335
Fixes: #1919
Fixes: #1300

* Clean up of meta files are now started only if block which is being uploaded is older than 2 days (only a mitigation).
* Blocks without meta.json are handled properly for all compactor phases.
* Prepare for future implementation of https://thanos.io/proposals/201901-read-write-operations-bucket.md/
* Added metric for partialUploadAttempt deletions and delayed it.
* More tests.


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants