Existing upload confuses shipper #934

Closed
SuperQ opened this issue Mar 18, 2019 · 8 comments

@SuperQ
Contributor

SuperQ commented Mar 18, 2019

Thanos, Prometheus and Golang version used

thanos, version 0.3.2 (branch: HEAD, revision: 4b7320c0e45e3f48a437bd19294f569785bafb02)
  build user:       root@e9a9c28f966a
  build date:       20190304-17:11:05
  go version:       go1.11.5

What happened

The shipper didn't write out thanos.shipper.json due to a failed upload of a compacted block.

What you expected to happen

Shipper should update thanos.shipper.json every time it uploads a block, not just when it completes a batch sync.
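
For reference, here is a minimal sketch (not the actual Thanos shipper code) of the behavior being asked for: persisting the shipped-block list after every successful upload instead of only once per batch sync. The `ShipperMeta` type and the `writeMetaFile`/`uploadBlock` names are hypothetical placeholders, and the real thanos.shipper.json format may differ.

```go
package shipper

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ShipperMeta mirrors the idea behind thanos.shipper.json: the set of block
// ULIDs that have already been uploaded.
type ShipperMeta struct {
	Uploaded []string `json:"uploaded"`
}

// writeMetaFile persists the uploaded-block list to disk.
func writeMetaFile(dir string, meta *ShipperMeta) error {
	b, err := json.MarshalIndent(meta, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "thanos.shipper.json"), b, 0o644)
}

// syncBlocks uploads each pending block and records it immediately, so a
// later failure (e.g. "context canceled" mid-upload) does not lose track of
// blocks that already made it to the bucket.
func syncBlocks(dir string, pending []string, uploadBlock func(id string) error) error {
	meta := &ShipperMeta{} // in reality this would be loaded from the existing file first
	for _, id := range pending {
		if err := uploadBlock(id); err != nil {
			return fmt.Errorf("upload %s: %w", id, err)
		}
		meta.Uploaded = append(meta.Uploaded, id)
		// Persist after every successful upload, not only after the whole batch.
		if err := writeMetaFile(dir, meta); err != nil {
			return err
		}
	}
	return nil
}
```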

How to reproduce it (as minimally and precisely as possible):

Bucket storage upload is canceled in the middle of a large compacted block upload.

Full logs of relevant components

2019-03-18_10:55:09.69871 level=info ts=2019-03-18T10:55:09.698596569Z caller=shipper.go:375 msg="upload new block" id=01CXQ2RF3N84BSXF8TGGN36ZST
2019-03-18_11:09:03.45688 level=info ts=2019-03-18T11:09:03.456809039Z caller=shipper.go:375 msg="upload new block" id=01CY8ETJP1F9MC3T9EQWFBR884
2019-03-18_11:20:27.20557 level=error ts=2019-03-18T11:20:27.205460156Z caller=shipper.go:342 msg="shipping failed" block=01CY8ETJP1F9MC3T9EQWFBR884 err="upload chunks: upload file /opt/prometheus/prometheus/data/thanos/upload/01CY8ETJP1F9MC3T9EQWFBR884/chunks/000041 as 01CY8ETJP1F9MC3T9EQWFBR884/chunks/000041: context canceled"
2019-03-18_11:20:27.33593 level=info ts=2019-03-18T11:20:27.335859618Z caller=shipper.go:226 msg="gathering all existing blocks from the remote bucket"
2019-03-18_11:23:48.78979 level=error ts=2019-03-18T11:23:48.789680688Z caller=shipper.go:326 msg="found overlap or error during sync, cannot upload compacted block" err="shipping compacted block 01CXQ2RF3N84BSXF8TGGN36ZST is blocked; overlap spotted: [mint: 1543147200000, maxt: 1543730400000, range: 162h0m0s, blocks: 2]: <ulid: 01CXQ2RF3N84BSXF8TGGN36ZST, mint: 1543147200000, maxt: 1543730400000, range: 162h0m0s>, <ulid: 01CXQ2RF3N84BSXF8TGGN36ZST, mint: 1543147200000, maxt: 1543730400000, range: 162h0m0s>"
@bwplotka
Member

Dived into this a bit more.

  1. We don't want to spend too much time & code on this. It is a one-time feature: most users will never exercise this code path, so support for it in the production path should be limited. Maybe we should add this functionality as a tool instead?
  2. Something is weird here. If your log is from a single run:
  • we iterate over 01CXQ2RF3N84BSXF8TGGN36ZST
  • then over 01CY8ETJP1F9MC3T9EQWFBR884, and that failed.
  • somehow we iterate over 01CXQ2RF3N84BSXF8TGGN36ZST again! (what?)
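
One plausible reading of the duplicate ULID in the overlap error above is that the block shipped in the earlier, interrupted run is now seen both locally and in the bucket. Below is a hedged sketch (not Thanos's actual shipper code; `BlockMeta`, `localMetas`, and `remoteULIDs` are illustrative names) of one way to avoid comparing a block against its own remote copy:

```go
package shipper

// BlockMeta is a stripped-down stand-in for TSDB block metadata.
type BlockMeta struct {
	ULID string
	MinT int64
	MaxT int64
}

// pendingBlocks drops local blocks whose ULID already exists in the remote
// bucket, so a block shipped by a previous (possibly interrupted) run cannot
// be flagged as overlapping with its own remote copy.
func pendingBlocks(localMetas []BlockMeta, remoteULIDs map[string]struct{}) []BlockMeta {
	var out []BlockMeta
	for _, m := range localMetas {
		if _, ok := remoteULIDs[m.ULID]; ok {
			continue // already in the bucket; nothing to upload or re-check
		}
		out = append(out, m)
	}
	return out
}
```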

@bwplotka
Member

There is another issue with this: if you restart the sidecar in the middle of the process you will hit the overlap issue as well, but that's another story.

@caarlos0

I still see this on v0.5.0 + prometheus 2.10.0

level=error ts=2019-06-25T17:09:38.818351347Z caller=shipper.go:310 msg="found overlap or error during sync, cannot upload compacted block" err="shipping compacted block 01DDJAMD7QJE1SAZAWQ2Q7ESR8 is blocked; overlap spotted: [mint: 1560751200000, maxt: 1560902400000, range: 42h0m0s, blocks: 2]: <ulid: 01DDQ3M3085SK8MSYG475ZG7RZ, mint: 1560751200000, maxt: 1560902400000, range: 42h0m0s>, <ulid: 01DDQ3KJMV8WX3MWPYA21Q52WE, mint: 1560751200000, maxt: 1560902400000, range: 42h0m0s>\n[mint: 1560902400000, maxt: 1561075200000, range: 48h0m0s, blocks: 2]: <ulid: 01DDW1HSGA2K8XAJ1A86JK7907, mint: 1560902400000, maxt: 1561075200000, range: 48h0m0s>, <ulid: 01DDW1J9SGY24J25MYPMHMHA8J, mint: 1560902400000, maxt: 1561075200000, range: 48h0m0s>\n[mint: 1561075200000, maxt: 1561248000000, range: 48h0m0s, blocks: 2]: <ulid: 01DE16BG0NVDWPSNQD1X4GVEYG, mint: 1561075200000, maxt: 1561248000000, range: 48h0m0s>, <ulid: 01DE16AZJ4B49AQK38RTR0WZH2, mint: 1561075200000, maxt: 1561248000000, range: 48h0m0s>\n[mint: 1561248000000, maxt: 1561420800000, range: 48h0m0s, blocks: 2]: <ulid: 01DE6B192N2135ZQFS9P5SJE20, mint: 1561248000000, maxt: 1561420800000, range: 48h0m0s>, <ulid: 01DE6B0PSYX6Q6ZP2QJFK3Z0ZG, mint: 1561248000000, maxt: 1561420800000, range: 48h0m0s>"

uploading 300d of old data 💭

It happened on another Prometheus instance with 40d of old data, too...

@bwplotka
Member

@caarlos0 what exactly do you want to accomplish? Are you running the sidecar with any special flags?

@caarlos0

I'm running with --shipper.upload-compacted; I wanted to upload the historical data of an existing instance...

@bwplotka
Member

bwplotka commented Jun 25, 2019

This is quite a manual, one-time feature. To make sure it is safe, it errors out instead of assuming anything. The best bet is to try to understand the error and mitigate it. I don't know your case, but at first glance it looks like you have blocks for exactly the same timestamps but with different ULIDs. Are you sure that:

  1. You set unique external labels?
  2. There is no global compactor running?
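
For context on what the check is complaining about: two blocks with distinct ULIDs covering the same time window (typically produced by a compactor, or by two Prometheus instances sharing external labels) trip the overlap detection. The simplified illustration below is not the real Thanos/TSDB overlap check (which also handles partially overlapping ranges); it just groups blocks by their exact [mint, maxt] window and flags windows with more than one block, similar in spirit to the "overlap spotted" errors in the logs above. The `blockInfo` and `findExactOverlaps` names are made up for this sketch.

```go
package shipper

import "fmt"

// blockInfo holds just the fields needed for this illustration.
type blockInfo struct {
	ULID string
	MinT int64
	MaxT int64
}

type window struct{ MinT, MaxT int64 }

// findExactOverlaps reports every time window covered by more than one block.
// The real check is more general, but the idea is the same: distinct ULIDs
// over the same time range are suspicious.
func findExactOverlaps(metas []blockInfo) []string {
	byWindow := map[window][]blockInfo{}
	for _, m := range metas {
		w := window{m.MinT, m.MaxT}
		byWindow[w] = append(byWindow[w], m)
	}
	var msgs []string
	for w, ms := range byWindow {
		if len(ms) > 1 {
			msgs = append(msgs, fmt.Sprintf("[mint: %d, maxt: %d, blocks: %d]", w.MinT, w.MaxT, len(ms)))
		}
	}
	return msgs
}
```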

@caarlos0

caarlos0 commented Jun 25, 2019

You set unique external labels?

yes

There is no global compactor running?

It was running at some point; I stopped it, but the errors continued...

@caarlos0

What happened, I think, is that I started it without the flag, stopped it, added the flag, and started it again.

Maybe it got lost in there?
