
receive: Support backfilling data through remote write (in case of origin got disconnected) #2114

Closed
sepich opened this issue Feb 10, 2020 · 21 comments

Comments

sepich (Contributor) commented Feb 10, 2020

Thanos, Prometheus and Golang version used:
v0.10.1

Object Storage Provider:
AWS S3

What happened:
We use thanos-receive to get data from multiple locations.
Due to #1624 we have to use a short tsdb.block-duration=15m. So if any issue happens and thanos-receive is down for >15m, the remote Prometheuses accumulate data. When connectivity to thanos-receive is re-established, the Prometheuses start to send the delayed data, which is great. But thanos-receive won't accept it, because the timestamps fall into an already-past chunk, so we get holes in our graphs.
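
For illustration, the rejection boils down to a bounds check of roughly this shape (a minimal sketch with made-up names and values, not the actual TSDB code):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errOutOfBounds = errors.New("out of bounds")

// acceptSample mimics, in spirit, the check that produces the "Out of bounds"
// errors: samples older than the head's minimum valid time are rejected.
// minValidTime is a hypothetical value here, not a real Thanos/Prometheus API.
func acceptSample(sampleT, minValidTime int64) error {
	if sampleT < minValidTime {
		return errOutOfBounds
	}
	return nil
}

func main() {
	now := time.Now().UnixMilli()
	minValidTime := now - time.Hour.Milliseconds() // e.g. with 2h blocks receive accepts ~1h of history (see discussion below)

	delayed := now - 3*time.Hour.Milliseconds()      // a sample buffered while receive was down
	fmt.Println(acceptSample(delayed, minValidTime)) // out of bounds -> dropped, hole in graphs
	fmt.Println(acceptSample(now, minValidTime))     // <nil>, fresh samples are fine
}
```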

What you expected to happen:
Thanos-receive is able to upload old data to S3.

Note that we cannot do both, i.e. upload to S3 from the remote locations and also remote-write, because thanos-compact cannot deduplicate such blocks, which leads to sum() returning 2x the real value, etc.

These remote Prometheuses run inside k8s on emptyDir. Switching to a plain thanos-sidecar (S3 upload) model would again lead to holes in graphs whenever a Prometheus pod is restarted, because the sidecar only uploads completed blocks and we lose the WAL in emptyDir on restart. Another option is to use a PVC, but that would bind Prometheus to a single AZ, as EBS volumes are a per-AZ entity, and a single AZ could go down.

That's why the remote-write model looks so attractive to us.

How to reproduce it (as minimally and precisely as possible):
Shut down a working thanos-receive for >2h, then turn it back on. Observe that the Prometheuses send the stale data, but thanos-receive rejects it.

squat (Member) commented Feb 10, 2020

What do you mean exactly by:

What you expected to happen:
Thanos-receive is able to upload old data to S3.

Thanos receive is already able to upload old data to S3. Each instance of Thanos receive in the hashring should use a different replica label. Thanos querier will correctly deduplicate this data when querying, even when using aggregations like sum, so no queries should be incorrect.
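
To illustrate the deduplication idea, here is a toy sketch of grouping series that differ only by a replica label (hypothetical types and the label name "replica" are just examples, not the actual querier code):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dedupKey builds a grouping key from all labels except the replica label,
// so series that differ only by their replica collapse into one group.
func dedupKey(labels map[string]string, replicaLabel string) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		if k == replicaLabel {
			continue
		}
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}

func main() {
	series := []map[string]string{
		{"__name__": "up", "job": "node", "replica": "receive-0"},
		{"__name__": "up", "job": "node", "replica": "receive-1"},
	}
	groups := map[string]int{}
	for _, s := range series {
		groups[dedupKey(s, "replica")]++
	}
	// Both series fall into one group, so sum() over the deduplicated
	// result counts the data only once instead of 2x.
	fmt.Println("distinct series after dedup:", len(groups))
}
```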

This sounds like we are trying to solve a symptom of the actual issue in #1624. I would prefer to focus on fixing that, so that everyone can use block durations of 2h and avoid anything like what you are seeing. What do you think?

sepich (Contributor, Author) commented Feb 10, 2020

What do you mean exactly by:
Thanos receive is already able to upload old data to S3

Sorry, I meant errors like these after thanos-receive has been down for some time and then comes back up:

level=debug ts=2020-02-06T14:51:27.634126008Z caller=writer.go:69 component=receive component=receive-writer msg="Out of bounds metric" lset="{__name__=\"container_processes\", instance=...}" sample="timestamp:1581000525667 "
level=warn ts=2020-02-06T14:51:27.635264235Z caller=writer.go:83 component=receive component=receive-writer msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=2

squat (Member) commented Feb 10, 2020

aha, ok so you mean something like:

What you expected to happen:
Thanos-receive is able to continue ingesting data even after closing and uploading a block without seeing out of bounds errors.

I still think that this is really addressing a symptom rather than the root of the issue. I am not against investigating this avenue but I think it will be more valuable and solve more people's issues if we go to the root and prevent receive from OOMing while replaying the WAL. This way you can have longer block durations and as a result see fewer out of bounds errors.

brancz (Member) commented Feb 10, 2020

I think there is space for something like this, in some shape or form, in the Prometheus remote-write protocol. Essentially, the protocol itself should be able to backfill data that has already been flushed from the head block/WAL into a block, even if the backend was down for >2h. I probed some other Prometheus team members, and it did seem like there were other parties interested in such a thing.

sepich (Contributor, Author) commented Feb 14, 2020

I still think that this is really addressing a symptom rather than the root of the issue.

Unfortunately, even with the default tsdb.block-duration=2h on the thanos-receive side, the remote-write scheme is unable to recover after receive has been down for >2h.
Logs on the Prometheus v2.15.2 side:

ts=2020-02-14T08:38:06.999Z caller=dedupe.go:111 component=remote level=error remote_name=f9a55a url=https://xxx/api/v1/receive msg="non-recoverable error" count=100 err="server returned HTTP status 409 Conflict: 3 errors: failed to non-fast add 28 samples: out of bounds: conflict; 409; 409"
ts=2020-02-14T08:38:11.921Z caller=dedupe.go:111 component=remote level=warn remote_name=f9a55a url=https://xxx/api/v1/receive msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1581657554 minSendTimestamp=1581669481

And lastSendTimestamp stays the same in each event; it is not growing.
On the thanos-receive side (3 hashring endpoints with replica=1):

level=warn ts=2020-02-14T08:42:27.873428299Z caller=writer.go:83 component=receive component=receive-writer msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=35
level=error ts=2020-02-14T08:42:27.874889268Z caller=handler.go:347 component=receive component=receive-handler msg="storing locally" err="failed to non-fast add 35 samples: out of bounds: conflict" endpoint=http://thanos-receive-2.thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
level=error ts=2020-02-14T08:42:27.884528922Z caller=handler.go:395 component=receive component=receive-handler msg="forwarding returned non-200 status" err=409 endpoint=http://thanos-receive-0.thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive
level=error ts=2020-02-14T08:42:27.887348733Z caller=handler.go:395 component=receive component=receive-handler msg="forwarding returned non-200 status" err=409 endpoint=http://thanos-receive-1.thanos-receive.monitoring.svc.cluster.local:19291/api/v1/receive

The metric thanos_receive_forward_requests_total dropped after receive was down:
[screenshot: graph of thanos_receive_forward_requests_total]
It seems the Prometheuses keep sending the same old data over and over again and are unable to move on to new data.
I have to manually restart all Prometheuses and delete the /wal/* folder contents to recover from this.
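
The "Skipping resharding, last successful send was beyond threshold" warning above comes from a guard in the Prometheus remote-write queue manager; a paraphrased sketch of the idea (illustrative names and threshold, not the real source) looks like this:

```go
package main

import (
	"fmt"
	"time"
)

// shouldReshard sketches the guard: if the last successful send is older than
// a recent threshold, the queue refuses to reshard, because it has no fresh
// throughput information to base the decision on.
func shouldReshard(lastSendTimestamp, minSendTimestamp int64) bool {
	if lastSendTimestamp < minSendTimestamp {
		fmt.Println("Skipping resharding, last successful send was beyond threshold")
		return false
	}
	return true
}

func main() {
	now := time.Now().Unix()
	minSend := now - 120 // "recent enough" = within the last 2 minutes (made-up threshold)

	// While receive keeps answering 409, lastSendTimestamp never advances,
	// so the guard fires on every evaluation, exactly as in the logs above.
	stuckLastSend := now - 3*60*60
	fmt.Println(shouldReshard(stuckLastSend, minSend)) // false
}
```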

brancz (Member) commented Feb 14, 2020

There are some caveats:

  1. The receive component will only ingest data up to 1 hour old.
  2. Prometheus stops retrying to send data once it is no longer available in the WAL, and from then on only attempts to send newly available WAL data, so data older than 2h cannot be ingested/backfilled; in practice, only data from within the last hour can.

sepich (Contributor, Author) commented Feb 14, 2020

Thank you for such important information!
I believe this should be mentioned in the docs. I can confirm that setting --storage.tsdb.min-block-duration=15m --storage.tsdb.max-block-duration=15m on the Prometheus side makes the data acceptable to thanos-receive.

  1. Is there a reason why it is 1h, and not 2h or more for the default 2h Prometheus WAL?
  2. Maybe I misunderstand your sentence, but once this issue starts to happen it does not fix itself. I do see in the Prometheus logs every 2h that the WAL is rotated and a block is written, but lastSendTimestamp stays the same and the data has not been accepted by thanos-receive for 6h.

brancz (Member) commented Feb 14, 2020

  1. The heuristic is 0.5x of the smallest block produced, and since the default smallest block is 2h, the heuristic yields 1h. Further explained: the WAL accumulates 1.5x the smallest block's worth of data, and by refusing appends for the oldest 0.5x it can safely flush 1.0x of the head block to disk without having to worry about appends arriving from earlier than that (see the sketch after this list).
  2. Your logs say the receive component returned status code 409. On 4xx errors the Prometheus remote-write client drops the request it is trying to make and moves on to the next one. Could you enable debug logs on Prometheus?
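
A minimal sketch of that arithmetic, just the 0.5x heuristic from point 1 expressed as code:

```go
package main

import (
	"fmt"
	"time"
)

// acceptanceWindow applies the 0.5x heuristic described above: the head only
// keeps accepting appends for the most recent half of the smallest block
// range, so a full block's worth of data can be flushed to disk safely.
func acceptanceWindow(smallestBlock time.Duration) time.Duration {
	return smallestBlock / 2
}

func main() {
	fmt.Println(acceptanceWindow(2 * time.Hour))    // 1h0m0s -> the default receive setup
	fmt.Println(acceptanceWindow(15 * time.Minute)) // 7m30s  -> with the 15m block workaround above
}
```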

stale bot commented Mar 15, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

stale bot added the stale label Mar 15, 2020
stale bot closed this as completed Mar 22, 2020
Kampe commented Oct 15, 2020

Wish this wasn't stale; this would be useful to have. If Prometheus attempts to write the samples, shouldn't Receive accept them?

bwplotka (Member) commented Jun 4, 2021

Still valid, and a big chunk of work to make it happen.

One idea already presented in the past to the Prometheus and Cortex teams: start a second, separate WAL just for older series.
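
To make the shape of that idea a bit more concrete, here is a toy sketch of routing appends by sample age into two WALs (all names hypothetical; this is not an existing Prometheus or Thanos API):

```go
package main

import "fmt"

// appender is a minimal stand-in for a WAL appender, only to sketch the idea.
type appender interface {
	Append(ts int64, value float64) error
}

type printAppender struct{ name string }

func (p printAppender) Append(ts int64, value float64) error {
	fmt.Printf("%s <- sample @ %d\n", p.name, ts)
	return nil
}

// routingAppender sends recent samples to the primary WAL and anything older
// than cutoff to a second, separate WAL, whose contents could later be turned
// into backfilled blocks.
type routingAppender struct {
	primary, historical appender
	cutoff              int64
}

func (r routingAppender) Append(ts int64, v float64) error {
	if ts < r.cutoff {
		return r.historical.Append(ts, v)
	}
	return r.primary.Append(ts, v)
}

func main() {
	r := routingAppender{
		primary:    printAppender{name: "wal"},
		historical: printAppender{name: "wal-historical"},
		cutoff:     1000, // boundary between "current" and "old" samples (made up)
	}
	r.Append(500, 1)  // old sample -> separate WAL
	r.Append(2000, 1) // fresh sample -> normal WAL
}
```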

bwplotka changed the title from "receive: to support backfill" to "receive: Support backfilling data through remote write (in case of origin got disconnected)" on Jun 4, 2021
stale bot commented Aug 4, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Aug 4, 2021
GiedriusS removed the stale label Aug 11, 2021
Kampe commented Aug 24, 2021

Love to see the stale tag removed <3

bwplotka (Member) commented Sep 1, 2021

Still valid!

stale bot commented Jan 9, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Jan 9, 2022
yeya24 (Contributor) commented Jan 10, 2022

Valid

stale bot removed the stale label Jan 10, 2022
stale bot commented Apr 17, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Apr 17, 2022
Kampe commented Apr 18, 2022

It seems there's work being done upstream in Prometheus to support this: https://github.com/grafana/mimir-prometheus/compare/out-of-order

stale bot removed the stale label Apr 18, 2022
stale bot commented Sep 21, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale bot added the stale label Sep 21, 2022
yeya24 (Contributor) commented Sep 21, 2022

prometheus/prometheus#11075 has finally been merged into Prometheus, so we could probably give it a try.
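
Conceptually, out-of-order ingestion relaxes the strict bound sketched earlier in this thread: samples within a configurable window behind the newest ingested timestamp are still accepted. A toy illustration (made-up values, not the real TSDB implementation; see the linked PR for the actual design):

```go
package main

import (
	"errors"
	"fmt"
)

var errTooOld = errors.New("too old")

// acceptWithOOOWindow accepts samples that are at most oooWindow behind the
// newest timestamp seen so far, instead of rejecting everything older than a
// fixed minimum valid time.
func acceptWithOOOWindow(sampleT, newestT, oooWindow int64) error {
	if sampleT < newestT-oooWindow {
		return errTooOld
	}
	return nil
}

func main() {
	newest := int64(10 * 60 * 60 * 1000) // newest ingested timestamp in ms (made up)
	window := int64(2 * 60 * 60 * 1000)  // a 2h out-of-order window (made up)
	delayed := newest - 90*60*1000       // a sample arriving 90 minutes late
	fmt.Println(acceptWithOOOWindow(delayed, newest, window)) // <nil>: now accepted
}
```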

yeya24 (Contributor) commented Nov 13, 2022

Since the OOO feature has been merged into main, I will close this issue. Feel free to reopen if you think it is not addressed.

yeya24 closed this as completed Nov 13, 2022