receive: Support backfilling data through remote write (in case of origin got disconnected) #2114

Closed
@sepich

Description

Thanos, Prometheus and Golang version used:
v0.10.1

Object Storage Provider:
AWS S3

What happened:
We use thanos-receive to get data from multiple locations.
Due to #1624 we have to use a short tsdb.block-duration=15m. So if any issue happens and thanos-receive is down for >15m, the remote Prometheuses accumulate data. When connectivity to thanos-receive is re-established, the Prometheuses start to send the delayed data - which is great. But thanos-receive won't accept it, as the timestamps fall in a past chunk. So we're getting holes in graphs.
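For reference, the remote-write side of this setup looks roughly like the fragment below (a minimal sketch; the receive URL and the queue sizes are placeholders, not our exact values):

```yaml
# prometheus.yml (fragment)
remote_write:
  - url: http://thanos-receive.example.com:19291/api/v1/receive
    queue_config:
      capacity: 2500            # samples buffered per shard
      max_shards: 30            # upper bound on parallel senders
      max_samples_per_send: 500
```

While thanos-receive is unreachable, Prometheus keeps the samples in its WAL and retries - which is exactly why the delayed data starts arriving once connectivity returns.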

What you expected to happen:
Thanos-receive is able to upload old data to S3.

Note that we cannot do both - upload to S3 from remote locations and also send via remote write - because thanos-compact cannot deduplicate such blocks, which leads to sum() = 2x etc.

These remote Prometheuses are running inside k8s on emptyDir. Switching to the plain thanos-sidecar (S3 upload) model would again lead to holes in graphs whenever a Prometheus pod is restarted, because the sidecar only uploads completed blocks and we lose the WAL in emptyDir on restart. Another option is to use a PVC, but this would bind Prometheus to a single AZ, as EBS volumes are a per-AZ entity - and a single AZ could go down.

That's why remote-write model looks so competitive for us.

How to reproduce it (as minimally and precisely as possible):
Shut down a working thanos-receive for >2h, then turn it back on. Observe that the Prometheuses upload the stale data, but thanos-receive rejects it.
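The steps above can be sketched like this, assuming thanos-receive runs as a k8s StatefulSet named `thanos-receive` (the name is hypothetical):

```shell
# Take the receiver down for longer than 2h (well past tsdb.block-duration=15m)
kubectl scale statefulset thanos-receive --replicas=0
sleep 7200

# Bring it back: Prometheus retries the samples buffered in its WAL,
# but thanos-receive rejects the old timestamps as out of bounds
kubectl scale statefulset thanos-receive --replicas=1
```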
