sidecar: Add /api/v1/flush endpoint #7358

Closed
wants to merge 42 commits into from

@Nashluffy Nashluffy commented May 14, 2024

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Adds a sidecar API with a single endpoint, `/api/v1/flush`. It calls the TSDB snapshot endpoint on the Prometheus instance, then uploads every block in the snapshot that is not already present in object storage.

There are a few issues that explain the motivation:

Essentially, if this is the last time the sidecar will run (e.g. the cluster is being deleted or a shard is being removed), then without some flushing mechanism you will permanently lose up to 2 hours of data.
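
For readers unfamiliar with the flow, here is a minimal Go sketch of the two decisions the flush makes: reading the snapshot name out of the Prometheus response and working out which blocks still need uploading. It is not the code in this PR. It assumes the response shape documented for Prometheus' `POST /api/v1/admin/tsdb/snapshot` admin endpoint; the helper names `parseSnapshotName` and `blocksToUpload`, the snapshot name, and the block IDs are hypothetical placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// snapshotResponse mirrors the JSON body documented for Prometheus'
// POST /api/v1/admin/tsdb/snapshot endpoint (assumed shape).
type snapshotResponse struct {
	Status string `json:"status"`
	Data   struct {
		Name string `json:"name"`
	} `json:"data"`
}

// parseSnapshotName extracts the snapshot directory name from the response
// body. Hypothetical helper, not part of Thanos.
func parseSnapshotName(body []byte) (string, error) {
	var resp snapshotResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decode snapshot response: %w", err)
	}
	if resp.Status != "success" || resp.Data.Name == "" {
		return "", fmt.Errorf("snapshot failed: status=%q", resp.Status)
	}
	return resp.Data.Name, nil
}

// blocksToUpload returns the snapshot block IDs that are not yet in the
// bucket, sorted for deterministic upload order. Hypothetical helper.
func blocksToUpload(snapshotBlocks []string, inBucket map[string]bool) []string {
	var missing []string
	for _, id := range snapshotBlocks {
		if !inBucket[id] {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder response body; a real one would come from Prometheus.
	body := []byte(`{"status":"success","data":{"name":"20240514T162500Z-0123456789abcdef"}}`)
	name, err := parseSnapshotName(body)
	if err != nil {
		panic(err)
	}

	// Placeholder block IDs; real TSDB blocks are named by ULID.
	snapshot := []string{"block-c", "block-a", "block-b"}
	present := map[string]bool{"block-a": true}

	fmt.Println(name, strings.Join(blocksToUpload(snapshot, present), ","))
}
```

Skipping blocks that are already in the bucket is what would make repeated calls to the endpoint cheap: under these assumptions, a second flush only uploads blocks created since the first one.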

Verification

Besides the unit tests, I ran Prometheus locally and called the endpoint, which worked as expected.

Nashluffy and others added 29 commits May 14, 2024 17:25
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
If the Prometheus that belongs to a sidecar is down, we don't need to
query the sidecar. This PR takes the sidecar out of the endpoint set in
that case. We do the same for all other store APIs by returning an error
in the info/Info gRPC call if they are marked as not ready.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
…nos-io#7305)

* Query|Receiver|Store: Do not log full request on ProxyStore by default

We had a problem in production where a sudden increase in requests with long matchers put our receivers under a lot of pressure.
Upon checking profiles, we saw that the problem was the calls to Log().

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Adding changelog

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
* *: Updating hashicorp LRU cache to v2

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Adding some new comments regarding removing complexity of TTL

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Using new version everywhere

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* rephrase the comment

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Remove a long-standing TODO item in the code - let's use the great loser
tree implementation by Bryan. It is faster than the heap because fewer
comparisons are needed. Should be a nice improvement given that the heap
is used in a lot of hot paths.

Since Prometheus also uses this library, it's tricky to import the "any"
version. I tried doing bboreham/go-loser#3 but
it's still impossible to do that. Let's just copy/paste the code, it's
not a lot.

Bench:

```
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/store
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
             │   oldkway   │               newkway               │
             │   sec/op    │    sec/op     vs base               │
KWayMerge-16   2.292m ± 3%   2.075m ± 15%  -9.47% (p=0.023 n=10)

             │   oldkway    │               newkway               │
             │     B/op     │     B/op      vs base               │
KWayMerge-16   1.553Mi ± 0%   1.585Mi ± 0%  +2.04% (p=0.000 n=10)

             │   oldkway   │              newkway               │
             │  allocs/op  │  allocs/op   vs base               │
KWayMerge-16   27.26k ± 0%   26.27k ± 0%  -3.66% (p=0.000 n=10)
```

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Batch TSDB Infos for bucket store for blocks with overlapping ranges.

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: mluffman <nashluffman@gmail.com>
…io#7310)

* Proxy: acceptance test for proxy store with replica labels

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

* Stores: handle replica labels in label_value and label_names grpcs

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>

---------

Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Kartikay <kartikay_2101ce32@iitp.ac.in>
Signed-off-by: mluffman <nashluffman@gmail.com>
This commit adds a resource_attributes field to the OTLP tracing configuration.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
For thanos-io#6775, it would be useful
to know the exact block IDs to aid debugging.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: mluffman <nashluffman@gmail.com>
Adding a minimal test case for issue thanos-io#6775 - reproduces the panic in the
compactor.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
This commit adds a new tracing span for remotely delegated queries
with attributes related to the query and remote engine.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
* Adding repro case for broken query with distributed engine

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing problem with distributed queries and xfunctions

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding support for extended functions in tenancy enforcement

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Moving custom parser to new package

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing go-lint

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Using same opts and reorganize imports

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing problem with query format

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* fixing flaky tests

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* removing extra test

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* yet another flaky test

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

---------

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
* rule

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* rule-changes

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* prettier

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* Rebuild

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

* changes after make react-app

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>

---------

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
When using the exemplars proxy to search for exemplars on receivers, a receiver with some tenants that did not match the selector on the external label was skipped completely, even if it also had a tenant that matched.

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
pedro-stanaka and others added 12 commits May 14, 2024 17:25
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
* Update minio-go to v7.0.70

Add support for EKS Pod Identity
fix issue: thanos-io#7157

Signed-off-by: farhad <eqfarhad@gmail.com>

* Changelog - support for EKS Pod Identity

Updated changelog

Signed-off-by: farhad <eqfarhad@gmail.com>

---------

Signed-off-by: farhad <eqfarhad@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
thanos-io#7338)

* fixing extended functions support in more places

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding new faillint for the Parse() method

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Adding new method for ParseMetricSelector

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing missing imports

Extending test to check behavior

More missing imports

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Fixing method name

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Solving references to forbidden functions

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>

* Treating promql validation from ParseExpr

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing funcs

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Bumps [webpack](https://github.com/webpack/webpack) from 5.70.0 to 5.91.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](webpack/webpack@v5.70.0...v5.91.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
* Align tenant pruning according to wall clock.

Pruning a tenant currently acquires a lock on the tenant's TSDB,
which blocks reads from incoming queries. We have noticed spikes in
query latency when tenants get decommissioned since each receiver will
prune the tenant at a different time.

To reduce the window where queries get degraded, this commit makes sure that
pruning happens at predictable intervals by aligning it to the wall clock, similar
to how head compaction is aligned.

The commit also changes the tenant deletion condition to look at the duration
from the min time of the tenant, rather than from the last append time.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Improve tests

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Bumps [ip](https://github.com/indutny/node-ip) from 1.1.5 to 1.1.9.
- [Commits](indutny/node-ip@v1.1.5...v1.1.9)

---
updated-dependencies:
- dependency-name: ip
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
…hanos-io#7348)

Bumps [webpack-dev-middleware](https://github.com/webpack/webpack-dev-middleware) from 5.3.1 to 5.3.4.
- [Release notes](https://github.com/webpack/webpack-dev-middleware/releases)
- [Changelog](https://github.com/webpack/webpack-dev-middleware/blob/v5.3.4/CHANGELOG.md)
- [Commits](webpack/webpack-dev-middleware@v5.3.1...v5.3.4)

---
updated-dependencies:
- dependency-name: webpack-dev-middleware
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
Signed-off-by: mluffman <nashluffman@gmail.com>
@Nashluffy Nashluffy reopened this May 14, 2024
@Nashluffy Nashluffy closed this May 14, 2024
@Nashluffy Nashluffy deleted the flush-endpoint branch May 14, 2024 16:45