HA handling for store nodes #199

Closed
fabxc opened this issue Jan 31, 2018 · 7 comments

Comments

@fabxc
Collaborator

fabxc commented Jan 31, 2018

Store nodes are currently generally run as a single replica. It's not super critical to have HA in general, since several hours or even days of recent data are already highly available via the Prometheus servers. But for some scenarios it might still be preferable.

Two store nodes could simply be deployed, and the query node would take care of deduplication/merging just like for Prometheus HA pairs. But unlike Prometheus servers, the underlying data is truly identical in this case, so fetching twice the amount is unnecessary overhead.

Some simple logic could be added to the query node to recognize real duplicates (Prometheus HA pairs actually differ by their replica label) and to query only one of them.
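A minimal sketch of that duplicate check, assuming that stores serving exactly the same data advertise identical label sets through their Info endpoint. The StoreInfo type and the labelKey/dedupStores helpers are hypothetical and not part of the Thanos API. The querier keeps the first store seen per label set, while Prometheus HA pairs, whose replica labels differ, are all kept.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// StoreInfo is a hypothetical, simplified view of what a store advertises
// via its Info endpoint: an address and its external label set.
type StoreInfo struct {
	Addr   string
	Labels map[string]string
}

// labelKey builds a canonical string for a label set by sorting label names,
// so two stores with the same labels produce the same key. Values are quoted
// so that characters like '=' or ',' inside a value cannot cause collisions.
func labelKey(lbls map[string]string) string {
	names := make([]string, 0, len(lbls))
	for n := range lbls {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n)
		b.WriteByte('=')
		b.WriteString(strconv.Quote(lbls[n]))
		b.WriteByte(',')
	}
	return b.String()
}

// dedupStores keeps only the first store seen for each distinct label set.
// Stores sharing a label set are assumed to be real duplicates (e.g. two
// store nodes in front of one bucket) and only one of them is queried.
func dedupStores(stores []StoreInfo) []StoreInfo {
	seen := make(map[string]bool, len(stores))
	out := make([]StoreInfo, 0, len(stores))
	for _, s := range stores {
		k := labelKey(s.Labels)
		if seen[k] {
			continue // a store with this exact label set is already selected
		}
		seen[k] = true
		out = append(out, s)
	}
	return out
}
```

With two store nodes labelled cluster="eu1" and a Prometheus pair labelled cluster="eu1", replica="0"/"1", this returns three stores: one of the store nodes and both Prometheus replicas.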

@bwplotka
Member

bwplotka commented Feb 7, 2018

I think we might need that sooner rather than later... (: How can we do it easily? Basically, we need to tell the querier that these X stores are the same thing. Can we reuse the labels field from the store Info endpoint?

@dupondje

The most basic way would just be an option to add, for example, --bucketid="xxx" to the store command.
If the query command notices two (or more) store nodes with the same bucket ID, it could just pick a random one to get its data from instead of querying all of them.

@deejay1
Contributor

deejay1 commented May 16, 2018

For active/passive, this could be done using a leader latch protocol and sharing the data downloaded by the leader: it could announce any newly downloaded bucket via gossip (for faster failover) and share it via HTTP/gRPC. This would eliminate the need to fetch the data from the object store directly and allow the query nodes to have a single source of truth (the current leader).

@mattbostock
Contributor

mattbostock commented May 31, 2018

I'd like to volunteer to take this on. For our use case, downtime caused by the store instance fronting an S3 bucket being rescheduled to another machine is not really palatable.

I'm thinking of an active-active solution, since it avoids some of the complexities around deciding which instance is 'active' and would be more efficient with resources. As store nodes are essentially just caches, I think it should be reasonably straightforward to achieve.

While thinking about high availability, we should also consider allowing the store nodes to scale horizontally for very large deployments, effectively scaling the LRU cache of indices horizontally.

I propose:

  • We shard the index cache across multiple store instances.
  • Optionally, we replicate the shards to provide high availability for a single shard - though by having multiple shards, we can already improve overall availability and reduce the time to recovery.
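One way to assign index-cache shards, and optional replicas of each shard, is rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration: the proposal does not name a hashing scheme, and keying shards by block ID is an assumption. Each block is owned by the `replicas` instances with the highest hash score.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// score computes the rendezvous score of a (block, instance) pair using
// 64-bit FNV-1a.
func score(blockID, instance string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(blockID))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(instance))
	return h.Sum64()
}

// owners returns the instances responsible for caching blockID's index:
// the `replicas` instances with the highest scores. replicas=1 is plain
// sharding; replicas>1 also replicates each shard for availability.
func owners(blockID string, instances []string, replicas int) []string {
	sorted := append([]string(nil), instances...)
	sort.Slice(sorted, func(i, j int) bool {
		si, sj := score(blockID, sorted[i]), score(blockID, sorted[j])
		if si != sj {
			return si > sj
		}
		return sorted[i] < sorted[j] // deterministic tie-break
	})
	if replicas > len(sorted) {
		replicas = len(sorted)
	}
	return sorted[:replicas]
}
```

A useful property here: removing an instance only moves the blocks it owned, so restarting or losing one store touches roughly 1/n of the cache.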

Just an idea: if we have multiple shards, we might simplify the store instances by not persisting the cache to disk, since restarting a single instance would only require pulling 1/n of the data from object storage, where n is the number of shards (though recovery would be slow if all instances are restarted).

@bwplotka
Member

bwplotka commented Jun 1, 2018

@mattbostock Thanks!

This all relies on one assumption: the Thanos setup has only one bucket to take data from. Are we OK with that? I have seen some use cases with multiple buckets connected to the same Thanos "cluster/network/setup", because "it is easier to manage", "my object storage is specific", etc. Maybe that's a separate issue, but it's worth being aware of while implementing HA.

> We shard the index cache across multiple store instances.

Makes sense, but I would love to hear/see more about the implementation details. As you suggested offline, https://godoc.org/github.com/golang/groupcache sounds nice, but it means we are talking about sharding fully on the stores (you ask whatever store and it gives you the correct answer 100% of the time, even if it needs to ask its peers). Or maybe we want thanos-query to be aware of store sharding? Also, what are we sharding the index cache on? Matchers 0.o? __name__ only? What if someone asks for __name__=~".*"?
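The __name__ concern can be made concrete with a sketch that assumes the index cache is sharded by metric name. Matcher, shardFor and shardsToQuery are hypothetical helpers, not Prometheus or Thanos types. An exact __name__ matcher hits one shard, but a regex such as __name__=~".*" (or a selector with no __name__ matcher) has to fan out to every shard.

```go
package main

import "hash/fnv"

// Matcher is a simplified, hypothetical label matcher: an exact match
// (Regex=false) or a regular-expression match (Regex=true).
type Matcher struct {
	Name  string
	Value string
	Regex bool
}

// shardFor maps a metric name to one of n shards (n must be > 0).
func shardFor(metric string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(metric))
	return int(h.Sum32() % uint32(n))
}

// shardsToQuery returns the shards that must be asked for a selector when
// the cache is sharded by __name__. Only an exact __name__ matcher narrows
// the query to a single shard; anything else may match names on every shard.
func shardsToQuery(ms []Matcher, n int) []int {
	for _, m := range ms {
		if m.Name == "__name__" && !m.Regex {
			return []int{shardFor(m.Value, n)}
		}
	}
	all := make([]int, n)
	for i := range all {
		all[i] = i
	}
	return all
}
```

So a name-based key helps the common case, but broad regex queries still touch every shard, which is one argument for starting with plain replication.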

> though by having multiple shards, we can already improve overall availability and reduce the time to recovery.

Totally agree, and thanks for the example 👍 However, I would start with something simple first: just replication (so true HA), because that is what you need (from what you say). This will enable horizontal scaling (offloading a single store) and potentially improve performance as well. Sharding alone will ONLY improve availability (and will still have some major disruption time); regarding performance, it is hard to say without #346 (which is in progress).

@mattbostock
Contributor

mattbostock commented Jul 4, 2018

Added a proposal for high availability of store instances here: #404

@bwplotka
Member

This can be solved just by running multiple Store Gateways behind any load balancer (like a Kubernetes Service) and without gossip.
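A sketch of what the load balancer contributes in this setup, assuming identical Store Gateway replicas that all read the same bucket. The Balancer type is hypothetical and only illustrates round-robin selection that skips unhealthy replicas. Note that a Kubernetes Service balances at the connection level via kube-proxy rather than per request, so with long-lived gRPC connections the spread across replicas can be uneven.

```go
package main

import (
	"errors"
	"sync"
)

// ErrNoReplica is returned when every replica is marked unhealthy.
var ErrNoReplica = errors.New("no healthy store gateway replica")

// Balancer round-robins over interchangeable Store Gateway replicas.
type Balancer struct {
	mu       sync.Mutex
	replicas []string
	healthy  map[string]bool
	next     int
}

// NewBalancer starts with every replica marked healthy.
func NewBalancer(replicas ...string) *Balancer {
	h := make(map[string]bool, len(replicas))
	for _, r := range replicas {
		h[r] = true
	}
	return &Balancer{replicas: replicas, healthy: h}
}

// SetHealthy marks a replica up or down (e.g. from readiness probes).
func (b *Balancer) SetHealthy(replica string, up bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.healthy[replica] = up
}

// Pick returns the next healthy replica in round-robin order.
func (b *Balancer) Pick() (string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for i := 0; i < len(b.replicas); i++ {
		r := b.replicas[(b.next+i)%len(b.replicas)]
		if b.healthy[r] {
			b.next = (b.next + i + 1) % len(b.replicas)
			return r, nil
		}
	}
	return "", ErrNoReplica
}
```

Because every replica serves the same bucket, any healthy one can answer, so no deduplication or coordination between replicas is needed.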

fpetkovski added a commit to fpetkovski/thanos that referenced this issue Jan 26, 2024
fpetkovski pushed a commit to fpetkovski/thanos that referenced this issue Jan 26, 2024
Projects
None yet
Development

No branches or pull requests

5 participants