querier: Rate over deduplicated counter from many replicas can lead to double reset account. #2401
Comments
One potential issue is that overlaps are not handled well: https://github.com/thanos-io/thanos/pull/2400/files
It could be related to the fact that GitLab is using Store GW in HA without a load balancer (querying both at the same time), so the data (chunks) is certainly duplicated and unsorted.
@bwplotka We initially reproduced the issue by querying the Prometheus sidecars directly, not Store GW. These are in HA pairs as well.
Got some test data from @SuperQ so hopefully will be able to repro locally 🤗 Fingers crossed 🤞
Looks like it is a dup of #1326 - let's continue the discussion there.
Actually, let's not be so sure; this might be different (here deduplication does not cause it).
Reproduces: #2401 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@SuperQ this repro is amazing; I can explore all the details. We definitely have overlapping and unsorted chunks. We should be able to find the problem in our algorithm soon, thanks! BTW... I kind of overengineered (as you can imagine) and wrote So I can get your file (actually anything generated by
Tooling looks like it works, but I think we don't have enough chunks to repro it 🤔 Tried all sorts of time ranges, steps and rate ranges... no luck: all good everywhere... cc @SuperQ, can you send me a slightly wider time span? 🤔 Is this for sure the time span you can reproduce the problem with? What if this is caching, some layer above Thanos Querier?
Tries to reproduces: #2401 I would still merge as it is a great test, and allows us to quickly check data provided by Ben. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Here's another data sample that reproduces it. The time range to reproduce is this:
When I turn off dedupe, the issue goes away:
I don't see the full data @SuperQ (you gave only half of it, I think), but I can repro 🎉
Investigating
Fixes #2401 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
…2522) * Added LocalStore and realistic data for querier counter reset bug. Tries to reproduces: #2401 I would still merge as it is a great test, and allows us to quickly check data provided by Ben. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed tsdbstore required component type. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed ineffectual set. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed liche. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed unknown store issue. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
…2522) (#2538) * Added LocalStore and realistic data for querier counter reset bug. Tries to reproduces: #2401 I would still merge as it is a great test, and allows us to quickly check data provided by Ben. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed tsdbstore required component type. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed ineffectual set. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed liche. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed unknown store issue. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Not super clear how to fix the issue long term (: Some deep dive: https://docs.google.com/spreadsheets/d/13A8ChunqbVdRq9j5kqrtfzknO6mvVFQPUkXwUiBuV_4/edit?usp=sharing

TL;DR: The problem is with deduplicating a counter series from 2 or more Prometheus replicas. Let's say they scrape the same counter from the same application. Accounting for resets correctly in a generic deduplication algorithm is really hard, as presented in this spreadsheet. This is because each replica has a different view of the END value of the counter (different scrape time!). Crafting a deduplication algorithm when we know it's a counter metric is quite trivial. The problem is... we don't know. So ideally we need a generic dedup algorithm for replicas. Any ideas @brancz @brian-brazil @beorn7 @SuperQ ? (:

My current idea to move this forward: have special deduplication for counters. Generally, we don't know which metric is a counter on the offline level (unless it's downsampled data, then we know). However, for the query part it's clear. It is counter if On offline rewrite / deduplication level, for raw data, we have no idea what the type is. However, as a quick win, we could for now not worry about offline dedup and just solve the query issues. Then we can maybe figure out something else for the offline case. Maybe generic dedup that will work for those, or something that will base on
Or... we should collaborate on a different dedup algorithm for the future. Maybe scrape-interval based? (Downside: what if the scrape interval changes?)
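To make the failure mode above concrete, here is a minimal Go sketch of why an extra value drop in a merged counter series inflates the result. It is not the Thanos deduplication code and not the Prometheus `rate()` implementation; `increase` is a hypothetical helper that treats every drop in value as a reset to zero and ignores extrapolation, and the sample values are invented for illustration.

```go
package main

import "fmt"

// increase is a hypothetical, simplified reset-aware counter increase.
// Every drop in value is treated as a counter reset to zero, so the new
// value is added in full. Extrapolation and timestamps are ignored.
func increase(samples []float64) (total float64, resets int) {
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			// Value went down: assume the counter restarted from zero.
			resets++
			total += cur
			continue
		}
		total += cur - prev
	}
	return total, resets
}

func main() {
	// One replica's view: the application restarts once after the value 100.
	single := []float64{90, 100, 2, 4}

	// Invented merged view: a pre-reset sample (105) from a second replica
	// lands after a post-reset sample (2) from the first replica.
	merged := []float64{90, 100, 2, 105, 3}

	t, r := increase(single)
	fmt.Printf("single replica: increase=%v resets=%v\n", t, r)

	t, r = increase(merged)
	fmt.Printf("merged replicas: increase=%v resets=%v\n", t, r)
}
```

In this made-up data the single-replica view yields an increase of 14 with one reset, while the merged view yields 118 with two resets, which is the kind of double reset accounting the issue title describes.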
PR with just tests, not fix yet. Reproduces: #2401 * Added regressions tests for CounterSeriesIterator; Simplified aggregators. * Fixes edge dedup cases for Next and added tests for deduplication. * Refactored downsampling tests, added more realistic cases. * Added check for duplicated chunks during downsampling. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
PR with just tests, not fix yet. Reproduces: #2401 * Added regressions tests for CounterSeriesIterator; Simplified aggregators. * Fixes edge dedup cases for Next and added tests for deduplication. * Refactored downsampling tests, added more realistic cases. * Added check for duplicated chunks during downsampling. * Removed duplicates for efficiency on promSeriesSet. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Help wanted for review!
That's not common, but you could depend on no one having a scrape interval over 2 minutes, as that's not sane for other reasons.
That's quite a good idea 👍 (for #2547)
* querier: Added regressions tests for counter missed bug. PR with just tests, not fix yet. Reproduces: #2401 * Added regressions tests for CounterSeriesIterator; Simplified aggregators. * Fixes edge dedup cases for Next and added tests for deduplication. * Refactored downsampling tests, added more realistic cases. * Added check for duplicated chunks during downsampling. * Removed duplicates for efficiency on promSeriesSet. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Giedrius comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
It actually saddens me that Prometheus “by design” doesn't really cope with scrape intervals >2m. I'd love to see future Prometheus versions lift that arbitrary limit, and I'd therefore prefer that Thanos not bake that limit into its own design, too. Interestingly, I'd also love to see future Prometheus versions have first-class support for metric types. That would then also solve your problem of how to safely recognize a counter.
* Removed dependency on Cortex fork; Moved to official one. (#2199) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Typo corrections quick-tutorial.md (#2196) * Corrected all Prometheus possessives to read `Prometheus's`, this matches Prometheus's own documentation. * Corrected `simple` to `simply` when describing compactor scanning behaviour Signed-off-by: Peter Avdjian <pavdjian@paperlesspost.com> * tracing: Simplified creation of spans. (#2202) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed links to dashboards json files. (#2203) Signed-off-by: Roman Grytskiv <roman.grytskiv@gmail.com> * Skip deleting files that we just deleted (#2185) * Skip deleting files that we just deleted We see this happening with Swift. Because the consistency of swift is eventual, swift sometimes didn't process the deletion of the meta file yet, and so it turns up in the bkt.Iter(). The second deletion then causes a 404 and compaction fails. Signed-off-by: Wim Fournier <github@fournier.nl> * return, as this is a func. Add debug log and comment Signed-off-by: Wim Fournier <github@fournier.nl> * fixing build: wrong parameter name Signed-off-by: Wim Fournier <github@fournier.nl> * fix lint Signed-off-by: Wim Fournier <github@fournier.nl> * Refactor deleteDir into deleteDirRec and add a parameter for a function that allows to keep certain files. Signed-off-by: Wim Fournier <github@fournier.nl> * Fix lint Signed-off-by: Wim Fournier <github@fournier.nl> * implementing suggested fixes Signed-off-by: Wim Fournier <github@fournier.nl> * improve web.route-prefix handling (#2208) This makes the handling of web.route-prefix more similar to the behavior in Prometheus. Correctly handles '/' and prefixes which do not begin with a '/'. 
Signed-off-by: Paul Gier <pgier@redhat.com> * Merge release-0.11 back into master (#2212) * Create release v0.11.0-rc.0 (#2156) * Update version to v0.11.0-rc.0 * Update CHANGELOG with all PRs for v0.11 * Improve CHANGELOG by being more explicit * Bumped minio-go library to v6.0.49, fixing an IAM bug in v6.0.45 (#2189) Signed-off-by: Kraig Amador <kraig.amador@ticketmaster.com> * Create release candidate v0.11.0-rc.1 (#2192) Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Release v0.11.0 (#2205) Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Update VERSION to 0.12.0-dev Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Resolve go.sum merge conflict and run go mod tidy Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> Co-authored-by: Kraig Amador <508403+bigkraig@users.noreply.github.com> * returns error messages when trigger reload with http (#1848) * returns error messages when trigger reload with http Signed-off-by: arthur yang <yang_yapo@126.com> * use simple reloadRules function instead of magic chan error error Signed-off-by: yapo.yang <yang_yapo@126.com> * add tailing period for comment Signed-off-by: yapo.yang <yang_yapo@126.com> * fix comment Signed-off-by: arthur yang <yang_yapo@126.com> * add white space for better code reading Signed-off-by: arthur yang <yang_yapo@126.com> * collect thanos rule metrics into one struct Signed-off-by: arthur yang <yang_yapo@126.com> * remove termination logic and keep log only Signed-off-by: arthur yang <yang_yapo@126.com> * update changelog for #1848 Signed-off-by: arthur yang <yang_yapo@126.com> * add tailing period Signed-off-by: arthur yang <yang_yapo@126.com> * check whether registry is nil Signed-off-by: arthur yang <yang_yapo@126.com> * tailing period in metrics Signed-off-by: arthur yang <yang_yapo@126.com> * cancel with context Signed-off-by: arthur yang <yang_yapo@126.com> * return ctx.Err() instead of errors.New Signed-off-by: arthur yang <yang_yapo@126.com> * register thanos rule 
metrics with promauto Signed-off-by: arthur yang <yang_yapo@126.com> * return errs before set success related metrics Signed-off-by: arthur yang <yang_yapo@126.com> * revert go.sum go.mod change Signed-off-by: arthur yang <yang_yapo@126.com> * reload webhandler/sighup in one for loop Signed-off-by: arthur yang <yang_yapo@126.com> * reload with chan chan error Signed-off-by: yapo.yang <yang_yapo@126.com> * Fix error in component status help message (#2216) Signed-off-by: mcsammac Date: Wed Mar 4 13:50:17 2020 -0500 On branch master Changes to be committed: modified: pkg/prober/intrumentation.go Signed-off-by: s320009 <sam.mcadams@8451.com> * tutorials: fix typo in image version (#2223) Signed-off-by: Paul Gier <pgier@redhat.com> * Blocked classic prometheus constructors, moved all to promauto; Removed unnecessary printfs. (#2228) Fixes: https://github.com/thanos-io/thanos/issues/2102 Also blocked them on CI side, thanks to https://github.com/fatih/faillint/pull/8 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * ruler: Fix #2204 bug where alert queue is unpoppable causing full queue and dropped alerts (#2238) * Add test for alert queue Pop after multiple Push Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com> * Fix alert queue bug by resignal after Pop (#2204) Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com> * Fix alert queue test and simplify Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com> * Update CHANGELOG.md Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com> * Link to thanos-io/thanos PR in CHANGELOG.md Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com> * bucket: improve shard label handling (#2219) Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com> * fixing querier deployment kube manifest example 404 error (#2229) Signed-off-by: Rajesh Rajendran <rjshrjndrn@gmail.com> * *: Fix misuse of 
pkg/errors.Errorf and error directive (#2253) * Fix pkg/errors error directive issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix misuse of Errorf Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix false metric name in Store GW e2e test (#2256) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add scheme to the alertmanagers.url in ruler example (#2255) Signed-off-by: gitlawr <lawrleegle@gmail.com> * Sort chunks by thanos.downsample.resolution for better grouping (#2231) Signed-off-by: Paul Traylor <paul.traylor@linecorp.com> * Remove duplicate log.level arg in quickstart.sh (#2148) Signed-off-by: Richard Poole <richard.poole@cudoventures.com> * tutorials: fix incorrect query (#2239) You would have to query `prometheus_tsdb_head_series` instead of `sum(prometheus_tsdb_head_series)` in order to get the 5 results when deduplicating. Signed-off-by: John Chen <johnchen456@gmail.com> * Use new go jsonnet formatter (#2258) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * docs: Document Thanos Sharding (#1922) * docs: Document Thanos Sharding Signed-off-by: Xiang Dai <764524258@qq.com> * Add time partitioning Signed-off-by: Xiang Dai <764524258@qq.com> * feedback Signed-off-by: Xiang Dai <764524258@qq.com> * Sharding: document supported relabel action and add store gateway backgroud (#2272) * Sharding: document supported relabel action and add store gateway background Signed-off-by: Xiang Dai <764524258@qq.com> * add hashmod Signed-off-by: Xiang Dai <764524258@qq.com> * Add wait-interval flag (#2265) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * store: Optimized labels conversion on store.Series; Added unsafe labels conversion. (#2230) ## Changes * method TranslateLables CPU Optimized (streamed sorting). * All store GW label conversation to []storepb.Label are now alloc-less. ``` go test -bench=BenchmarkUnsafeVSSafeLabelsConversion -run=^$ -benchmem -timeout 2h -benchtime 10s ./pkg/store/storepb/... 
goos: linux goarch: amd64 pkg: github.com/thanos-io/thanos/pkg/store/storepb BenchmarkUnsafeVSSafeLabelsConversion/safe-12 34822 339076 ns/op 655368 B/op 2 allocs/op BenchmarkUnsafeVSSafeLabelsConversion/unsafe-12 1000000000 2.32 ns/op 0 B/op 0 allocs/op PASS ``` TODO: Do the same on Querier. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * fix: Ignore the OS-X Trash (#2274) Signed-off-by: kushthedude <kushthedude@gmail.com> * docs/sharding.md: fix a typo (#2273) Signed-off-by: Xiang Dai <764524258@qq.com> * fix replicate duplicate metrics (#2254) Signed-off-by: yeya24 <yb532204897@gmail.com> * Document downsample component (#2090) * scripts/genflagdocs.sh: Generate downsample flag Signed-off-by: Xiang Dai <764524258@qq.com> * Document downsample component Signed-off-by: Xiang Dai <764524258@qq.com> * Move downsample as bucket sub-command Signed-off-by: Xiang Dai <764524258@qq.com> * update docs Signed-off-by: Xiang Dai <764524258@qq.com> * feedback Signed-off-by: Xiang Dai <764524258@qq.com> * Crashing error messages now will print stacktrace. (#2277) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Downsample: update changelog (#2278) * Downsample: update changelog Signed-off-by: Xiang Dai <764524258@qq.com> * feedback Signed-off-by: Xiang Dai <764524258@qq.com> * thanos-mixin: clear units/axis (#2279) * thanos-mixin: clear units/axis Signed-off-by: Xiang Dai <764524258@qq.com> * fix nits Signed-off-by: Xiang Dai <764524258@qq.com> * store, compact, bucket: Delay deletes by scheduling block deletion with deletion-mark.json file (#2136) Signed-off-by: khyatisoneji <khyatisoneji5@gmail.com> * Use maxInt instead of math.MaxInt64 (#2268) math.MaxInt64 doesn't work on 32-bit systems (like linux/arm builds) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Replace objstore.Exists function calls with bkt.Exists (#2284) Signed-off-by: khyatisoneji <khyatisoneji5@gmail.com> * Added Xiang to Triage Role. 
(#2289) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Enrich Memcached client logs (#2292) * Enrich Memcached client logs Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/cacheutil/memcached_client.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com> * Update pkg/cacheutil/memcached_client.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added Kemal to Triage Role. (#2293) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * bucket: handle instances where no blocks are loaded (#2271) * bucket: handle instances where no blocks are loaded Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com> * bucket: reject all falsy label values Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com> * bucket: update changelog Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com> * docs/sharding.md: Replace example floating link with permalink (#2296) Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * Added latest release badge. (#2300) I think there are NOT enough badges, so added one more! Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * store: Postings fetching optimizations (#2294) * Avoid fetching duplicate keys. Simplified groups with add/remove keys. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added shortcuts Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Optimize away fetching of ALL postings, if possible. Only remove postings for each key once. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't do individual index.Without, but merge them first. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't use map for fetching postings, but return slice instead. This is in line with original code. Using a map was nicer, but more expensive in terms of allocations and hashing labels. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Renamed 'all' to 'allRequested'. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Typo Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make linter happy. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added comment to fetchPostings. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Group vars Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comments Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use allPostings and emptyPostings variables for special cases. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Unify terminology to "special All postings" Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Address feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added CHANGELOG.md entry. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix check for empty group. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Special All postings is now added as a new group No special handling required anymore. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Updated comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * cmd/thanos/receive: Remove unused TLSClientConfig from Options (#2299) Signed-off-by: mrIncompetent <henrik@henrik-schmidt.de> * compactor: Add ReplicaLabelRemover as MetaFetcher filter to enable offline vertical compaction/deduplication for replicated data (#2250) * Create ReplicaLabelsFilter to allow for offline deduplication Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Start adding a e2e test for offline-deduplication with Thanos compact Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Address issues that have discovered after review Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix e2e test service issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Improve fetcher unit tests Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add simple compactor e2e tests with replica remover Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Remove unnecessary interface Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add more test cases Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Improve and stabilize e2e tests Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Increase ruler sd refresh interval Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Address review issues Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Separate filters and modifiers Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Matthias Loibl <mail@matthiasloibl.com> * docs/release: squat to release v0.12.0 (#2312) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * cmd/thanos/receive: Serve TLS when TLSConfig is given (#2311) Signed-off-by: mrIncompetent <henrik@henrik-schmidt.de> Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: 
mrIncompetent <henrik@henrik-schmidt.de> * cmd/thanos/compact: add bucket UI (#1714) This commit enhances the compact component so that it runs the bucket UI whenever the --wait flag is also passed. In order to reduce the overhead of running the UI in addition to the compactor, this commit also refactors the compactor and bucket commands a bit in order to re-use a single meta fetcher. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * reloadRules initlialization should fail (#2301) Signed-off-by: arthur yang <yang_yapo@126.com> * Fixed inconsistent metrics and methods (#2319) Signed-off-by: jojohappy <sarahdj0917@gmail.com> * e2e: Refactored compactor test; Fixed flakiness. (#2313) Also: * Reduced number of services for e2e for latency * Fixed halting * Improved logging. * Improved test cases (e.g added test for compaction and halting) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * pkg/store: Report no data if no stores discovered (#2310) * pkg/store: Report no data if no stores discovered Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * CHANGELOG.md: Add timespan reported on empty stores Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * Added max_item_size to Memcached client (#2304) * Added max_item_size to Memcached client Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed imports order and splitted tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed type casting Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed imports grouping Signed-off-by: Marco Pracucci <marco@pracucci.com> * Changed memcached max_item_size default from 0 to 1MB Signed-off-by: Marco Pracucci <marco@pracucci.com> * Increased e2e tests timeout Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed typo in CHANGELOG Signed-off-by: Marco Pracucci <marco@pracucci.com> * Reverted Makefile changes Signed-off-by: Marco Pracucci <marco@pracucci.com> * tesutil: Enchanced testutil, refactored for our needs. 
(#2325) Changed LICENSE as we no longer use version we copied back then. Most of it was reimplemented. Why? * Much richer diff (inspired by testify packages * Consistent API * Less indentation. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * make, ci: Check example alerts and rules in CI (#2318) * Check example alerts and rules in CI Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add require clean tree Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix latency alerts (#2316) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fixed e2e. (#2327) Sorry, was late when we merged the fix. Funny bug: It would start to fail exactly 12h AFTER 25.03 8:00 GMT Should be fine now... and in future until changed ;p Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * store: added option to reencode and compress postings before storing them to the cache (#2297) * Added "diff+varint+snappy" codec for postings. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added option to reencode and compress postings stored in cache Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Expose enablePostingsCompression flag as CLI parameter. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use "github.com/pkg/errors" instead of "errors" package. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * remove break Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed empty branch Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added copyright headers. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added CHANGELOG.md entry Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use Encbuf and Decbuf. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix comments in test file. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Another comment... 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed diffVarintSnappyEncode function. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment on usage with in-memory cache. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * var block Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed extra comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Move comment to error message. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Separated snappy compression and postings reencoding into two functions. There is now header only for snappy-compressed postings. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added comment on using diff+varint+snappy. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Shorten header Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Lint... Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Changed experimental.enable-postings-compression to experimental.enable-index-cache-postings-compression Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added metrics for postings compression Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added metrics for postings decompression Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Reorder metrics Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use encode/decode labels. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * mixin: Make alert threshold values parametric (#2317) * Make alert threshold values parametric Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Rename variable Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Adjsut default values for latency thresholds Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Update UW logo (#2329) Signed-off-by: Povilas Versockas <p.versockas@gmail.com> * block fetcher with errgroup (#2309) * block fetcher with errgroup Signed-off-by: arthur yang <yang_yapo@126.com> * errorgroup goroutine defer close Signed-off-by: arthur yang <yang_yapo@126.com> * website: fix 404 on root of sections (#2328) Signed-off-by: Prem Kumar <prmsrswt@gmail.com> * Add mallgroup.com to adopters (#2331) Signed-off-by: Daniel Rataj <daniel.rataj@mall.cz> Co-authored-by: Daniel Rataj <daniel.rataj@mall.cz> * store: Binary index header is now production ready and enabled by default (#2330) * store: Binary index header is now production ready and enabled by default. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed typo. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Add leboncoin company as adopter (#2333) Signed-off-by: Guillaume Chenuet <guillaume.chenuet@adevinta.com> * website: Collapsible menu sections (#2336) * website: make sidemenu collapsed by default Signed-off-by: Prem Kumar <prmsrswt@gmail.com> * website: add caret svg in expandble sidemenu Signed-off-by: Prem Kumar <prmsrswt@gmail.com> * website: expand current section's sidemenu by default Signed-off-by: Prem Kumar <prmsrswt@gmail.com> * ui: fix store never removed from /stores page bug (#2339) * ui: fix store never removed from /stores page bug We need to update `LastCheck` only if the error is non-nil. That field is used in the cleanup function to know when to remove the StoreAPI from the UI. If we always update it, even if an error has happened, that means that `--store.unhealthy-timeout` is never respected. 
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * query: fix storeset Update() test Now let's start with a proper state where LastCheck is not 0 at the beginning and we have 2 active stores, 3 store statuses just like the original author had intended. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * fix typo in readme (#2342) data -> date Signed-off-by: afirth <afirth@users.noreply.github.com> * query: add --store-strict flag (#2337) * query: add --store-strict flag Add a new flag called `--store-strict` as agreed per https://thanos.io/proposals/202001_thanos_query_health_handling.md/ I have updated the proposal to reflect the reality. Third time's the charm, I believe it :-) Now the flag is called `--store-strict` which only accepts statically defined nodes. I guess the code is even simpler now. I have also fixed one small issue where `%w` was used in `errors.Errorf`. Couldn't compile Thanos locally with Go 1.14 without this fix. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * CHANGELOG: fix changelog item Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Register grpc prometheus middleware metrics (#2347) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * website: Enabled two scripts to fix Google analytics. (#2346) * website: Enabled two scripts to fix Google analytics. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed also inline style. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added Workfront as adopter (#2351) Signed-off-by: Ryan Orth <ryanorth@workfront.com> Co-authored-by: Ryan Orth <ryanorth@workfront.com> * compact: Fixed minor logging issues. (#2353) Fixes: https://github.com/thanos-io/thanos/issues/2322 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * fetcher: Made metaFetcher go routine safe; Fixed multiple bucket UI + fetcher issues. 
(#2354) Fixed https://github.com/thanos-io/thanos/issues/2349 Fixed races (we were reusing fetcher by both bucket UI and compaction syncs... Fixed logging Added singleflight to ensure we don't synchronize too often. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * test/e2e: Add timestamp to e2e test log output (#2358) Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * store & compact: For components that operates on blocks - expose the UI on /loaded-blocks (#2357) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * rule: fix query addr parsing (#2288) * rule: fix query addr parsing Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com> * CR: support different schemas Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com> * CR: docs and err Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com> * CR: improve error handling and more TC Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com> * mixin: Remove unused jobPrefix field (#2364) Signed-off-by: Lili Cosic <cosiclili@gmail.com> * Create release v0.12.0-rc.0 (#2360) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Allow more connection reuse than the default of 2 (#2343) Signed-off-by: Jakob Kartschall <j.kartschall@syseleven.de> * Makefile: ignore GCS in CI (#2368) We got booted from the GCS account, so skip this in CI for now. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Revert "Makefile: ignore GCS in CI (#2368)" (#2373) This reverts commit 8591434856ced5803e399b4d9d1bf2d1459c0ee0. * mixin: Added critical Rules alerts. (#2374) * mixin: Added critical Rules alerts. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * mixin: Made sure Rule alerts are not firing if one replica is failing. 
(#2375) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Update S3 endpoint mapping link (#2377) The link for the AWS Region Endpoint Mappings for S3 was out of date; this PR updates it to point to the new location. Signed-off-by: João Carvalho <joaopecarvalho@gmail.com> * Fix2213 0.12 (#2382) * binaryHeader: Fixed partial write issue for index-header. Fixes https://github.com/thanos-io/thanos/issues/2213 This was indicated as a latency regression, and it also causes a potentially critical issue for store GW, where manual deletion of the index-header from local storage was required. This might be considered a blocker for 0.12, so it would be worth porting it to 0.12 TBH @squat. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * binary_reader: ensure fs is synced before renaming Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * objstore: Added WithExpectedErrs which allows to control instrumentation (e.g. not increment failures for expected not found) (#2383) * objstore: Added WithExpectedErrs to Reader which allows to control instrumentation (e.g. not increment failures for expected not found). This avoids waking up on-call in the middle of the night because of an expected, properly handled case (: Also: Had to move inmem to objstore for testing. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * pkg/objstore: fix NewBucket comments. This commit fixes the documentation comments for the NewBucket funcs. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * pkg/receive: trace TSDB ingestion (#2384) This commit adds a tracing span around the writing of remote-write requests into TSDB. This will help us differentiate between the latencies in the forwarding of requests around the hashring and the latencies of appending to the database. 
This commit also removes the `thanos_` prefix from the forwarding span to better align with the span naming in the rest of the project. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * compact: Made MarkForDeletion less strict; Added more debugability to block deletion logic, made meta sync explicit. (#2385) Also: * Changed order: Now BestEffortCleanAbortedPartialUploads is before DeleteMarkedBlocks. * Increment markedForDeletion counter only when we actually uploaded block. * Fixed logging issues. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Compactor: Document reasons and solutions about overlaps (#2191) * troubleshooting.md: document overlaps Signed-off-by: Xiang Dai <764524258@qq.com> * feedback Signed-off-by: Xiang Dai <764524258@qq.com> * feedback Signed-off-by: Xiang Dai <764524258@qq.com> * add reminder label to stale bot config (#2378) Signed-off-by: yeya24 <yb532204897@gmail.com> * fix sharding docs style; fix promtail link (#2379) Signed-off-by: yeya24 <yb532204897@gmail.com> * store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant. (#2390) * store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant. Spotted by @mkabischev! Thanks to you and @d-ulyanov as well! 
Epic finding +1 Test output before fix: testutil.Equals(t, 1, br.version) testutil.Equals(t, 2, br.indexVersion) testutil.Equals(t, &BinaryTOC{Symbols: headerLen, PostingsOffsetTable: 66}, br.toc) testutil.Equals(t, int64(626), br.indexLastPostingEnd) testutil.Equals(t, 8, br.symbols.Size()) testutil.Equals(t, map[string]*postingValueOffsets{ "": { offsets: []postingOffset{{value: "", tableOff: 4}}, lastValOffset: 392, }, "a": { offsets: []postingOffset{ {value: "1", tableOff: 9}, {value: "11", tableOff: 16}, {value: "12", tableOff: 24}, {value: "2", tableOff: 32}, {value: "3", tableOff: 39}, {value: "4", tableOff: 46}, {value: "5", tableOff: 53}, {value: "6", tableOff: 60}, {value: "7", tableOff: 67}, {value: "8", tableOff: 74}, {value: "9", tableOff: 81}, }, lastValOffset: 572, }, "longer-string": { offsets: []postingOffset{{value: "1", tableOff: 88}}, lastValOffset: 622, }, }, br.postings) testutil.Equals(t, 0, len(br.postingsV1)) testutil.Equals(t, 2, len(br.nameSymbols)) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added CHANGELOG item. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed build errs. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Lucas comment. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * store: Fixed critical bug, when certain not-existing value queried was causing "invalid size" error. (#2393) Reason why we could not reproduce it locally was that for most of non-existing value we were lucky that buffer was still long enough and we could read and decode some (malformed) variadic type. For certain rare cases, buffer was not long enough. Fixed and spotted thanks to amazing @mkabischev! * Added more regression tests for binary header. 
Without the fix it fails with: ``` header_test.go:154: header_test.go:154: exp: range not found got: get postings offset entry: invalid size ``` Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * VERSION: cut v0.12.0-rc.1 (#2396) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * mixin: Change critical rule alert to be symptom based (#2398) This change makes the critical (typically paging) alert more symptom based, rather than observing data written to disk. Additionally, after this change the alert will only fire if there are actually rules loaded. Besides firing when no rules were loaded, the previous alert was also prone to firing for rules that legitimately are not writing data. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * scripts: Added grpcurl script useful for Thanos debugging. (#2403) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * bucket docs: fix "thanos downsample" remnant (#2409) and follow formatting of the other bucket commands Signed-off-by: John Belmonte <john@neggie.net> * docs: Added Thanos Go style guide and some development tips. (#2359) * docs: Added Thanos Go style guide and some development tips. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments; added TOC and image. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added more rules. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Grammarly fixes! Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * docs: Fixed table formatting for coding style guide. (#2421) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added extra check for sorting time Duration and int strings (#2416) Signed-off-by: kadern0 <kaderno@gmail.com> * docs: Added minor note to single rule. (#2422) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed TOC. 
(#2424) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * store dashboard: fix gRPC streamed detail panels (#2426) Fixes #2425 Signed-off-by: John Belmonte <john@neggie.net> * use bytes unit where appropriate on grafana dashboards (#2423) Signed-off-by: John Belmonte <john@neggie.net> * bucket verify: document that compactor should be disabled (#2418) Signed-off-by: John Belmonte <john@neggie.net> * docs: Fixed typo in coding guide. (#2427) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added Marco as Thanos Maintainer (#2428) Also, reordered list alphabetically. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * store: proxy: fix queries never timing out bug (#2411) * store: proxy: add test for deadlocking problem Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * store: proxy: add fix for timeouts Checking here if the series context has ended is the correct fix here. We want to check it because if any of the other Series() calls error out then the context is canceled. So, it is equal to checking for errors "downstream", in `mergedSeriesSet`. Also, `handleErr()` here is the correct function to use because in such a case we want to set `s.err` -- if `io.EOF` still hasn't been received then it means that StoreAPI still has some data that it wants to send but hasn't yet. With this, the previously added test passes. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * docs: fixed typo in coding style guide (#2431) Signed-off-by: Stephan Kirsten <vebis@gmx.net> * docs/release-process: make shell command copyable (#2433) In general, I think it is easier for users of guides when shell commands are listed without a preceding `$`, otherwise the commands cannot be directly copied and pasted into a terminal. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * docs/contributing: clean up style guide grammar (#2432) This commit makes some small grammar fixes to the coding style guide. 
Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * cut v0.12.0 (#2437) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * .circleci: use consistent ci image tags (#2440) We were not using the latest thanos-ci image tag for every part of the CI pipeline: we were using 0.3.0 for tests but 0.2.0 for all builds. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * CHANGELOG.md: fix changelog The changelog in the release-0.12 branch is correct, but somewhere in the merge back into master, the changelog was mangled. This puts the fixes in their correct places. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * store: proxy: fix queries never timing out bug (#2411) (#2443) * store: proxy: add test for deadlocking problem Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * store: proxy: add fix for timeouts Checking here if the series context has ended is the correct fix here. We want to check it because if any of the other Series() calls error out then the context is canceled. So, it is equal to checking for errors "downstream", in `mergedSeriesSet`. Also, `handleErr()` here is the correct function to use because in such a case we want to set `s.err` -- if `io.EOF` still hasn't been received then it means that StoreAPI still has some data that it wants to send but hasn't yet. With this, the previously added test passes. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com> * proposal: Added proposal for new Thanos component: Thanos Frontend. (#2434) * proposal: Added proposal for new Thanos component: Thanos Frontend. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added more rationales for separate binary. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Marco comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed lucas comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Changed to approved. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Moved to query-frontend command. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed memcached client metrics initialization (#2446) Signed-off-by: Marco Pracucci <marco@pracucci.com> * store: Added regex-set optimization to ExpandedPostings (#2450) * Added regex-set optimization to ExpandedPostings Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed capitalization. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed unnecessary change. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Remove whitespace Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use testutil instead of testify. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added copyright header, from original Prometheus querier.go Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use Thanos copyright header. :facepalm: Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added · at the end of the sentence. :exploding_head:. I will randomly add emojis and GitHub emoji markup to commit messages that fix frustrating checks like this one. And intentionally not break the line. Let's see how lint deals with that! Ha. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * docs/contributing: use Before for IsExpired (#2456) Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com> * cmd/thanos: clean gosimple S1039 (#2464) Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com> * docs: Update CONTRIBUTING.md with DCO (#2465) * docs: Update CONTRIBUTING.md with DCO Signed-off-by: ranjithkumar007 <ranjith.dakshana2015@gmail.com> * Update CONTRIBUTING.md Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: ranjithkumar007 <ranjith.dakshana2015@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added tests to reproduce #2459. 
(#2462) Related to: https://github.com/thanos-io/thanos/issues/2459 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added a page for documenting beginner issues (#2461) * Added some documentation for beginner issues Signed-off-by: Yash <yashrsharma44@gmail.com> * Edited some lines Signed-off-by: Yash <yashrsharma44@gmail.com> * Update docs/operating/troubleshooting.md Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Yash <yashrsharma44@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * pkg/block/fetcher: fix concurrent map usage (#2474) Fixes: #2471 This commit fixes an issue where multiple goroutines in the block fetcher filtering were concurrently accessing the same map. The goroutines were concurrently writing AND reading to the shared metas map. This commit guards this concurrent access by giving the DeduplicateFilter struct a mutex. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Reverted addition of deletion mark for partial uploads. (#2472) Fixes https://github.com/thanos-io/thanos/issues/2459 (quick fix). This keeps the logic from the 0.11.0 which was good enough. Some improvement for future: https://github.com/thanos-io/thanos/issues/2470 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Remove optimizations for label=~".*" and label!~".*". (#2475) * Remove optimizations for label=~".*" and label!~".*". They are not correct. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * cut v0.12.1 (#2476) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * fix thanos web route prefix register twice (#2489) Signed-off-by: yeya24 <yb532204897@gmail.com> Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: yeya24 <yb532204897@gmail.com> * Do not lock DNS Provider.Address() while Resolve() is running (#2492) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Compact: Update compact documentation to better clarify dedupeReplicaLabels. (#2481) * Update compact documentation to better clarify dedupeReplicaLabels. Signed-off-by: Johnathan Falk <johnathan.falk@gmail.com> * Fix capitalization. Signed-off-by: Johnathan Falk <johnathan.falk@gmail.com> * Gracefully handle additional oneof fields in SeriesResponse (#2501) * Gracefully handle additional oneof fields in SeriesResponse Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed unnecessary continue Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated CHANGELOG Signed-off-by: Marco Pracucci <marco@pracucci.com> * fix typo (#2509) Signed-off-by: arthur yang <yang_yapo@126.com> * Adjust memcached operation buckets (#2504) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * pkg/query: remove obsolete 'thanos_store_node_info' metric (#2505) Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Add Community information (#2510) * Add Community information Signed-off-by: Povilas Versockas <p.versockas@gmail.com> * Fixes after review Signed-off-by: Povilas Versockas <p.versockas@gmail.com> * Move to contributing menu Signed-off-by: Povilas Versockas <p.versockas@gmail.com> * Remove incompleteView field from fetcher response. 
(#2455) Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added hints support to store protobuf (#2502) * Added hints support to store protobuf Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated CHANGELOG Signed-off-by: Marco Pracucci <marco@pracucci.com> * Reworded hints doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Removed hints_enabled from SeriesRequest Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove spurious newline after rebase Signed-off-by: Marco Pracucci <marco@pracucci.com> * Leveraging docker layer caching (#2508) Signed-off-by: ankitjain28may <ankitjain28may77@gmail.com> * add gofmt -s step to makefile and golangci (#2463) * gofmt -s files Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com> * golangci: add gofmt to linters Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com> * makefile: add gofmt to format Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com> * Update coding-style-guide.md (#2520) make `doSomething` a function call. Signed-off-by: Halil Kaskavalci <halil@kaskavalci.com> * Let's be more nicer on stale things (: (#2517) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * docs/proposals/202003_thanos_rules_federation: initial commit (#2263) Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> * cmd: Moved all no-service commands under new tools subcommand. (#2513) This will allow better extensibility for future for non-bucket related tools we plan to add. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added hints support to BucketStore.Series() (#2516) * Added hints support to BucketStore.Series() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed goimport grouping Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added missing copyright Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Exclude zoom.us from liche (because zoom.us response headers are over 4KB) Signed-off-by: Marco Pracucci <marco@pracucci.com> * update uswitch logo and branding (#2529) Signed-off-by: Joseph-Irving <joseph.irving500@gmail.com> * *: add metrics to the reloader package (#2521) Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Added LocalStore and realistic data for querier counter reset bug. (#2522) (#2538) * Added LocalStore and realistic data for querier counter reset bug. Tries to reproduce: https://github.com/thanos-io/thanos/issues/2401 I would still merge as it is a great test, and allows us to quickly check data provided by Ben. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed tsdbstore required component type. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed ineffectual set. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed liche. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed unknown store issue. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * docs: fixed broken links in documentation (#2540) * fix tiny typo Signed-off-by: Dan Potepa <dan@danpotepa.co.uk> * fix link to example manifest files Signed-off-by: Dan Potepa <dan@danpotepa.co.uk> * fixed some broken links Signed-off-by: Dan Potepa <dan@danpotepa.co.uk> * Clear duplicateIDs at the beginning of Filter. (#2544) * Clear duplicateIDs at the beginning of Filter. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Address review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix whitespace noise. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * :whale: :neckbeard: :kick_scooter: Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * cmd: rule: do not wrap reload endpoint with prefix twice (#2533) * cmd: rule: do not wrap reload endpoint with '/' Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise, it is inaccessible when no prefix has been specified by the user. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * CHANGELOG: update Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * e2e: rule: add test for reloading rules via /-/reload Add a test-case to the e2e tests for testing whether reloading rules via /-/reload works. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * VERSION: cut release v0.12.2 (#2545) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * ui: bump jQuery version to v3.5.0 (#2549) Signed-off-by: Prem Kumar <prmsrswt@gmail.com> * Bumped minio-go library to v6.0.53 (#2536) * Bumped minio-go library to v6.0.53 Signed-off-by: alicek106 <alice_k106@naver.com> * Updated CHANGELOG with PR Signed-off-by: alicek106 <alice_k106@naver.com> * Add deleteSeries skeleton to return bad request (#2530) Signed-off-by: darshanime <deathbullet@gmail.com> * Revert "Add deleteSeries skeleton to return bad request (#2530)" (#2551) This reverts commit d0bcbff8375b6384292533ffa84b6408b85b0acb. * Fixed the timezone url (#2553) Signed-off-by: Yash <yashrsharma44@gmail.com> * Updated to golang v1.14.2 (#2194) * Update golang:1.14.2 Signed-off-by: Raúl Naveiras <me@raulnaveiras.com> * Update thanos-ci:go1.14.2-node It requires a manual process to generate and push this container. 
``` make docker-ci DOCKER_CI_TAG=go1.14.2-node ``` Signed-off-by: Raúl Naveiras <me@raulnaveiras.com> * Update golang:1.14.2 for github actions Signed-off-by: Raúl Naveiras <me@raulnaveiras.com> * Update CHANGELOG Signed-off-by: Raúl Naveiras <me@raulnaveiras.com> * Fix yaml indentation Signed-off-by: Raúl Naveiras <me@raulnaveiras.com> * Added Bartek as next release shepherd. (#2556) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * receive: Add support for TSDB per tenant (#2012) * receive: Add support for TSDB per tenant Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/store: Merge SeriesSets of multiple TSDB stores This is required as the Series gRPC method of the StoreAPI requires the Series returned to be sorted. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/receive: Add multitsdb shipper support Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * Address comments Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * Add more comments on types and functions Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/store/multitsdb.go: Remove unused struct field Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/receive/multitsdb.go: Remove unused Close method TSDBs are implicitly closed by flushing the database, which is ensured on shutdown, hence there is no need to have the explicit close method. Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/store/multitsdb.go: Make errors and warnings tenant aware Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * pkg/store/multitsdb.go: Consistent tenant aware errors and warnings Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * cmd/thanos/receive.go: Auto migrate legacy to multitsdb disk layout (#2557) Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com> * Merge 0.12 into master (#2559) * Clear duplicateIDs at the beginning of Filter. (#2544) * Clear duplicateIDs at the beginning of Filter. 
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Address review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix whitespace noise. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * :whale: :neckbeard: :kick_scooter: Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * cmd: rule: do not wrap reload endpoint with prefix twice (#2533) * cmd: rule: do not wrap reload endpoint with '/' Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise, it is inaccessible when no prefix has been specified by the user. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * CHANGELOG: update Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * e2e: rule: add test for reloading rules via /-/reload Add a test-case to the e2e tests for testing whether reloading rules via /-/reload works. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * VERSION: cut release v0.12.2 (#2545) Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com> Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Revert "Merge 0.12 into master (#2559)" (#2560) This reverts commit 003d245282bd683826304d25d1719c39d7401629. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * querier: Added regressions tests for counter missed reset bug. (#2528) * querier: Added regressions tests for counter missed bug. PR with just tests, not fix yet. Reproduces: https://github.com/thanos-io/thanos/issues/2401 * Added regressions tests for CounterSeriesIterator; Simplified aggregators. * Fixes edge dedup cases for Next and added tests for deduplication. * Refactored downsampling tests, added more realistic cases. * Added check for duplicated chunks during downsampling. * Removed duplicates for efficiency on promSeriesSet. 
Found by GitLab; we were investigating offline with @SuperQ.
Their issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9293
This is only reproducible with large rate ranges (`[30m+]`), which means it has to do with chunk ordering or overlaps.
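
To illustrate why overlapping or unsorted chunks could inflate a rate, here is a minimal Go sketch. It is not Thanos or Prometheus code: `increaseWithResets` and the sample values are made up for illustration, and extrapolation is left out. It only mimics the general idea of counter reset correction, where any drop in value is treated as a reset and the pre-drop value is added back.

```go
package main

import "fmt"

// increaseWithResets is a hypothetical, simplified counter increase:
// last - first, plus the previous value for every observed drop
// (each drop is assumed to be a counter reset). No extrapolation.
func increaseWithResets(values []float64) float64 {
	if len(values) < 2 {
		return 0
	}
	result := values[len(values)-1] - values[0]
	prev := values[0]
	for _, v := range values[1:] {
		if v < prev {
			// Looks like a reset: add back what the counter had reached.
			result += prev
		}
		prev = v
	}
	return result
}

func main() {
	// Samples of one counter in timestamp order.
	sorted := []float64{100, 110, 120, 130}
	// Assumed failure mode: a second chunk overlaps the first and is
	// concatenated instead of merged, so values step backwards once.
	overlapping := []float64{100, 110, 120, 110, 120, 130}

	fmt.Println(increaseWithResets(sorted))      // 30
	fmt.Println(increaseWithResets(overlapping)) // 150
}
```

In this toy case the counter never reset, yet the backwards step between chunks is counted as a reset and the increase jumps from 30 to 150. A wider range such as `[30m]` spans more chunk boundaries, which would fit the observation that only large ranges reproduce the problem.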