querier: Rate over deduplicated counter from many replicas can lead to double reset account. #2401

Closed
bwplotka opened this issue Apr 8, 2020 · 22 comments · Fixed by #2548

Comments

@bwplotka
Member

bwplotka commented Apr 8, 2020

Found by GitLab; we have been investigating it offline with @SuperQ.

Their issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9293

This is only reproducible with large rate ranges ([30m] and above), which suggests it has to do with chunk ordering or overlaps.

@bwplotka
Member Author

bwplotka commented Apr 8, 2020

One potential issue is that overlaps are not handled very well: https://github.com/thanos-io/thanos/pull/2400/files

@bwplotka
Member Author

bwplotka commented Apr 9, 2020

It could be related to the fact that GitLab is using Store GW in HA without a load balancer (querying both at the same time), so the data (chunks) is duplicated and unsorted for sure.

@SuperQ
Contributor

SuperQ commented Apr 11, 2020

@bwplotka We initially reproduced the issue by querying Prometheus sidecars directly, not Store GW. These are in HA pairs as well.

@bwplotka
Member Author

Got some test data from @SuperQ, so hopefully I will be able to repro locally 🤗

Fingers crossed 🤞

@bwplotka
Member Author

Looks like this is a duplicate of #1326; let's continue the discussion there.

@bwplotka
Member Author

Actually, let's not be so sure; this might be different (here deduplication does not cause it).

@bwplotka bwplotka reopened this Apr 23, 2020
bwplotka added a commit that referenced this issue Apr 23, 2020
Reproduces: #2401

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka
Member Author

bwplotka commented Apr 24, 2020

@SuperQ this repro is amazing; I can explore all the details. We definitely have overlapping and unsorted chunks. We should be able to find the problem in our algorithm soon, thanks!

BTW... I kind of overengineered (as you can imagine) and wrote thanos tools storeapi serve --json=<file x>, which can serve JSON (protobuf based) as a Store API 🎉

So I can take your file (actually anything generated by grpcurl), put it into thanos tools storeapi serve --json, run the querier, connect storeapi serve as --store, and see your results:

image

@bwplotka
Member Author

bwplotka commented Apr 24, 2020

The tooling looks like it works, but I think we don't have enough chunks to repro it 🤔

I tried all sorts of time ranges, steps and rate ranges... no luck:

image

All good everywhere...

cc @SuperQ, can you send me a bit wider time span? 🤔 Is this for sure a time span you can reproduce the problem with? What if this is caching in some layer above Thanos Querier?

bwplotka added a commit that referenced this issue Apr 24, 2020
Tries to reproduces: #2401

I would still merge as it is a great test, and allows us to quickly
check data provided by Ben.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@SuperQ
Contributor

SuperQ commented Apr 25, 2020

Here's another data sample that reproduces it. The time range to reproduce is this:

  • End time: 2020-04-24 04:00
  • Range: 4h
  • Step: 1800
  • Query: rate(gitlab_transaction_cache_read_hit_count_total[30m])

data-web-08.json.gz
image

When I turn off dedupe, the issue goes away:

image
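
For anyone who wants to compare both cases from a script, the parameters above can be turned into two range queries that differ only in the deduplication toggle. This is a minimal sketch: the querier address is a placeholder, the end time is assumed to be UTC (the report does not state a zone), and it relies on the `dedup` parameter of the Thanos Querier HTTP API.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// rangeQueryURL builds a /api/v1/query_range URL.
// base is a placeholder address; dedup toggles Thanos deduplication.
func rangeQueryURL(base, query string, end time.Time, window time.Duration, stepSeconds int, dedup bool) string {
	v := url.Values{}
	v.Set("query", query)
	v.Set("start", strconv.FormatInt(end.Add(-window).Unix(), 10))
	v.Set("end", strconv.FormatInt(end.Unix(), 10))
	v.Set("step", strconv.Itoa(stepSeconds))
	v.Set("dedup", strconv.FormatBool(dedup))
	return base + "/api/v1/query_range?" + v.Encode()
}

// reportedQueryURL fills in the values from the report above.
func reportedQueryURL(dedup bool) string {
	// Assumption: the reported end time is in UTC.
	end := time.Date(2020, 4, 24, 4, 0, 0, 0, time.UTC)
	query := "rate(gitlab_transaction_cache_read_hit_count_total[30m])"
	// Assumption: localhost:10902 stands in for the real querier address.
	return rangeQueryURL("http://localhost:10902", query, end, 4*time.Hour, 1800, dedup)
}

func main() {
	fmt.Println(reportedQueryURL(true))  // deduplication on: shows the spike
	fmt.Println(reportedQueryURL(false)) // deduplication off: issue goes away
}
```

Fetching both URLs and diffing the returned values should make the difference between the two screenshots visible as numbers.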

@bwplotka
Member Author

bwplotka commented Apr 27, 2020

I don't see the full data, @SuperQ (I think you gave only half of it), but I can repro 🎉

image

@bwplotka
Member Author

Investigating

@bwplotka
Member Author

The step does not matter; it's a deduplication bug:

image

bwplotka added a commit that referenced this issue Apr 27, 2020
Fixes #2401

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
brancz pushed a commit that referenced this issue Apr 29, 2020
…2522)

* Added LocalStore and realistic data for querier counter reset bug.

Tries to reproduces: #2401

I would still merge as it is a great test, and allows us to quickly
check data provided by Ben.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed tsdbstore required component type.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed ineffectual set.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed liche.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed unknown store issue.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
brancz pushed a commit that referenced this issue Apr 29, 2020
…2522) (#2538)

@bwplotka
Member Author

It's not super clear how to fix the issue long term (: Some deep dive: https://docs.google.com/spreadsheets/d/13A8ChunqbVdRq9j5kqrtfzknO6mvVFQPUkXwUiBuV_4/edit?usp=sharing

@bwplotka bwplotka changed the title querier: Merging multiple chunks causes reset to be ignored by PromQL querier: Rate over deduplicated counter from many replicas can lead doubly accounted reset. Apr 30, 2020
@bwplotka
Member Author

bwplotka commented Apr 30, 2020

TL;DR: The problem is with deduplicating a counter series from 2 or more Prometheus replicas.

Let's say they scrape the same counter from the same application.

Accounting for resets correctly in a generic deduplication algorithm is really hard, as presented in this spreadsheet. This is due to each replica having a different view of the END value of each counter (different scrape times!).
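
To make the effect concrete, here is a small sketch of why a tiny disagreement between replicas matters so much for counters. PromQL treats any decrease in a counter as a reset and assumes the counter restarted from zero, so a dip of 2 after switching replicas is counted as an increase of roughly the whole counter value. The sample values below are hypothetical (they are not taken from the spreadsheet), and the function ignores the extrapolation that `rate` and `increase` also perform.

```go
package main

import "fmt"

// sample is a (timestamp, value) pair; timestamps are in seconds here.
type sample struct {
	t int64
	v float64
}

// increaseWithResets sums the growth of a counter using PromQL-style
// reset handling: any decrease is treated as a restart from zero.
func increaseWithResets(s []sample) float64 {
	var total float64
	for i := 1; i < len(s); i++ {
		if s[i].v < s[i-1].v {
			total += s[i].v // counter assumed to have restarted from 0
			continue
		}
		total += s[i].v - s[i-1].v
	}
	return total
}

// replicaA is a hypothetical view of the counter from a single replica.
var replicaA = []sample{{0, 1000}, {30, 1010}, {60, 1020}, {90, 1030}}

// merged is a hypothetical deduplicated series that switches to a second
// replica after t=30. That replica's sample at t=31 carries a slightly
// older view of the counter (1008 < 1010).
var merged = []sample{{0, 1000}, {30, 1010}, {31, 1008}, {61, 1018}, {91, 1028}}

func main() {
	fmt.Println(increaseWithResets(replicaA)) // 30
	fmt.Println(increaseWithResets(merged))   // 1038
}
```

The counter never actually reset in this made-up data, yet the merged series reports an increase more than 30 times larger than either replica would on its own.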

Crafting a deduplication algorithm when we know the metric is a counter is quite trivial. The problem is... we don't know. So ideally we need a generic dedup algorithm for replicas.

Any ideas @brancz @brian-brazil @beorn7 @SuperQ ? (:

My current idea to move this forward:

Have special deduplication for counters. Generally, we don't know which metric is a counter at the offline level (unless it's downsampled data; then we know). However, for the query part it's clear: it is a counter if the rate func was in the hints from PromQL. So we can use a special counter-based dedup.
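
A rough sketch of that decision, assuming the querier can see the name of the PromQL function surrounding the selector. The `selectHints` type below is a local stand-in for the hints PromQL passes down to storage, not the real struct. The set of function names is also an assumption: only `rate` is mentioned above, and the other entries are PromQL functions that likewise operate on counters.

```go
package main

import "fmt"

// selectHints is a hypothetical stand-in for the hints PromQL passes to storage.
type selectHints struct {
	Func string // name of the function wrapping the selector, empty if none
}

// counterFuncs is an assumed set of functions that imply counter semantics.
var counterFuncs = map[string]bool{
	"rate":     true,
	"irate":    true,
	"increase": true,
	"resets":   true,
}

// useCounterDedup reports whether the counter-aware deduplication path
// should be used for a select with the given hints.
func useCounterDedup(h selectHints) bool {
	return counterFuncs[h.Func]
}

func main() {
	fmt.Println(useCounterDedup(selectHints{Func: "rate"}))         // true
	fmt.Println(useCounterDedup(selectHints{Func: "avg_over_time"})) // false
	fmt.Println(useCounterDedup(selectHints{}))                     // false
}
```

The limitation is the one already stated: this only helps at query time, because offline deduplication has no surrounding function to inspect.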

At the offline rewrite / deduplication level, for raw data, we have no idea what the type is. However, for a quick win, we could for now not worry about offline dedup and just solve the query issues.

Then maybe we can figure out something else for the offline case: maybe a generic dedup that will work there, or something based on the _total metric name suffix (but that's sketchy).

@bwplotka
Member Author

Or... we should collaborate on a different dedup algorithm for the future. Maybe scrape-interval based? (downside: What if scrape interval changes)

bwplotka added a commit that referenced this issue Apr 30, 2020
PR with just tests, not fix yet.

Reproduces: #2401

* Added regressions tests for CounterSeriesIterator; Simplified aggregators.
* Fixes edge dedup cases for Next and added tests for deduplication.
* Refactored downsampling tests, added more realistic cases.
* Added check for duplicated chunks during downsampling.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Apr 30, 2020
PR with just tests, not fix yet.

Reproduces: #2401

* Added regressions tests for CounterSeriesIterator; Simplified aggregators.
* Fixes edge dedup cases for Next and added tests for deduplication.
* Refactored downsampling tests, added more realistic cases.
* Added check for duplicated chunks during downsampling.
* Removed duplicates for efficiency on promSeriesSet.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka bwplotka changed the title querier: Rate over deduplicated counter from many replicas can lead doubly accounted reset. querier: Rate over deduplicated counter from many replicas can lead to double reset account. Apr 30, 2020
@bwplotka
Member Author

Fix: #2548
Tests: #2528

@bwplotka
Member Author

bwplotka commented Apr 30, 2020

Help wanted for review!

@brian-brazil
Contributor

> (downside: What if scrape interval changes)

That's not common, but you could depend on no one having a scrape interval over 2 minutes, as that's not sane for other reasons.
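
One way such a bound might be used, as a sketch only: if the next sample from the replica currently being read is more than two minutes after the previous one, that replica most likely missed a scrape, and a scrape-interval-based dedup could take that as the signal to look at another replica. The helper name and the exact rule are hypothetical; nothing in this thread fixes the algorithm.

```go
package main

import (
	"fmt"
	"time"
)

// maxScrapeInterval is the assumed upper bound on any scrape interval.
const maxScrapeInterval = 2 * time.Minute

// hasGap reports whether the distance between two sample timestamps
// (in milliseconds) exceeds the assumed bound, meaning the replica
// probably missed at least one scrape in between.
func hasGap(prevMs, nextMs int64) bool {
	return time.Duration(nextMs-prevMs)*time.Millisecond > maxScrapeInterval
}

func main() {
	fmt.Println(hasGap(0, 30_000))  // false: a normal 30s scrape interval
	fmt.Println(hasGap(0, 120_000)) // false: exactly at the bound
	fmt.Println(hasGap(0, 300_000)) // true: five minutes without a sample
}
```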

@bwplotka
Member Author

bwplotka commented May 1, 2020

That's quite a good idea 👍 (for #2547)

bwplotka added a commit that referenced this issue May 5, 2020
* querier: Added regressions tests for counter missed bug.

PR with just tests, not fix yet.

Reproduces: #2401

* Added regressions tests for CounterSeriesIterator; Simplified aggregators.
* Fixes edge dedup cases for Next and added tests for deduplication.
* Refactored downsampling tests, added more realistic cases.
* Added check for duplicated chunks during downsampling.
* Removed duplicates for efficiency on promSeriesSet.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Giedrius comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@beorn7
Contributor

beorn7 commented May 5, 2020

It actually saddens me that Prometheus “by design” doesn't really cope with scrape intervals >2m. I'd love to see future Prometheus versions lift that arbitrary limit, and I'd therefore prefer that Thanos not bake that limit into its own design, too.

Interestingly, I'd also love to see future Prometheus versions have first-class support for metric types. That would then also solve your problem of how to safely recognize a counter.

bwplotka added a commit that referenced this issue May 20, 2020
* Removed dependency on Cortex fork; Moved to official one. (#2199)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Typo corrections quick-tutorial.md (#2196)

* Corrected all Prometheus possessives to read `Prometheus's`, this matches Prometheus's own documentation.
* Corrected `simple` to `simply` when describing compactor scanning behaviour

Signed-off-by: Peter Avdjian <pavdjian@paperlesspost.com>

* tracing: Simplified creation of spans. (#2202)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed links to dashboards json files. (#2203)

Signed-off-by: Roman Grytskiv <roman.grytskiv@gmail.com>

* Skip deleting files that we just deleted (#2185)

* Skip deleting files that we just deleted

We see this happening with Swift. Because the consistency of swift is eventual, swift sometimes didn't process the deletion of the meta file yet, and so it turns up in the bkt.Iter(). The second deletion then causes a 404 and compaction fails.

Signed-off-by: Wim Fournier <github@fournier.nl>

* return, as this is a func. Add debug log and comment

Signed-off-by: Wim Fournier <github@fournier.nl>

* fixing build: wrong parameter name

Signed-off-by: Wim Fournier <github@fournier.nl>

* fix lint

Signed-off-by: Wim Fournier <github@fournier.nl>

* Refactor deleteDir into deleteDirRec and add a parameter for a function that allows to keep certain files.

Signed-off-by: Wim Fournier <github@fournier.nl>

* Fix lint

Signed-off-by: Wim Fournier <github@fournier.nl>

* implementing suggested fixes

Signed-off-by: Wim Fournier <github@fournier.nl>

* improve web.route-prefix handling (#2208)

This makes the handling of web.route-prefix more similar to the
behavior in Prometheus.  Correctly handles '/' and prefixes which
do not begin with a '/'.

Signed-off-by: Paul Gier <pgier@redhat.com>

* Merge release-0.11 back into master (#2212)

* Create release v0.11.0-rc.0 (#2156)

* Update version to v0.11.0-rc.0

* Update CHANGELOG with all PRs for v0.11

* Improve CHANGELOG by being more explicit

* Bumped minio-go library to v6.0.49, fixing an IAM bug in v6.0.45 (#2189)

Signed-off-by: Kraig Amador <kraig.amador@ticketmaster.com>

* Create release candidate  v0.11.0-rc.1 (#2192)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Release v0.11.0 (#2205)

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Update VERSION to 0.12.0-dev

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Resolve go.sum merge conflict and run go mod tidy

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

Co-authored-by: Kraig Amador <508403+bigkraig@users.noreply.github.com>

* returns error messages when trigger reload with http (#1848)

* returns error messages when trigger reload with http

Signed-off-by: arthur yang <yang_yapo@126.com>

* use simple reloadRules function instead of magic chan error error

Signed-off-by: yapo.yang <yang_yapo@126.com>

* add tailing period for comment

Signed-off-by: yapo.yang <yang_yapo@126.com>

* fix comment

Signed-off-by: arthur yang <yang_yapo@126.com>

* add white space for better code reading

Signed-off-by: arthur yang <yang_yapo@126.com>

* collect thanos rule metrics into one struct

Signed-off-by: arthur yang <yang_yapo@126.com>

* remove termination logic and keep log only

Signed-off-by: arthur yang <yang_yapo@126.com>

* update changelog for #1848

Signed-off-by: arthur yang <yang_yapo@126.com>

* add tailing period

Signed-off-by: arthur yang <yang_yapo@126.com>

* check whether registry is nil

Signed-off-by: arthur yang <yang_yapo@126.com>

* tailing period in metrics

Signed-off-by: arthur yang <yang_yapo@126.com>

* cancel with context

Signed-off-by: arthur yang <yang_yapo@126.com>

* return ctx.Err() instead of errors.New

Signed-off-by: arthur yang <yang_yapo@126.com>

* register thanos rule metrics with promauto

Signed-off-by: arthur yang <yang_yapo@126.com>

* return errs before set success related metrics

Signed-off-by: arthur yang <yang_yapo@126.com>

* revert go.sum go.mod change

Signed-off-by: arthur yang <yang_yapo@126.com>

* reload webhandler/sighup in one for loop

Signed-off-by: arthur yang <yang_yapo@126.com>

* reload with chan chan error

Signed-off-by: yapo.yang <yang_yapo@126.com>

* Fix error in component status help message (#2216)

Signed-off-by: mcsammac

Date:      Wed Mar 4 13:50:17 2020 -0500
On branch master
Changes to be committed:
	modified:   pkg/prober/intrumentation.go

Signed-off-by: s320009 <sam.mcadams@8451.com>

* tutorials: fix typo in image version (#2223)

Signed-off-by: Paul Gier <pgier@redhat.com>

* Blocked classic prometheus constructors, moved all to promauto; Removed unnecessary printfs. (#2228)

Fixes: https://github.com/thanos-io/thanos/issues/2102

Also blocked them on CI side, thanks to https://github.com/fatih/faillint/pull/8

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* ruler: Fix #2204 bug where alert queue is unpoppable causing full queue and dropped alerts (#2238)

* Add test for alert queue Pop after multiple Push

Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com>

* Fix alert queue bug by resignal after Pop (#2204)

Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com>

* Fix alert queue test and simplify

Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com>

* Update CHANGELOG.md

Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com>

* Link to thanos-io/thanos PR in CHANGELOG.md

Signed-off-by: Robin Clarke-Williams <43950815+robincw-gr@users.noreply.github.com>

* bucket: improve shard label handling (#2219)

Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com>

* fixing querier deployment kube manifest example 404 error (#2229)

Signed-off-by: Rajesh Rajendran <rjshrjndrn@gmail.com>

* *: Fix misuse of pkg/errors.Errorf and error directive (#2253)

* Fix pkg/errors error directive issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix misuse of Errorf

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix false metric name in Store GW e2e test (#2256)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add scheme to the alertmanagers.url in ruler example (#2255)

Signed-off-by: gitlawr <lawrleegle@gmail.com>

* Sort chunks by thanos.downsample.resolution for better grouping (#2231)

Signed-off-by: Paul Traylor <paul.traylor@linecorp.com>

* Remove duplicate log.level arg in quickstart.sh (#2148)

Signed-off-by: Richard Poole <richard.poole@cudoventures.com>

* tutorials: fix incorrect query (#2239)

You would have to query `prometheus_tsdb_head_series` instead of `sum(prometheus_tsdb_head_series)` in order to get the 5 results when deduplicating.

Signed-off-by: John Chen <johnchen456@gmail.com>

* Use new go jsonnet formatter (#2258)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* docs: Document Thanos Sharding (#1922)

* docs: Document Thanos Sharding

Signed-off-by: Xiang Dai <764524258@qq.com>

* Add time partitioning

Signed-off-by: Xiang Dai <764524258@qq.com>

* feedback

Signed-off-by: Xiang Dai <764524258@qq.com>

* Sharding: document supported relabel action and add store gateway backgroud (#2272)

* Sharding: document supported relabel action and add store gateway background

Signed-off-by: Xiang Dai <764524258@qq.com>

* add hashmod

Signed-off-by: Xiang Dai <764524258@qq.com>

* Add wait-interval flag (#2265)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* store: Optimized labels conversion on store.Series; Added unsafe labels conversion. (#2230)

## Changes

* method TranslateLables CPU Optimized (streamed sorting).
* All store GW label conversation to []storepb.Label are now alloc-less.

```
go test -bench=BenchmarkUnsafeVSSafeLabelsConversion -run=^$ -benchmem -timeout 2h -benchtime 10s ./pkg/store/storepb/...
 goos: linux
 goarch: amd64
 pkg: github.com/thanos-io/thanos/pkg/store/storepb
 BenchmarkUnsafeVSSafeLabelsConversion/safe-12         	   34822	    339076 ns/op	  655368 B/op	       2 allocs/op
 BenchmarkUnsafeVSSafeLabelsConversion/unsafe-12       	1000000000	         2.32 ns/op	       0 B/op	       0 allocs/op
PASS
```

TODO: Do the same on Querier.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* fix: Ignore the OS-X Trash (#2274)

Signed-off-by: kushthedude <kushthedude@gmail.com>

* docs/sharding.md: fix a typo (#2273)

Signed-off-by: Xiang Dai <764524258@qq.com>

* fix replicate duplicate metrics (#2254)

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Document downsample component (#2090)

* scripts/genflagdocs.sh: Generate downsample flag

Signed-off-by: Xiang Dai <764524258@qq.com>

* Document downsample component

Signed-off-by: Xiang Dai <764524258@qq.com>

* Move downsample as bucket sub-command

Signed-off-by: Xiang Dai <764524258@qq.com>

* update docs

Signed-off-by: Xiang Dai <764524258@qq.com>

* feedback

Signed-off-by: Xiang Dai <764524258@qq.com>

* Crashing error messages now will print stacktrace. (#2277)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Downsample: update changelog (#2278)

* Downsample: update changelog

Signed-off-by: Xiang Dai <764524258@qq.com>

* feedback

Signed-off-by: Xiang Dai <764524258@qq.com>

* thanos-mixin: clear units/axis (#2279)

* thanos-mixin: clear units/axis

Signed-off-by: Xiang Dai <764524258@qq.com>

* fix nits

Signed-off-by: Xiang Dai <764524258@qq.com>

* store, compact, bucket: Delay deletes by scheduling block deletion with deletion-mark.json file (#2136)

Signed-off-by: khyatisoneji <khyatisoneji5@gmail.com>

* Use maxInt instead of math.MaxInt64 (#2268)

math.MaxInt64 doesn't work on 32-bit systems (like linux/arm builds)

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Replace objstore.Exists function calls with bkt.Exists (#2284)

Signed-off-by: khyatisoneji <khyatisoneji5@gmail.com>

* Added Xiang to Triage Role. (#2289)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Enrich Memcached client logs (#2292)

* Enrich Memcached client logs

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Update pkg/cacheutil/memcached_client.go

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>

* Update pkg/cacheutil/memcached_client.go

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added Kemal to Triage Role. (#2293)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* bucket: handle instances where no blocks are loaded (#2271)

* bucket: handle instances where no blocks are loaded

Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com>

* bucket: reject all falsy label values

Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com>

* bucket: update changelog

Signed-off-by: Jacob Colvin <Jacob.Colvin@8451.com>

* docs/sharding.md: Replace example floating link with permalink (#2296)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Added latest release badge. (#2300)

I think there are NOT enough badges, so added one more!

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* store: Postings fetching optimizations (#2294)

* Avoid fetching duplicate keys.
Simplified groups with add/remove keys.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added shortcuts

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Optimize away fetching of ALL postings, if possible.
Only remove postings for each key once.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Don't do individual index.Without, but merge them first.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Don't use map for fetching postings, but return slice instead.

This is in line with original code. Using a map was nicer,
but more expensive in terms of allocations and hashing
labels.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed 'all' to 'allRequested'.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Typo

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Make linter happy.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added comment to fetchPostings.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Group vars

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comments

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use allPostings and emptyPostings variables for special cases.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Unify terminology to "special All postings"

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Address feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added CHANGELOG.md entry.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix check for empty group.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Special All postings is now added as a new group

No special handling required anymore.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Updated comment

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* cmd/thanos/receive: Remove unused TLSClientConfig from Options (#2299)

Signed-off-by: mrIncompetent <henrik@henrik-schmidt.de>

* compactor: Add ReplicaLabelRemover as MetaFetcher filter to enable offline vertical compaction/deduplication for replicated data (#2250)

* Create ReplicaLabelsFilter to allow for offline deduplication

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Start adding a e2e test for offline-deduplication with Thanos compact

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>

* Address issues that have discovered after review

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix e2e test service issue

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Improve fetcher unit tests

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add simple compactor e2e tests with replica remover

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Remove unnecessary interface

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add more test cases

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Improve and stabilize e2e tests

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Increase ruler sd refresh interval

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Address review issues

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Separate filters and modifiers

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

Co-authored-by: Matthias Loibl <mail@matthiasloibl.com>

* docs/release: squat to release v0.12.0 (#2312)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* cmd/thanos/receive: Serve TLS when TLSConfig is given (#2311)

Signed-off-by: mrIncompetent <henrik@henrik-schmidt.de>
Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

Co-authored-by: mrIncompetent <henrik@henrik-schmidt.de>

* cmd/thanos/compact: add bucket UI (#1714)

This commit enhances the compact component so that it runs the bucket UI
whenever the --wait flag is also passed. In order to reduce the overhead
of running the UI in addition to the compactor, this commit also
refactors the compactor and bucket commands a bit in order to re-use a
single meta fetcher.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* reloadRules initlialization should fail (#2301)

Signed-off-by: arthur yang <yang_yapo@126.com>

* Fixed inconsistent metrics and methods (#2319)

Signed-off-by: jojohappy <sarahdj0917@gmail.com>

* e2e: Refactored compactor test; Fixed flakiness. (#2313)

Also:

* Reduced number of services for e2e for latency
* Fixed halting
* Improved logging.
* Improved test cases (e.g added test for compaction and halting)


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* pkg/store: Report no data if no stores discovered (#2310)

* pkg/store: Report no data if no stores discovered

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* CHANGELOG.md: Add timespan reported on empty stores

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Added max_item_size to Memcached client (#2304)

* Added max_item_size to Memcached client

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Changed imports order and splitted tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed type casting

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Changed imports grouping

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Changed memcached max_item_size default from 0 to 1MB

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Increased e2e tests timeout

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed typo in CHANGELOG

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Reverted Makefile changes

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* tesutil: Enchanced testutil, refactored for our needs. (#2325)

Changed LICENSE as we no longer use version we copied back then.
Most of it was reimplemented.

Why?
* Much richer diff (inspired by testify packages)
* Consistent API
* Less indentation.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* make, ci: Check example alerts and rules in CI (#2318)

* Check example alerts and rules in CI

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Add require clean tree

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fix latency alerts (#2316)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Fixed e2e. (#2327)

Sorry, was late when we merged the fix. Funny bug: It would start to fail exactly 12h AFTER 25.03 8:00 GMT

Should be fine now... and in future until changed ;p

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* store: added option to reencode and compress postings before storing them to the cache (#2297)

* Added "diff+varint+snappy" codec for postings.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added option to reencode and compress postings stored in cache

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Expose enablePostingsCompression flag as CLI parameter.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use "github.com/pkg/errors" instead of "errors" package.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* remove break

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed empty branch

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added copyright headers.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added CHANGELOG.md entry

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added comments.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use Encbuf and Decbuf.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix comments in test file.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Another comment...

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed diffVarintSnappyEncode function.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment on usage with in-memory cache.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* var block

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed extra comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Move comment to error message.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Separated snappy compression and postings reencoding into two functions.
There is now header only for snappy-compressed postings.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added comment on using diff+varint+snappy.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Shorten header

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Lint...

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Changed experimental.enable-postings-compression to experimental.enable-index-cache-postings-compression

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added metrics for postings compression

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added metrics for postings decompression

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Reorder metrics

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use encode/decode labels.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* mixin: Make alert threshold values parametric (#2317)

* Make alert threshold values parametric

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Rename variable

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Adjust default values for latency thresholds

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* Update UW logo (#2329)

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>

* block fetcher with errgroup (#2309)

* block fetcher with errgroup

Signed-off-by: arthur yang <yang_yapo@126.com>

* errorgroup goroutine defer close

Signed-off-by: arthur yang <yang_yapo@126.com>

* website: fix 404 on root of sections (#2328)

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* Add mallgroup.com to adopters (#2331)

Signed-off-by: Daniel Rataj <daniel.rataj@mall.cz>

Co-authored-by: Daniel Rataj <daniel.rataj@mall.cz>

* store: Binary index header is now production ready and enabled by default (#2330)

* store: Binary index header is now production ready and enabled by default.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed typo.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Add leboncoin company as adopter (#2333)

Signed-off-by: Guillaume Chenuet <guillaume.chenuet@adevinta.com>

* website: Collapsible menu sections (#2336)

* website: make sidemenu collapsed by default

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* website: add caret svg in expandable sidemenu

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* website: expand current section's sidemenu by default

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* ui: fix store never removed from /stores page bug (#2339)

* ui: fix store never removed from /stores page bug

We need to update `LastCheck` only if the error is nil. That field
is used in the cleanup function to know when to remove the StoreAPI from
the UI. If we always update it, even if an error has happened, that
means that `--store.unhealthy-timeout` is never respected.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* query: fix storeset Update() test

Now let's start with a proper state where LastCheck is not 0 at the
beginning and we have 2 active stores, 3 store statuses just like the
original author had intended.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* fix typo in readme (#2342)

data -> date

Signed-off-by: afirth <afirth@users.noreply.github.com>

* query: add --store-strict flag (#2337)

* query: add --store-strict flag

Add a new flag called `--store-strict` as agreed per
https://thanos.io/proposals/202001_thanos_query_health_handling.md/

I have updated the proposal to reflect the reality.

Third time's the charm, I believe it :-)

Now the flag is called `--store-strict` which only accepts statically
defined nodes. I guess the code is even simpler now.

I have also fixed one small issue where `%w` was used in
`errors.Errorf`. Couldn't compile Thanos locally with Go 1.14 without
this fix.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: fix changelog item

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Register grpc prometheus middleware metrics (#2347)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* website: Enabled two scripts to fix Google analytics. (#2346)

* website: Enabled two scripts to fix Google analytics.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed also inline style.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added Workfront as adopter (#2351)

Signed-off-by: Ryan Orth <ryanorth@workfront.com>

Co-authored-by: Ryan Orth <ryanorth@workfront.com>

* compact: Fixed minor logging issues. (#2353)

Fixes: https://github.com/thanos-io/thanos/issues/2322

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* fetcher: Made metaFetcher go routine safe; Fixed multiple bucket UI + fetcher issues. (#2354)

Fixed https://github.com/thanos-io/thanos/issues/2349
Fixed races (we were reusing fetcher by both bucket UI and compaction syncs).
Fixed logging
Added singleflight to ensure we don't synchronize too often.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* test/e2e: Add timestamp to e2e test log output (#2358)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* store & compact: For components that operate on blocks - expose the UI on /loaded-blocks (#2357)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* rule: fix query addr parsing (#2288)

* rule: fix query addr parsing

Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com>

* CR: support different schemas

Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com>

* CR: docs and err

Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com>

* CR: improve error handling and more TC

Signed-off-by: Tobiasz Heller <tobiaszheller@gmail.com>

* mixin: Remove unused jobPrefix field (#2364)

Signed-off-by: Lili Cosic <cosiclili@gmail.com>

* Create release v0.12.0-rc.0 (#2360)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* Allow more connection reuse than the default of 2 (#2343)

Signed-off-by: Jakob Kartschall <j.kartschall@syseleven.de>

* Makefile: ignore GCS in CI (#2368)

We got booted from the GCS account, so skip this in CI for now.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* Revert "Makefile: ignore GCS in CI (#2368)" (#2373)

This reverts commit 8591434856ced5803e399b4d9d1bf2d1459c0ee0.

* mixin: Added critical Rules alerts. (#2374)

* mixin: Added critical Rules alerts.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* mixin: Made sure Rule alerts are not firing if one replica is failing. (#2375)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Update S3 endpoint mapping link (#2377)

The link for the AWS Region Endpoint Mappings for S3 was out of date. This PR updates it to point to the new location.

Signed-off-by: João Carvalho <joaopecarvalho@gmail.com>

* Fix2213 0.12 (#2382)

* binaryHeader: Fixed partial write issue for index-header.

Fixes https://github.com/thanos-io/thanos/issues/2213

This was reported as a latency regression, and it also causes a potentially critical issue
for store GW, where manual deletion of the index-header from local storage was required.

This might be considered a blocker for 0.12, so it would be worth porting it to 0.12 TBH @squat.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* binary_reader: ensure fs is synced before renaming

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* objstore: Added WithExpectedErrs which allows to control instrumentation (e.g not increment failures for expected not found) (#2383)

* objstore: Added WithExpectedErrs to Reader which allows to control instrumentation (e.g not increment failures for expected not found).

This allows us to not wake up oncall in the middle of the night because of an expected, properly handled case (:

Also: Has to move inmem to objstore for testing.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* pkg/objstore: fix NewBucket comments.

This commit fixes the documentation comments for the NewBucket funcs.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* pkg/receive: trace TSDB ingestion (#2384)

This commit adds a tracing span around the writing of remote-write
requests into TSDB. This will help us differentiate between the
latencies in the forwarding of requests around the hashring and the
latencies of appending to the database.

This commit also removes the `thanos_` prefix from the forwarding span
to better align with the span naming in the rest of the project.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* compact: Made MarkForDeletion less strict; Added more debugability to block deletion logic, made meta sync explicit. (#2385)

Also:

* Changed order: Now BestEffortCleanAbortedPartialUploads is before DeleteMarkedBlocks.
* Increment markedForDeletion counter only when we actually uploaded block.
* Fixed logging issues.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Compactor: Document reasons and solutions about overlaps (#2191)

* troubleshooting.md: document overlaps

Signed-off-by: Xiang Dai <764524258@qq.com>

* feedback

Signed-off-by: Xiang Dai <764524258@qq.com>

* feedback

Signed-off-by: Xiang Dai <764524258@qq.com>

* add reminder label to stale bot config (#2378)

Signed-off-by: yeya24 <yb532204897@gmail.com>

* fix sharding docs style; fix promtail link (#2379)

Signed-off-by: yeya24 <yb532204897@gmail.com>

* store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant. (#2390)

* store: Fixed binary header bug that was causing all postings to be kept in memory instead of 1/32 as we meant.

Spotted by @mkabischev! Thanks to you and @d-ulyanov as well! Epic finding +1


Test output before fix:
					testutil.Equals(t, 1, br.version)
					testutil.Equals(t, 2, br.indexVersion)
					testutil.Equals(t, &BinaryTOC{Symbols: headerLen, PostingsOffsetTable: 66}, br.toc)
					testutil.Equals(t, int64(626), br.indexLastPostingEnd)
					testutil.Equals(t, 8, br.symbols.Size())
					testutil.Equals(t, map[string]*postingValueOffsets{
						"": {
							offsets:       []postingOffset{{value: "", tableOff: 4}},
							lastValOffset: 392,
						},
						"a": {
							offsets: []postingOffset{
								{value: "1", tableOff: 9},
								{value: "11", tableOff: 16},
								{value: "12", tableOff: 24},
								{value: "2", tableOff: 32},
								{value: "3", tableOff: 39},
								{value: "4", tableOff: 46},
								{value: "5", tableOff: 53},
								{value: "6", tableOff: 60},
								{value: "7", tableOff: 67},
								{value: "8", tableOff: 74},
								{value: "9", tableOff: 81},
							},
							lastValOffset: 572,
						},
						"longer-string": {
							offsets:       []postingOffset{{value: "1", tableOff: 88}},
							lastValOffset: 622,
						},
					}, br.postings)
					testutil.Equals(t, 0, len(br.postingsV1))
					testutil.Equals(t, 2, len(br.nameSymbols))

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added CHANGELOG item.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed build errs.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Lucas comment.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* store: Fixed critical bug, when certain not-existing value queried was causing "invalid size" error. (#2393)

The reason we could not reproduce it locally was that, for most non-existing values,
we were lucky: the buffer was still long enough, so we could read and decode some (malformed) variadic type.
In certain rare cases, the buffer was not long enough.

Fixed and spotted thanks to amazing @mkabischev!

* Added more regression tests for binary header.

Without the fix it fails with:
```
            header_test.go:154: header_test.go:154:

                	exp: range not found

                	got: get postings offset entry: invalid size
```

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* VERSION: cut v0.12.0-rc.1 (#2396)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* mixin: Change critical rule alert to be symptom based (#2398)

This change makes the critical (typically paging) alert more symptom
based, rather than observing data written to disk. Additionally after
this change the alert will only fire if there are actually rules loaded.

Besides firing when no rules were loaded, the previous alert was also prone to
firing for rules that legitimately do not write data.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* scripts: Added grpcurl script useful for Thanos debugging. (#2403)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* bucket docs: fix "thanos downsample" remnant (#2409)

and follow formatting of the other bucket commands

Signed-off-by: John Belmonte <john@neggie.net>

* docs: Added Thanos Go style guide and some development tips. (#2359)

* docs: Added Thanos Go style guide and some development tips.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments; added TOC and image.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added more rules.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Grammarly fixes!

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* docs: Fixed table formatting for coding style guide. (#2421)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added extra check for sorting time Duration and int strings (#2416)

Signed-off-by: kadern0 <kaderno@gmail.com>

* docs: Added minor note to single rule. (#2422)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed TOC. (#2424)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* store dashboard: fix gRPC streamed detail panels (#2426)

Fixes #2425

Signed-off-by: John Belmonte <john@neggie.net>

* use bytes unit where appropriate on grafana dashboards (#2423)

Signed-off-by: John Belmonte <john@neggie.net>

* bucket verify: document that compactor should be disabled (#2418)

Signed-off-by: John Belmonte <john@neggie.net>

* docs: Fixed typo in coding guide. (#2427)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added Marco as Thanos Maintainer (#2428)

Also, reordered list alphabetically.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* store: proxy: fix queries never timing out bug (#2411)

* store: proxy: add test for deadlocking problem

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* store: proxy: add fix for timeouts

Checking here if the series context has ended is the correct fix here.
We want to check it because if any of the other Series() calls error out
then the context is canceled. So, it is equal to checking for errors
"downstream", in `mergedSeriesSet`.

Also, `handleErr()` here is the correct function to use because in such
a case we want to set `s.err` -- if `io.EOF` still hasn't been received
then it means that StoreAPI still has some data that it wants to send
but hasn't yet.

With this, the previously added test passes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* docs: fixed typo in coding style guide (#2431)

Signed-off-by: Stephan Kirsten <vebis@gmx.net>

* docs/release-process: make shell command copyable (#2433)

In general, I think it is easier for users of guides when shell commands
are listed without a preceding `$`, otherwise the commands cannot be
directly copied and pasted into a terminal.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* docs/contributing: clean up style guide grammar (#2432)

This commit makes some small grammar fixes to the coding style
guide.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* cut v0.12.0 (#2437)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* .circleci: use consistent ci image tags (#2440)

We were not using the latest thanos-ci image tag for every part of the
CI pipeline: we were using 0.3.0 for tests but 0.2.0 for all builds.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* CHANGELOG.md: fix changelog

The changelog in the release-0.12 branch is correct, but somewhere in
the merge back into master, the changelog was mangled. This puts the
fixes in their correct places.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* store: proxy: fix queries never timing out bug (#2411) (#2443)

* store: proxy: add test for deadlocking problem

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* store: proxy: add fix for timeouts

Checking here if the series context has ended is the correct fix here.
We want to check it because if any of the other Series() calls error out
then the context is canceled. So, it is equal to checking for errors
"downstream", in `mergedSeriesSet`.

Also, `handleErr()` here is the correct function to use because in such
a case we want to set `s.err` -- if `io.EOF` still hasn't been received
then it means that StoreAPI still has some data that it wants to send
but hasn't yet.

With this, the previously added test passes.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* proposal: Added proposal for new Thanos component: Thanos Frontend. (#2434)

* proposal: Added proposal for new Thanos component: Thanos Frontend.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added more rationales for separate binary.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Marco comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed lucas comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Changed to approved.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Moved to query-frontend command.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed memcached client metrics initialization (#2446)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* store: Added regex-set optimization to ExpandedPostings (#2450)

* Added regex-set optimization to ExpandedPostings

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed capitalization.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed unnecessary change.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Remove whitespace

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use testutil instead of testify.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added copyright header, from original Prometheus querier.go

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use Thanos copyright header. :facepalm:

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added · at the end of the sentence. :exploding_head:.

I will randomly add emojis and GitHub emoji markup to commit messages that fix frustrating checks like this one. And intentionally not break the line. Let's see how lint deals with that! Ha.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* docs/contributing: use Before for IsExpired (#2456)

Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>

* cmd/thanos: clean gosimple S1039 (#2464)

Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>

* docs: Update CONTRIBUTING.md with DCO (#2465)

* docs: Update CONTRIBUTING.md with DCO

Signed-off-by: ranjithkumar007 <ranjith.dakshana2015@gmail.com>

* Update CONTRIBUTING.md

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: ranjithkumar007 <ranjith.dakshana2015@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added tests to reproduce #2459. (#2462)

Related to: https://github.com/thanos-io/thanos/issues/2459

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added a page for documenting beginner issues (#2461)

* Added some documentation for beginner issues

Signed-off-by: Yash <yashrsharma44@gmail.com>

* Edited some lines

Signed-off-by: Yash <yashrsharma44@gmail.com>

* Update docs/operating/troubleshooting.md

Co-Authored-By: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Yash <yashrsharma44@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>

* pkg/block/fetcher: fix concurrent map usage (#2474)

Fixes: #2471

This commit fixes an issue where multiple goroutines in the block
fetcher filtering were concurrently accessing the same map. The
goroutines were concurrently writing AND reading to the shared metas
map. This commit guards this concurrent access by giving the
DeduplicateFilter struct a mutex.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* Reverted addition of deletion mark for partial uploads. (#2472)

Fixes https://github.com/thanos-io/thanos/issues/2459 (quick fix).

This keeps the logic from 0.11.0, which was good enough.

Some improvement for future: https://github.com/thanos-io/thanos/issues/2470

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Remove optimizations for label=~".*" and label!~".*". (#2475)

* Remove optimizations for label=~".*" and label!~".*".

They are not correct.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* cut v0.12.1 (#2476)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* fix thanos web route prefix register twice (#2489)

Signed-off-by: yeya24 <yb532204897@gmail.com>
Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

Co-authored-by: yeya24 <yb532204897@gmail.com>

* Do not lock DNS Provider.Address() while Resolve() is running (#2492)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Compact: Update compact documentation to better clarify dedupeReplicaLabels. (#2481)

* Update compact documentation to better clarify dedupeReplicaLabels.

Signed-off-by: Johnathan Falk <johnathan.falk@gmail.com>

* Fix capitalization.

Signed-off-by: Johnathan Falk <johnathan.falk@gmail.com>

* Gracefully handle additional oneof fields in SeriesResponse (#2501)

* Gracefully handle additional oneof fields in SeriesResponse

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Removed unnecessary continue

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* fix typo (#2509)

Signed-off-by: arthur yang <yang_yapo@126.com>

* Adjust memcached operation buckets (#2504)

Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com>

* pkg/query: remove obsolete 'thanos_store_node_info' metric (#2505)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Add Community information (#2510)

* Add Community information

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>

* Fixes after review

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>

* Move to contributing menu

Signed-off-by: Povilas Versockas <p.versockas@gmail.com>

* Remove incompleteView field from fetcher response. (#2455)

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added hints support to store protobuf (#2502)

* Added hints support to store protobuf

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated CHANGELOG

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Reworded hints doc

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Removed hints_enabled from SeriesRequest

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Remove spurious newline after rebase

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Leveraging docker layer caching (#2508)

Signed-off-by: ankitjain28may <ankitjain28may77@gmail.com>

* add gofmt -s step to makefile and golangci (#2463)

* gofmt -s files

Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>

* golangci: add gofmt to linters

Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>

* makefile: add gofmt to format

Signed-off-by: Davor Kapsa <davor.kapsa@gmail.com>

* Update coding-style-guide.md (#2520)

make `doSomething` a function call.

Signed-off-by: Halil Kaskavalci <halil@kaskavalci.com>

* Let's be nicer on stale things (: (#2517)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* docs/proposals/202003_thanos_rules_federation: initial commit (#2263)

Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com>

* cmd: Moved all no-service commands under new tools subcommand. (#2513)

This will allow better extensibility in the future for the non-bucket-related tools we plan to add.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added hints support to BucketStore.Series() (#2516)

* Added hints support to BucketStore.Series()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed goimport grouping

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added missing copyright

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Addressed review comments

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Exclude zoom.us from liche (because zoom.us response headers are over 4KB)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* update uswitch logo and branding (#2529)

Signed-off-by: Joseph-Irving <joseph.irving500@gmail.com>

* *: add metrics to the reloader package (#2521)

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Added LocalStore and realistic data for querier counter reset bug. (#2522) (#2538)

* Added LocalStore and realistic data for querier counter reset bug.

Tries to reproduce: https://github.com/thanos-io/thanos/issues/2401

I would still merge as it is a great test, and allows us to quickly
check data provided by Ben.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed tsdbstore required component type.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed ineffectual set.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed liche.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed unknown store issue.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* docs: fixed broken links in documentation (#2540)

* fix tiny typo

Signed-off-by: Dan Potepa <dan@danpotepa.co.uk>

* fix link to example manifest files

Signed-off-by: Dan Potepa <dan@danpotepa.co.uk>

* fixed some broken links

Signed-off-by: Dan Potepa <dan@danpotepa.co.uk>

* Clear duplicateIDs at the beginning of Filter. (#2544)

* Clear duplicateIDs at the beginning of Filter.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Address review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix whitespace noise.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* :whale: :neckbeard: :kick_scooter:

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* cmd: rule: do not wrap reload endpoint with prefix twice (#2533)

* cmd: rule: do not wrap reload endpoint with '/'

Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise,
it is inaccessible when no prefix has been specified by the user.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: update

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* e2e: rule: add test for reloading rules via /-/reload

Add a test-case to the e2e tests for testing whether reloading rules via
/-/reload works.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* VERSION: cut release v0.12.2 (#2545)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* ui: bump jQuery version to v3.5.0 (#2549)

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* Bumped minio-go library to v6.0.53 (#2536)

* Bumped minio-go library to v6.0.53

Signed-off-by: alicek106 <alice_k106@naver.com>

* Updated CHANGELOG with PR

Signed-off-by: alicek106 <alice_k106@naver.com>

* Add deleteSeries skeleton to return bad request (#2530)

Signed-off-by: darshanime <deathbullet@gmail.com>

* Revert "Add deleteSeries skeleton to return bad request (#2530)" (#2551)

This reverts commit d0bcbff8375b6384292533ffa84b6408b85b0acb.

* Fixed the timezone url (#2553)

Signed-off-by: Yash <yashrsharma44@gmail.com>

* Updated to golang v1.14.2 (#2194)

* Update golang:1.14.2

Signed-off-by: Raúl Naveiras <me@raulnaveiras.com>

* Update thanos-ci:go1.14.2-node

It requires a manual process to generate and push this container.

```
make docker-ci DOCKER_CI_TAG=go1.14.2-node
```

Signed-off-by: Raúl Naveiras <me@raulnaveiras.com>

* Update golang:1.14.2 for github actions

Signed-off-by: Raúl Naveiras <me@raulnaveiras.com>

* Update CHANGELOG

Signed-off-by: Raúl Naveiras <me@raulnaveiras.com>

* Fix yaml indentation

Signed-off-by: Raúl Naveiras <me@raulnaveiras.com>

* Added Bartek as next release shepherd. (#2556)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* receive: Add support for TSDB per tenant (#2012)

* receive: Add support for TSDB per tenant

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/store: Merge SeriesSets of multiple TSDB stores

This is required as the Series gRPC method of the StoreAPI requires the
Series returned to be sorted.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/receive: Add multitsdb shipper support

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Address comments

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Add more comments on types and functions

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/store/multitsdb.go: Remove unused struct field

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/receive/multitsdb.go: Remove unused Close method

TSDBs are implicitly closed by flushing the database, which is ensured
on shutdown, hence there is no need to have the explicit close method.

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/store/multitsdb.go: Make errors and warnings tenant aware

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* pkg/store/multitsdb.go: Consistent tenant aware errors and warnings

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* cmd/thanos/receive.go: Auto migrate legacy to multitsdb disk layout (#2557)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Merge 0.12 into master (#2559)

* Clear duplicateIDs at the beginning of Filter. (#2544)

* Clear duplicateIDs at the beginning of Filter.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Address review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix whitespace noise.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* :whale: :neckbeard: :kick_scooter:

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* cmd: rule: do not wrap reload endpoint with prefix twice (#2533)

* cmd: rule: do not wrap reload endpoint with '/'

Do not wrap the router with `/` on the `/-/reload` endpoint. Otherwise,
it is inaccessible when no prefix has been specified by the user.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: update

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* e2e: rule: add test for reloading rules via /-/reload

Add a test-case to the e2e tests for testing whether reloading rules via
/-/reload works.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* VERSION: cut release v0.12.2 (#2545)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

Co-authored-by: Peter Štibraný <peter.stibrany@grafana.com>
Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Revert "Merge 0.12 into master (#2559)" (#2560)

This reverts commit 003d245282bd683826304d25d1719c39d7401629.

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* querier: Added regression tests for counter missed reset bug. (#2528)

* querier: Added regression tests for counter missed reset bug.

PR with just tests, not fix yet.

Reproduces: https://github.com/thanos-io/thanos/issues/2401

* Added regression tests for CounterSeriesIterator; Simplified aggregators.
* Fixes edge dedup cases for Next and added tests for deduplication.
* Refactored downsampling tests, added more realistic cases.
* Added check for duplicated chunks during downsampling.
* Removed duplicates for efficiency on promSeriesSet.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
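
To see why duplicated or overlapping chunks matter for counters, consider a simplified increase calculation. This is a sketch under assumptions, not the Thanos or Prometheus implementation: `naiveIncrease` is a hypothetical helper that skips extrapolation and treats any drop in value as a counter reset.

```go
package main

import "fmt"

// naiveIncrease sums the growth of a counter over the given samples.
// Any decrease is treated as a reset to zero, so the full new value
// is added to the total.
func naiveIncrease(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, v := range samples[1:] {
		if v < prev {
			total += v // Looks like a reset: count from zero.
		} else {
			total += v - prev
		}
		prev = v
	}
	return total
}

func main() {
	// Samples as they would appear once, in order.
	clean := []float64{100, 110, 120, 130}
	// The same data with an overlapping chunk replayed after sample 120.
	overlapped := []float64{100, 110, 120, 110, 120, 130}

	fmt.Println(naiveIncrease(clean))
	fmt.Println(naiveIncrease(overlapped))
}
```

With the clean input the increase is 30. With the overlapped input the step from 120 back to 110 is taken for a reset, and the result becomes 150 even though the counter never reset. Removing duplicates before iterating avoids that spurious reset in this simplified model.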

* Addressed Giedrius comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* receive: Use read locks where possible to read tenants (#2563)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* receive: Block WAL replay when starting receive component (#2564)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* docs: Added mention about thanos-remote-read integration. (#2566)

Thanks to G-Research as per: https://cloud-native.slack.com/archives/CL25937SP/p1588687640060200?thread_ts=1588167992.463800&cid=CL25937SP

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* query/storeset: do not close the connection if strict mode enabled (#2568)

* query/storeset: do not close the connection if strict mode enabled

Do not close the gRPC connection if establishing it has succeeded but
the Info() call has failed to get a response. Without this change, in
strict mode we would keep around a closed connection that never works
again until the whole Thanos Query process is restarted.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* query/storeset: add test, add CHANGELOG item

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Update gitignore with integration tests directory (#2552)

Signed-off-by: Ranjith Kumar <ranjith.dakshana2015@gmail.com>

* Fixed thanos_compact_garbage_collected_blocks_total metric help (#2572)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Chunks caching at bucket level (#2532)

* Added generic cache interface.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added memcached implementation of Cache.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Chunks-caching bucket.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix sentences

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix sentences

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix sentences

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Rename config objects.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added metrics for object size.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added requested chunk bytes metric.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Caching bucket docs.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed tests.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix test.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Update docs/components/store.md
Update pkg/store/cache/caching_bucket.go

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Dots

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Always set lastBlockOffset.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Merged cached metric into fetched metric, added labels.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added CHANGELOG.md entry

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Reworded help for thanos_store_bucket_cache_fetched_chunk_bytes_total

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added tracing around getRangeChunkFile method.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Updated CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Options

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix parameter name. (store. got dropped by accident)

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use embedded Bucket

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added comments.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Hide store.caching-bucket.config flags.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed block to subrange.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed block to subrange.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Header

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added TODO

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed TODO, in favor of creating issue.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use NopCloser.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

Co-authored-by: Marco Pracucci <marco@pracucci.com>

* Reword block deletion comments and logs in compactor (#2574)

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Coding Style typos and a few grammar improvements (#2448)

Changes mainly made for consistency, like section headers being in imperative tense: "do this thing" instead of "this is the thing"

Signed-off-by: Stephen Weber <stephen.t.weber@gmail.com>

* quickstart: fix bucket web after recent changes (#2580)

The subcommand is now called `tools bucket web` after the recent
changes.

Without this, the quickstart script outputs:
```
Error parsing commandline arguments: expected command but got "bucket"
thanos: error: expected command but got "bucket"
```

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Fix typo on reload function (#2584)

Signed-off-by: Joel Bastos <kintoandar@gmail.com>

* Refactor of commands and flag parsing for sidecar (#2267)

Signed-off-by: Philip Gough <philip.p.gough@gmail.com>

* ui: add new React UI from Prometheus (#2412)

* ui: add React UI from upstream Prometheus

Signed-off-by: Adrien Fillon <adrien.fillon@gmail.com>

* ui: incorporate new changes from Prometheus React UI

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* ui: adapted the React UI to Thanos

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

Co-authored-by: Adrien Fillon <adrien.fillon@gmail.com>
Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Fix minor typos (#2586)

Signed-off-by: Pierre-Yves Aillet <pyaillet@gmail.com>

* react: update deps (#2589)

* react: graph/panel: revert changes temporarily

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* react-app: apply 'Update React vendoring'

Add the commit
https://github.com/prometheus/prometheus/commit/65a19421a42c69e16241eec24c66b98e4c8fa5da
via a 3-way merge.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* ui/react-app: update yarn deps

Should fix security warnings. Ported from
https://github.com/prometheus/prometheus/commit/24ecae995691dabf782a6b4a7464f7aab561b554.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* ui: update bindata

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* Makefile: remove --coverage from test run (#2591)

Found out that there is some weird interaction between `jest --coverage`
and `babel-plugin-istanbul`. Maybe related to:
https://github.com/facebook/jest/issues/6827.

From my testing, removing `--coverage` makes this work again. Probably
worth investigating in the future why that happens.

Also, this is really not needed during CI because we do not use the
coverage data anywhere anyway.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* ci: use GitHub Actions to test React UI (#2595)

* ci: test React UI using GitHub actions

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* ci: remove react-app-test from CircleCI as we now use GH Actions

Signed-off-by: Prem Kumar <prmsrswt@gmail.com>

* pkg/ui: bump jQuery to 3.5.0 (#2597)

Signed-off-by: Lucas Servén Marín <lserven@gmail.com>

* Added receiver multidb unit tests for basic cases. (#2593)

Unfortunately, all tests pass. ):

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed make docs; Updated last discrepancies. (#2611)

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* mixin: Alert on receive not uploading recent data (#2612)

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Metadata caching in bucket (#2579)

* Added caching for Iter.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added cache for Exists call for meta-files.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added cache for reading block metadata files.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Make caching bucket configurable with different caches for different types of objects.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed tests.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added caching for ObjectSize. Enabled caching of index.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Lint feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use single set of metrics for all operations.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Constants.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use operation specific config. Generic configuration is only for user.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix typo, make lint happy.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Simplify constants.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Simplify caching configuration.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Refactor cache configuration.

Configuration is now passed to the cache when created.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix operationRequests and operationHits for getRange.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Make codec for Iter results configurable.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added header.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed "dir" config to "blocks-iter".

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Bump default values for meta exists/doesntExist ttls.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed example how cache could be configured for index.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Address review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Get now implements streaming reader, and buffers object in memory.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added test for partial read.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed unused function.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Updated the help message for --data-di…