How to configure correctly for HA Thanos #520

Alexvianet · 2018-09-13T14:34:50Z

I can't make sense of the examples in the documentation. Here is what I have:

1) Prometheus: 2 nodes in different zones, each with a Thanos sidecar
2) Grafana: 2 nodes in different zones
3) HAProxy: 2 nodes in different zones, load-balancing Prometheus, Grafana, ...
4) Thanos store: 1 node
5) Thanos query: 2 nodes in different zones
6) Thanos compact: 1 node
7) S3 bucket as object storage

prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e29e815795e9dd14742ee7725682fa14b7b)
build user: root@5258e0bd9cc1
build date: 20180712-14:02:52
go version: go1.10.3

thanos, version 0.1.0-rc.2 (branch: HEAD, revision: 53e4d69)
build user: root@c7199d758b5e
build date: 20180705-12:54:50
go version: go1.10.3

What happened
level=debug ts=2018-09-13T14:17:22.530346817Z caller=cluster.go:278 component=cluster msg="refresh cluster done, peers joined" peers=127.0.0.1:10900 before=5 after=1
What you expected to happen
I need to understand the best practices for configuring Thanos on this kind of infrastructure, and would like a real example of a multi-node cluster running all Thanos components.

Are Thanos' own metrics automatically available from the querier?
I only get the thanos_cluster_members metric when I add the thanos-query HTTP address to the scrape targets in the Prometheus config.
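Thanos components are not scraped automatically: each one serves its own metrics at /metrics on its --http-address, so Prometheus only sees them once they are listed as scrape targets, which matches what is described above. A minimal sketch, assuming a hypothetical helper render_thanos_scrape_config and placeholder host:port targets, that prints a scrape_configs fragment to merge into prometheus.yml:

```shell
#!/bin/sh
# Sketch: print a Prometheus scrape_configs fragment for Thanos components.
# Every Thanos component serves /metrics on its --http-address.
# The job name and the helper name are placeholders.
set -eu

render_thanos_scrape_config() {
  # Arguments: host:port of each component's HTTP address.
  printf '%s\n' \
    'scrape_configs:' \
    '  - job_name: thanos' \
    '    static_configs:' \
    '      - targets:'
  for target in "$@"; do
    printf "          - '%s'\n" "$target"
  done
}
```

For example, render_thanos_scrape_config thanos-query:19192 thanos-store:19193 (the HTTP ports used later in this thread). If prometheus.yml already has a scrape_configs section, copy only the job entry into it.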
How to reproduce it (as minimally and precisely as possible):

All S3 configuration was set with export ...

    ./prometheus --storage.tsdb.no-lockfile --storage.tsdb.retention=1h

    ./thanos query --query.replica-label replica --log.level=debug --cluster.peers="127.0.0.1:10900"

    ./thanos sidecar --cluster.peers="thanos_query:10900"

    ./thanos store --tsdb.path=./store --cluster.peers="thanos_query:10900"

    ./thanos compact --data-dir=./data

Environment:
CentOS Linux release 7.5.1804 (Core)
Linux 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


bwplotka · 2018-09-14T17:35:13Z

Can you provide a simple diagram? I'm a bit confused about where, for example, HAProxy sits in your setup. (:

bwplotka · 2018-09-14T17:38:10Z

The key here is to avoid gossip (since it will be removed soon) and simply configure both of your thanos-queriers with:

    --store=<thanos-sidecar>:<grpc-port>
    --store=<thanos-sidecar2>:<grpc-port>
    --store=<thanos-store>:<grpc-port>
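The static setup above can be sketched as a small POSIX shell helper that prints one argument per line for a thanos query replica; both queriers would run it with the same endpoint list. The helper name, host names and the replica label value are placeholders (the label has to match whatever external label distinguishes your Prometheus replicas), and the ports are the ones used later in this thread.

```shell
#!/bin/sh
# Sketch: print the arguments for one thanos-query replica that discovers its
# StoreAPIs from static --store flags instead of gossip.
# Assumptions: the helper name, listen ports and the replica label value
# "replica" are illustrative, not prescribed by Thanos.
set -eu

thanos_query_args() {
  # Fixed flags, one per line so the output can be fed to xargs.
  printf '%s\n' query \
    '--http-address=0.0.0.0:19192' \
    '--grpc-address=0.0.0.0:19092' \
    '--query.replica-label=replica'
  # One --store flag per StoreAPI gRPC endpoint passed as an argument.
  for endpoint in "$@"; do
    printf '%s\n' "--store=$endpoint"
  done
}
```

Usage on each querier node: thanos_query_args thanos-sidecar-a:19091 thanos-sidecar-b:19091 thanos-store:19093 | xargs thanos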

Then point Grafana at the thanos-query endpoint (behind HAProxy, if you want).

And that's it (: Gossip seems to be overkill here.
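If Grafana should reach both queriers through HAProxy, a backend with one server line per querier is enough. A sketch of a hypothetical helper that renders such a fragment; it assumes an existing haproxy.cfg that already has global and defaults sections (timeouts etc.), and the querier addresses are placeholders:

```shell
#!/bin/sh
# Sketch: render a HAProxy frontend/backend that spreads Grafana's requests
# over several thanos-query HTTP endpoints. Hypothetical helper; merge the
# output into a haproxy.cfg that already has global/defaults sections.
set -eu

render_haproxy_cfg() {
  # $1: port HAProxy listens on; remaining args: querier host:port pairs.
  listen_port="$1"
  shift
  printf '%s\n' \
    'frontend thanos_query_front' \
    "    bind *:${listen_port}" \
    '    mode http' \
    '    default_backend thanos_query_back' \
    '' \
    'backend thanos_query_back' \
    '    mode http' \
    '    balance roundrobin'
  n=0
  for querier in "$@"; do
    n=$((n + 1))
    # "check" without "option httpchk" is a plain TCP health check.
    printf '    server query-%s %s check\n' "$n" "$querier"
  done
}
```

Grafana's Thanos datasource then points at the HAProxy address instead of an individual querier.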

Alexvianet · 2018-09-16T21:32:59Z

It looks like this (the diagram image was not preserved):

P.S. You wrote "The key here is to avoid gossip (since it will be removed soon)". Will gossip be removed in a new Thanos version?

bwplotka · 2018-09-16T22:41:37Z

Yup, see this: #493

So essentially, static configuration is what you want. In the future there will be DNS-based discovery as well as file-based service discovery (file SD), which will make this more flexible.
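For reference, later Thanos releases (not 0.1.0) did add both mechanisms: --store accepts DNS-prefixed addresses such as dns+host:port, and --store.sd-files reads StoreAPI addresses from files in the Prometheus file SD format. A minimal sketch, with a hypothetical helper name, that writes such a JSON target list; check thanos query --help for your version before relying on these flags.

```shell
#!/bin/sh
# Sketch: write a file-SD target list (Prometheus file_sd JSON format) that
# later Thanos releases can read via --store.sd-files. Hypothetical helper.
set -eu

render_store_sd_json() {
  # Arguments: StoreAPI gRPC endpoints as host:port.
  printf '[{"targets": ['
  sep=''
  for endpoint in "$@"; do
    printf '%s"%s"' "$sep" "$endpoint"
    sep=', '
  done
  printf ']}]\n'
}
```

For example, render_store_sd_json thanos-sidecar-a:19091 thanos-store:19093 > stores.json, then start the querier with --store.sd-files=stores.json. Editing the file changes the store set without restarting the querier.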

Alexvianet · 2018-09-17T07:57:23Z

For testing, I started one node of each component and configured Thanos as follows:

    exec thanos sidecar \
    --log.level="debug" \
    --prometheus.url="https://prometheus.net" \
    --http-address="0.0.0.0:19191" \
    --grpc-address="0.0.0.0:19091"  \
    --cluster.address="0.0.0.0:19391"   \
    --cluster.gossip-interval="5s"  \
    --cluster.pushpull-interval="5s" \
    --cluster.refresh-interval="1m0s" \
    --tsdb.path="/var/vcap/store/prometheus2" \
    --reloader.config-envsubst-file="/var/vcap/jobs/prometheus2/config/prometheus.yml"

    exec thanos store \
    --tsdb.path="/var/vcap/store/thanos/store" \
    --log.level="debug" \
    --http-address="0.0.0.0:19193" \
    --grpc-address="0.0.0.0:19093"  \
    --cluster.address="0.0.0.0:19891"   \
    --cluster.gossip-interval="5s"  \
    --cluster.pushpull-interval="5s" \
    --cluster.refresh-interval="1m0s" \
    --index-cache-size="1GB" \
    --chunk-pool-size="2GB"

    exec thanos query \
    --log.level="debug" \
    --http-address="0.0.0.0:19192" \
    --grpc-address="0.0.0.0:19092"  \
    --cluster.address="0.0.0.0:19591" \
    --cluster.peers="0.0.0.0:19591"  \
    --cluster.gossip-interval="5s" \
    --cluster.pushpull-interval="5s" \
    --cluster.refresh-interval="1m0s" \
    --query.timeout=2m \
    --query.replica-label=thanos_query_replica \
    --query.max-concurrent="20" \
    --store=<thanos-sidecar>:19091 \
    --store=<thanos-store>:19093

Query logs:

level=info ts=2018-09-17T07:09:43.334407506Z caller=flags.go:53 msg="StoreAPI address that will be propagated through gossip" address=<thanos-query>:19092
level=info ts=2018-09-17T07:09:43.337044628Z caller=flags.go:68 msg="QueryAPI address that will be propagated through gossip" address=<thanos-query>:19192
level=info ts=2018-09-17T07:09:43.342363255Z caller=query.go:256 msg="starting query node"
level=info ts=2018-09-17T07:09:43.346142795Z caller=query.go:230 msg="Listening for query and metrics" address=0.0.0.0:19192
level=info ts=2018-09-17T07:09:43.346198456Z caller=query.go:248 component=query msg="Listening for StoreAPI gRPC" address=0.0.0.0:19092
level=info ts=2018-09-17T07:09:43.347591644Z caller=storeset.go:226 component=storeset msg="adding new store to query storeset" address=<thanos-sidecar>:19091
level=info ts=2018-09-17T07:09:43.347663034Z caller=storeset.go:226 component=storeset msg="adding new store to query storeset" address=<thanos-store>:19093
level=warn ts=2018-09-17T07:09:53.348700243Z caller=cluster.go:300 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-09-17T07:10:03.348689325Z caller=cluster.go:300 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"
level=warn ts=2018-09-17T07:10:13.348686534Z caller=cluster.go:300 component=cluster NumMembers=1 msg="I appear to be alone in the cluster"

and no query results in the web UI.

bwplotka · 2018-09-17T10:00:58Z

Remove all of the peer (cluster) flag configuration. Let's not duplicate discovery mechanisms.

level=info ts=2018-09-17T07:09:43.347591644Z caller=storeset.go:226 component=storeset msg="adding new store to query storeset" address=:19091
level=info ts=2018-09-17T07:09:43.347663034Z caller=storeset.go:226 component=storeset msg="adding new store to query storeset" address=:19093

This indicates that the querier has access to the stores (: So now the question is whether you have the metrics anywhere (: It's worth checking the Prometheus UI (where the sidecar runs) to see whether the metrics are there, checking the sidecar logs, and making sure you are using the correct time range.
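To see where the data actually is, the same instant query can be sent to Prometheus and to Thanos Query, since Thanos Query serves the Prometheus-compatible HTTP API. A sketch using curl; the helper name and the URLs are placeholders:

```shell
#!/bin/sh
# Sketch: send the same PromQL instant query to Prometheus and to Thanos
# Query (both serve /api/v1/query). Helper name and URLs are placeholders.
set -eu

query_instant() {
  # $1: base URL, e.g. http://prometheus:9090; $2: PromQL expression.
  # -G turns the --data-urlencode pair into a URL-encoded GET query string.
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

Compare query_instant http://<prometheus>:9090 up with query_instant http://<thanos-query>:19192 up. If Prometheus returns series and the querier does not, the sidecar logs are the next place to look.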

Alexvianet · 2018-09-17T16:27:29Z

Thanks, it works now.

thesaadarshad · 2019-11-22T15:54:12Z

Hi @Alexvianet, may I ask a question about your Thanos implementation?

Alexvianet · 2019-11-22T16:40:25Z

All OK, thanks.

Alexvianet · 2019-11-22T23:02:53Z

Please, go ahead.

thesaadarshad · 2019-11-23T01:35:32Z

My stack is somewhat similar to yours:

1. Prometheus 2 nodes different zone with Thanos sidecar
2. Deployed as Docker container on Query Node
3. Thanos store 1 node
4. Thanos query 1 node (at the moment)
5. Thanos compact 1 node
6. S3 bucket as an object storage

My question is about the 3rd item, the Store node. Would you suggest deploying the Store node separately on a different node, and how would you make it HA?

Alexvianet · 2019-11-23T08:23:16Z

We use nginx for it, with an upstream.
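The nginx configuration itself is not shown in the thread. Assuming the upstream balances the gRPC StoreAPI of two identical thanos store instances reading the same bucket (one possible reading of the answer above), a sketch could look like this. It needs nginx 1.13.10 or newer for grpc_pass, and newer nginx releases prefer a separate "http2 on;" directive over "listen ... http2". Host names and the helper name are hypothetical.

```shell
#!/bin/sh
# Sketch: render an nginx http-context fragment that load-balances gRPC
# StoreAPI traffic across identical thanos store instances.
# Assumption: this is how the nginx upstream is used; the thread does not say.
set -eu

render_nginx_grpc_lb() {
  # $1: port nginx listens on; remaining args: store gRPC host:port pairs.
  listen_port="$1"
  shift
  printf 'upstream thanos_store {\n'
  for store in "$@"; do
    printf '    server %s;\n' "$store"
  done
  printf '}\n\nserver {\n'
  # gRPC needs HTTP/2; plain-text h2c is accepted with "listen ... http2".
  printf '    listen %s http2;\n' "$listen_port"
  printf '    location / {\n        grpc_pass grpc://thanos_store;\n    }\n}\n'
}
```

The queriers would then list the nginx address in --store instead of the individual store nodes.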

Alexvianet · 2019-11-23T08:27:20Z

Also, we have two datasources in Grafana, Thanos and Prometheus, so if we are doing maintenance on Thanos, customers can still use the Prometheus metrics, with a 2-day data retention period.
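The two-datasource setup can be provisioned in Grafana with a datasource provisioning file. A sketch of a hypothetical helper that renders one; both datasources use Grafana's prometheus type because Thanos Query speaks the Prometheus HTTP API, and the names and URLs are placeholders:

```shell
#!/bin/sh
# Sketch: render a Grafana datasource provisioning file with a Thanos and a
# Prometheus datasource. Helper name, datasource names and URLs are placeholders.
set -eu

render_grafana_datasources() {
  # $1: Thanos Query URL (e.g. behind the load balancer); $2: Prometheus URL.
  cat <<EOF
apiVersion: 1
datasources:
  - name: Thanos
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
EOF
}
```

The output would go into a YAML file under Grafana's provisioning/datasources directory (by default /etc/grafana/provisioning/datasources).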

thesaadarshad · 2019-11-23T12:38:42Z

Makes sense, but why would you want to query Prometheus directly? The short-retention data stored in Prometheus is queryable via the Querier, which talks to the sidecar and the store at the same time. Connecting Grafana directly to the Querier also works.

Alexvianet · 2019-11-23T17:15:02Z

Just in case Thanos or S3 becomes unavailable.

thesaadarshad · 2019-11-23T20:17:08Z

Makes sense. Please help me with another point of confusion.
Did you deploy Store independently on a separate node? I'm still having trouble deploying it correctly.
tl;dr: how did you connect the querier to the Store API to retrieve old data?
🙏

Alexvianet · 2019-11-23T20:32:45Z

Yes, on a separate node. Grafana gets its data from the querier.

thesaadarshad · 2019-11-23T21:22:06Z

But which nodes store data in S3? Surely not every instance uploads data to the bucket?

Alexvianet · 2019-11-23T23:54:49Z

Thanos sidecar does that.
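In later Thanos releases the bucket is described in a YAML file passed with --objstore.config-file (the reporter here configured S3 through exported environment variables instead). Only components given that file touch the bucket: the sidecar uploads Prometheus blocks, the compactor compacts and downsamples them, and the store reads them. A minimal sketch that renders an S3 config; the helper name and the S3_ACCESS_KEY/S3_SECRET_KEY variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: render an S3 object storage config for --objstore.config-file.
# Hypothetical helper; credentials come from hypothetical env variables.
set -eu

render_s3_objstore_config() {
  # $1: bucket name; $2: S3 endpoint as host[:port], without a scheme.
  cat <<EOF
type: S3
config:
  bucket: "$1"
  endpoint: "$2"
  access_key: "${S3_ACCESS_KEY:-}"
  secret_key: "${S3_SECRET_KEY:-}"
EOF
}
```

The same file can then be handed to the sidecar, the compactor and the store, e.g. thanos sidecar ... --objstore.config-file=bucket.yml.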

thesaadarshad · 2019-11-24T00:47:35Z

But remember, Thanos Store and Thanos Querier are on different nodes? And nowhere do we define in the Querier where Thanos Store is?
Pardon my ignorance, but I'm a bit confused.

Alexvianet · 2019-11-24T08:35:10Z

Are Thanos Store and Thanos Querier on different nodes? Yes.

thesaadarshad · 2019-11-24T10:36:24Z

So how do they connect, then? Can you share your Thanos Store startup parameters?

Alexvianet · 2019-11-24T10:37:35Z

See #520 (comment).
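On the question above: the store does not need to know about the querier. The connection is made from the querier side, which lists the store's gRPC address with --store, as in the --store=<thanos-store>:19093 flag of the query command earlier in the thread. The store only needs a local cache directory and the bucket. A sketch of the store arguments for later releases (--data-dir and --objstore.config-file; the 0.1.0 store command above used --tsdb.path), with a hypothetical helper name and placeholder paths:

```shell
#!/bin/sh
# Sketch: print the arguments for a thanos store node in later releases.
# It needs a local cache dir and the bucket config; it never points at the
# querier. Helper name and paths are placeholders.
set -eu

thanos_store_args() {
  # $1: local cache directory; $2: object storage config file.
  printf '%s\n' store \
    "--data-dir=$1" \
    "--objstore.config-file=$2" \
    '--http-address=0.0.0.0:19193' \
    '--grpc-address=0.0.0.0:19093'
}
```

Usage: thanos_store_args /var/thanos/store /etc/thanos/bucket.yml | xargs thanos, with port 19093 reachable from the querier nodes.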

Sunil777 · 2020-01-11T10:19:04Z

Hi,

Can anyone please share the complete setup details here? It's totally confusing.

Prometheus 2 nodes different zone with Thanos sidecar ... (Is the sidecar running on the same host as Prometheus?)

Deployed as Docker container on Query Node ... (Is this a single node?)

Thanos store 1 node ... (Is this a single node?)

Thanos query 1 node (at the moment) ... (What is the difference between "Thanos query 1 node" and "Deployed as Docker container on Query Node"?)

Thanos compact 1 node ... (Is this a single node?)
S3 bucket as an object storage

bwplotka added the question label Sep 14, 2018

Alexvianet closed this as completed Sep 17, 2018

Alexvianet mentioned this issue Sep 21, 2018 in "Thanos_query --store parameter" bosh-prometheus/thanos-boshrelease#1 (Closed)