ruler: how about implementing the Prometheus remote write feature in Thanos ruler? #1724

Closed
GuyCheung opened this issue Nov 6, 2019 · 10 comments

Comments

@GuyCheung

I want to use both the Thanos receive and Thanos ruler components. But because Thanos ruler only stores rule results locally and ships the data to remote storage such as S3, the ruler node becomes a single node and therefore an HA risk.

In my opinion, if Thanos ruler implemented the remote write feature, we could write the ruler data to Thanos receive with a delay of a few seconds, and Thanos ruler would become a stateless component (provided we can accept a few minutes of data loss, which rule failover might reduce).

@brancz
Member

brancz commented Nov 6, 2019

The state of the ruler wouldn't disappear. The TSDB would still need to act as a local persistent buffer. Unfortunately, I don't think we would actually gain anything from this.
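To illustrate the buffering point, here is a minimal sketch of a ruler-like writer that forwards rule results to a remote endpoint. It is not Thanos code: `BufferedRemoteWriter`, `FlakyReceiver` and the sample tuples are hypothetical names made up for this example. It only shows that samples have to be held somewhere while the receiver is unreachable.

```python
from collections import deque


class FlakyReceiver:
    """Hypothetical stand-in for a remote write endpoint that can be down."""

    def __init__(self):
        self.up = True
        self.samples = []

    def write(self, batch):
        # Returns True on success, False if the endpoint is unreachable.
        if not self.up:
            return False
        self.samples.extend(batch)
        return True


class BufferedRemoteWriter:
    """Toy model of a ruler forwarding evaluated rule results."""

    def __init__(self, send):
        # `send` takes a list of samples and returns True on success.
        self._send = send
        self._pending = deque()

    def append(self, sample):
        # Queue every sample first so a failed flush does not drop it.
        self._pending.append(sample)

    def flush(self):
        # Try to ship everything queued so far; keep the queue on failure.
        if not self._pending:
            return True
        if self._send(list(self._pending)):
            self._pending.clear()
            return True
        return False

    def pending(self):
        return len(self._pending)
```

In this sketch the queue lives in memory, so a restart while the receiver is down would lose the pending samples. Avoiding that requires persisting the buffer locally, which is the state that would remain in the ruler.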

@FUSAKLA
Member

FUSAKLA commented Nov 8, 2019

If you are looking for HA, you should probably run Thanos ruler twice, as an identical pair.

@GuyCheung
Author

@brancz yes, the state wouldn't disappear, but this would help make the ruler a more lightweight component, right? I think the situation is similar to the relationship between Prometheus and Thanos receive: Prometheus does not act as a fully stateless component, but we can treat it as stateless because of Thanos receive.

@GuyCheung
Author

@FUSAKLA yes, I thought about this before, but I think there are some disadvantages:

  1. Evaluating an identical pair twice (or more) increases the system load.
  2. It adds fan-out load on the Thanos query side: the data is stored on the ruler component, so we have to add Thanos ruler to Thanos query.
  3. It is not simple enough for Thanos ruler. I think Thanos ruler should focus on query -> exec -> write within the whole system, while Thanos receive and Thanos store act as the query components. That sounds clearer, doesn't it?

@FUSAKLA
Member

FUSAKLA commented Nov 10, 2019

Yes, HA always comes at a price. The rules will be evaluated twice, but that is similar to Prometheus HA, where you also do the scraping and evaluation twice. I wouldn't say adding the ruler StoreAPI to the cluster is that much of a burden, and it is even less of one if you use service discovery. Still, remote write won't help you with HA.

But I don't see a reason not to have remote write in the ruler eventually. I agree it can make the component more lightweight, the same way as for a Prometheus with a sidecar. But the benefits are quite small right now, IMHO.
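For context on the cost of an identical pair: Thanos query can deduplicate series that differ only in a replica label, so the pair shows up as one series at query time. The sketch below is a deliberately simplified model of that idea, not the actual Thanos algorithm. The `deduplicate` function, the `replica` label name and the sample data are assumptions made for illustration.

```python
def deduplicate(series, replica_label="replica"):
    """Merge series that differ only in `replica_label`.

    `series` is a list of (labels, samples) pairs, where `labels` is a dict
    and `samples` is a list of (timestamp, value) tuples.
    """
    merged = {}
    for labels, samples in series:
        # Identify the series by all of its labels except the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        by_ts = merged.setdefault(key, {})
        for ts, value in samples:
            # The first replica to provide a timestamp wins; others fill gaps.
            by_ts.setdefault(ts, value)
    return [(dict(key), sorted(by_ts.items())) for key, by_ts in merged.items()]
```

With two rulers that each missed a different evaluation, the merged series covers both gaps:

```python
a = ({"__name__": "job:up:sum", "replica": "ruler-0"}, [(1000, 3.0), (2000, 4.0)])
b = ({"__name__": "job:up:sum", "replica": "ruler-1"}, [(2000, 4.0), (3000, 5.0)])
merged = deduplicate([a, b])
```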

@GuyCheung
Author

I think the requirements are quite similar to the reasons we created the Thanos receive component.
Thanos already had the sidecar, query and store components. They worked well and already gave Prometheus distributed capabilities. Yet we created the receive component anyway. It overlaps a little with the sidecar and query components, but now that it exists, the concept looks clearer and more meaningful.

In my opinion, we can separate the components of the whole architecture into several parts, as below:

  1. generate: Prometheus and ruler
  2. store: remote storage and compact
  3. read: receive and store

Each part should focus on the main function it provides. Meanwhile, the same function should stay unified so that we can improve it continuously. For example, writing to remote storage should be done by the receive component; raw data (both collected from exporters and calculated from rules) should be delivered to the receive component as soon as possible; and hot data queries should be served by receive and not by any other part.

So this is why I want the ruler to implement the remote write feature: I think it will make the system overview clearer and keep each part simple.

@stale

stale bot commented Jan 11, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 11, 2020
@brancz
Member

brancz commented Jan 13, 2020

Running and scaling receive is somewhat involved. I would hate for that kind of involvement to be necessary to use the ruler when people just use the sidecar approach and otherwise don't need the receive component.

@stale stale bot removed the stale label Jan 13, 2020
@stale

stale bot commented Feb 12, 2020

This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.

@stale stale bot added the stale label Feb 12, 2020
@brancz
Member

brancz commented Feb 19, 2020

For now I’m going to close this. We might rethink this at a later point, but at the moment we don’t feel this is the right strategy for the project. Thanks a lot for starting the discussion! :)

@stale stale bot removed the stale label Feb 19, 2020
@brancz brancz closed this as completed Feb 19, 2020
4 participants