split query by time range #489
Comments
@BertHartm that sounds really interesting, and worth looking at. I was wondering what your experience is with scale. Do you think the timeouts you experience are bottlenecked by the number of timestamps, or by the age of the time range? Which provider are you using (Prometheus, InfluxDB), and can it be improved or configured to do this kind of partitioning itself, rather than moving it to a downstream component? Those answers would help us make decisions like whether to slice the requested ranges into buckets based on a configurable time window (regardless of granularity) versus size in number of timestamps, and whether we should enable it by default for any providers.

One thing to note is that Trickster is meant to cache recent data. So in your example, even though the query would be split apart upstream with this proposed feature, it would likely not be cached in Trickster due to the age of the range, unless you have set a really long TTL. You'd avoid timeouts, but likely not see much in the way of future acceleration. Would that still meet your use case?

Thanks for the quick reply, @jranson. We use Prometheus and M3DB, and the main problem we're trying to solve is folks loading dashboards for long time windows, so we'll probably be bumping the default limit on how long a window Trickster caches to account for that.
Hi @BertHartm and the rest of the community. We are going to start implementing this feature in Trickster 2.0 and will use this issue to discuss and share progress. The main area where we'd like feedback is the configurability of this feature - specifically, how does a team running Trickster tune the shard size to their liking via configuration? Since it is impossible for Trickster to know exactly how many series might be returned for a given query, I believe we need to focus on the number of timestamps that would be returned, which can be derived mathematically at request time from the query's step, start, and end parameters. To that end, I believe a single configuration (e.g., …
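For illustration, a minimal sketch of that derivation (not Trickster source; `timestampCount` is a name assumed here): the unique-timestamp count of a range query follows directly from its start, end, and step parameters.

```go
// Illustrative sketch only: derive the number of unique timestamps a
// range query will produce from its start, end, and step.
package main

import (
	"fmt"
	"time"
)

// timestampCount returns how many unique timestamps fall on the step
// grid between start and end, inclusive.
func timestampCount(start, end time.Time, step time.Duration) int64 {
	if step <= 0 || end.Before(start) {
		return 0
	}
	return int64(end.Sub(start)/step) + 1
}

func main() {
	start := time.Date(2021, 3, 1, 0, 0, 0, 0, time.UTC)
	end := start.Add(24 * time.Hour)
	fmt.Println(timestampCount(start, end, 30*time.Second)) // 2881
}
```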
Hey @jranson, we'd love to see this in 2.0. I agree on controlling time instead of series or datapoints. I'd like to note that my primary concern is aligning this to the internal model of the database to optimize the fetches, so if it's possible to align based on actual UTC time, that might be the best way of configuring it for us. I'm less familiar with things outside Prometheus and M3DB, but I believe the time range / block configuration aligns for all the Prometheus-based systems, and I would guess it aligns with most TSDBs.

I think it also makes reasoning easy as an end user if I can think about the configuration as multiples of the database block size (e.g., I'm okay hitting four 2h blocks in a single hit, so set the proxy to split at 8h), as opposed to thinking in the number of timestamps and trying to factor in the step size of queries when optimizing the system (making the above example something like 8h at a 30s step size, or 960 timestamps, in most cases, though it could be more than 8h if the step size is larger, which is probable when this feature is used to split up multi-day queries). Thoughts on that as an option?
Thanks for the feedback, it's really helpful. So for your use case, I am gathering that 1) the number of unique timestamps per shard is not important, and 2) when optimally configured, no two shards should request data from the same storage block. My thinking on the timestamp-based use case was that it can help with problems like Prometheus returning errors for queries that are too large, which I think there is also demand for. The good news is we should be able to implement all of these use cases, since they only determine how a list of time ranges might be further subdivided.

Instead of … So in that scenario, you could set … Let me know if that makes sense, would meet your needs, and is an agreeable approach.
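A rough sketch of the block-aligned splitting discussed above (illustrative only; `splitAligned` and `timeRange` are hypothetical names, not Trickster's actual types): shard boundaries fall on absolute UTC multiples of the shard size, so a shard size that is a multiple of the TSDB block size keeps every shard within whole blocks.

```go
// Illustrative sketch only: split a requested range into shards whose
// boundaries are aligned to absolute multiples of the shard size.
package main

import (
	"fmt"
	"time"
)

type timeRange struct{ Start, End time.Time }

// splitAligned assumes a shard size that evenly divides 24h, so Truncate
// lands on UTC wall-clock boundaries (00:00, 08:00, 16:00 for 8h).
func splitAligned(start, end time.Time, shard time.Duration) []timeRange {
	var out []timeRange
	for cur := start; cur.Before(end); {
		next := cur.Truncate(shard).Add(shard) // next aligned boundary
		if next.After(end) {
			next = end
		}
		out = append(out, timeRange{cur, next})
		cur = next
	}
	return out
}

func main() {
	start := time.Date(2021, 3, 1, 5, 0, 0, 0, time.UTC)
	end := time.Date(2021, 3, 2, 3, 0, 0, 0, time.UTC)
	for _, r := range splitAligned(start, end, 8*time.Hour) {
		fmt.Println(r.Start.Format(time.RFC3339), "->", r.End.Format(time.RFC3339))
	}
}
```

With an 8h shard size, this example request splits at 08:00, 16:00, and 00:00 UTC, so each shard touches only whole 2h blocks.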
Yeah, that actually would work well. Would it make sense to include …?
Great, yes, I will definitely include …
The only real (but minor) concern I'm seeing with this approach, in practice, is when …
Seems reasonable. I was wondering if …
+1 to this approach, and I wonder if we could detail the use cases for …
@wesleyk the use case for points is the max 11,000 points-per-query limitation in the base Prometheus product. ML and other applications that need to get large amounts of Prometheus data at high resolution (e.g., 6 months @ 15s) must break apart those queries and reconstitute the data manually to avoid that limitation. In that case, an operator could set …
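To put rough numbers on that example (the figures below are illustrative assumptions, not from the issue):

```go
// Back-of-the-envelope arithmetic for the "6 months @ 15s" example above;
// the month length and cap handling are assumptions for illustration.
package main

import "fmt"

func main() {
	const (
		rangeSeconds = 180 * 86400 // roughly 6 months
		stepSeconds  = 15
		maxPoints    = 11000 // Prometheus' per-query point cap
	)
	points := rangeSeconds/stepSeconds + 1
	shards := (points + maxPoints - 1) / maxPoints // ceiling division
	fmt.Println(points, shards)                    // 1036801 95
}
```

So a single 6-month query at 15s resolution would need to be broken into roughly 95 sub-queries to stay under the cap.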
@jranson gotcha. I may be confusing myself, but how would …? Also, one thought re: naming. I wonder if "Range" makes more sense to use in place of "Step", given this has more to do with the time range that's being queried and not the step size of the results?

One more comment: if we do split queries up, I imagine #291 may become a more frequent issue? We noticed in the missing-range handling that errors seem to be ignored. Not sure if they're handled elsewhere, but I imagine if we use similar logic for split queries, we'd see this soft-failure case occur more often.
I have submitted PR #573 to implement the sharding capability. A few notes about how things turned out, and responses to some of the thoughts above:

I named the proposed …

@wesleyk as far as I am aware from my research and testing, the Prometheus 11K limitation pertains to unique timestamps (not unique timestamps multiplied by the number of individual series), so this configuration should solve for that issue.

To simplify the functionality, we did make …

Regarding the value names, we use "Range" to generally mean a timespan with absolute start and end times. In this case, the …

I did improve on the Missing Ranges issue a bit in this patch: every failed shard should be reliably logged and counted in the metrics. My intention is to address #291 more directly in a separate patch, which will insert Warnings detailing the data integrity issues in the result set delivered to the client whenever a dependent upstream request fails. This should be included in the next beta release alongside the sharding enhancement.

The PR includes a new example config file for sharding, which sets …
Pretty much all the users I work closely with prefer a failed request that they can retry, rather than having to guess whether the missing range is due to a partial failure or some other side effect (a lot of users assume the data is missing in storage). Most UIs don't properly surface warnings the way you would like, either, so I don't think a warning is sufficient; or rather, I think it should at least be configurable to make a partial failure a hard failure to the end user.
Thanks, @robskillington. Just to be sure I understand fully: in a hypothetical situation where Trickster already has 99% of the data needed to serve a request in the cache, and the delta proxy request to retrieve the remaining 1% fails, the originating client request should be returned a full failure? If that is the preferred route, it's great for us, since that is the easiest implementation to support.

Separately, we do have configurable backend retries (in the event of an origin request failure) on the list of features to provide as part of 2.0's new ALB. However, it does make sense to permit that capability for any backend provider and not just the ALB, so we'll make sure it is properly scoped. Let us know if those thoughts sound good, and if the implementation in the PR will meet your team's needs. Thanks again for the great feedback!
This is the desired behavior, yeah. The natural next step there, IMO, would be to still populate the cache in the background with the proxy requests that did succeed. So if a request results in four sharded queries, with three succeeding and one failing, then the client will see a failure. On retry, the three successful queries would already have cached results, and the one previously failed query would be retried.
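A minimal sketch of that flow (the `fetchShard` and `cacheShard` stand-ins are hypothetical, not Trickster's actual API): every shard that succeeds still populates the cache, but any shard failure fails the overall client request.

```go
// Illustrative sketch only: fan out sharded sub-queries, cache the shards
// that succeed, and return a hard failure if any shard fails.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type timeRange struct{ Start, End time.Time }

// Stand-ins for the real proxy fetch and cache-write calls.
func fetchShard(r timeRange) ([]byte, error) { return []byte("data"), nil }
func cacheShard(r timeRange, b []byte)       {}

func fetchAll(shards []timeRange) error {
	var wg sync.WaitGroup
	errs := make([]error, len(shards))
	for i, r := range shards {
		wg.Add(1)
		go func(i int, r timeRange) {
			defer wg.Done()
			b, err := fetchShard(r)
			if err != nil {
				errs[i] = err
				return
			}
			cacheShard(r, b) // successful shards are cached even if the overall request fails
		}(i, r)
	}
	wg.Wait()
	if err := errors.Join(errs...); err != nil {
		return fmt.Errorf("one or more shards failed: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(fetchAll([]timeRange{{time.Now().Add(-time.Hour), time.Now()}}))
}
```

On a client retry, the previously cached shards are already present, so only the failed shard needs to be fetched again.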
Thanks @wesleyk. We will make sure that is part of the next beta release (and it will be a separate PR from the sharding feature PR). As for the sharding issue, we are going to go ahead and merge #573 - not because we are finished with this issue, but because we renamed the org today from … If there are any issues or concerns with the merged sharding implementation, report them here and we will modify it further to meet the community's needs. I will keep this issue open until we have consensus that the needed use cases are met and function as expected.
Will do! Will ask the team to take a pass on the PR as well 🥳
It would be helpful if Trickster could take a query for a long time range of data and turn it into several queries for shorter time ranges.
The use case is for instances where querying, say, a month's worth of data may time out, but querying for one day a month ago would succeed. By having Trickster combine datapoints from several one-day queries, it can increase the likelihood of overall success and potentially lower the load on the backend.