Discussion: Improve memory use for queries #1649
Comments
What about just a chunk limit? It's similar to 2., but we operate on chunks rather than samples at the StoreAPI level, really.
Mainly because the SeriesSet API in PromQL requires full series, iterated series by series. The same applies to StoreAPIs, but you can imagine that with fanout we need all the results in order to merge them (essentially a merge sort) into series sorted by labels, series by series. Not much we can do here. On top of that, deduplication is applied, but it works on a per-series basis, so that part is technically streaming. Also 8.: count allocations (roughly) per user (:
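A minimal sketch of that k-way merge, assuming a heavily simplified `SeriesSet` (the real interface lives in the storage packages; all names here are illustrative, not the actual Thanos code):

```go
package main

import (
	"container/heap"
	"fmt"
)

// SeriesSet is a simplified stand-in for the storage SeriesSet iterator:
// Next advances to the next series, At returns its sorted label set,
// reduced to a plain string here for brevity.
type SeriesSet interface {
	Next() bool
	At() string
}

// sliceSet is a trivial SeriesSet over a pre-sorted slice of label strings.
type sliceSet struct {
	series []string
	i      int
}

func (s *sliceSet) Next() bool { s.i++; return s.i <= len(s.series) }
func (s *sliceSet) At() string { return s.series[s.i-1] }

// setHeap orders SeriesSets by the labels of their current series.
type setHeap []SeriesSet

func (h setHeap) Len() int            { return len(h) }
func (h setHeap) Less(i, j int) bool  { return h[i].At() < h[j].At() }
func (h setHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *setHeap) Push(x interface{}) { *h = append(*h, x.(SeriesSet)) }
func (h *setHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeSorted emits series from all sets in global label order. This is why
// fanout needs a response from every StoreAPI before emitting the next
// series: the smallest label set may come from any of them.
func mergeSorted(sets []SeriesSet, emit func(labels string)) {
	h := make(setHeap, 0, len(sets))
	for _, s := range sets {
		if s.Next() {
			h = append(h, s)
		}
	}
	heap.Init(&h)
	for h.Len() > 0 {
		emit(h[0].At())
		if h[0].Next() {
			heap.Fix(&h, 0) // current series changed; restore heap order
		} else {
			heap.Pop(&h) // this set is exhausted
		}
	}
}

func main() {
	sets := []SeriesSet{
		&sliceSet{series: []string{`{job="a"}`, `{job="c"}`}},
		&sliceSet{series: []string{`{job="b"}`, `{job="d"}`}},
	}
	mergeSorted(sets, func(l string) { fmt.Println(l) })
}
```

Note this emits identical label sets from different sets back to back, which is exactly where per-series deduplication can stay streaming.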
Are we talking about memory accounting per query here?
Did a super quick trial of
Just to be transparent: the data we used for your tests, @ppanyukov,
... had many chunks with a *single* sample each (due to some misconfiguration in the initial version of thanosbench), so while they can reproduce the OOM they are not really representative. (: Still, it is extremely useful to see those ideas. I aggregated an umbrella issue for those here: #1705 I copied most of your ideas there! Let's propose ideas one by one in the next issues (: Closing this in favor of #1705
This is a tracker issue for memory utilisation in Thanos when serving queries, and the steps we want to take to make things better.
Core of the issue:
What we want is:
Repro
Base build
Data
`thanosbench blockgen`
Query
`count({__name__=~".+"}) by (__name__)`
Memory profile
Chart
List of ideas
Per-query limits and flow optimisations look to be the way to go.
[1.] ~~Simple global memory limiter in Querier.~~ Rejected, see #1631.
[1.1] Simple global memory limiter in SG? I've just realised that the OOM problem may actually be in SG, which causes the most pain, not in Querier. Worth considering? (A rough limiter sketch follows after this list.)
[2.] Limit concurrent queries, max samples, and series(?). As per Cortex: #1631 (comment) (See the gate sketch after this list.)
[3.] Frontend query caching. See #1006
[4.] Result streaming. Why does it not stream out the results? Is it something to do with deduplication perhaps? Aggregation? But in `querier.go` we see a comment which we may want to investigate further.
[5.] Prevent knowingly bad queries?
[6.] Forking.
Maybe a crazy idea, but what if we fork a process for each query with simple memory limits?
This could be a way to have per-query memory limits etc.
[7.] (@bwplotka) Count allocations (roughly) per user / query.
[8.] ~~Chop `[tmin, tmax]` in Querier.~~ Not worth it, see #1667. Can we chop the requested `[tmin, tmax]` into smaller (1h?) chunks `[tmin, T1], [T1, T2], ... [Tn, tmax]`, request those from Store Gateway one by one, assemble the results in Querier like we do now, and then send them over to the client? Is this similar to what Frontend query caching would do? (A range-splitting sketch follows after this list.)
[9.] (@bwplotka) Can we loosen StoreAPI a bit to allow series to be aligned differently? This means the returned values could be unordered, thus opening up a way to stream chunks.
[10.] Intern labels (`Labels []Label`). These probably consume tons of memory (strings), and there are probably lots of duplicate identical lists. Why keep them all in memory as distinct objects? Why don't we intern them at a point close to proto unmarshal? Possibly look at other things we can intern too. (An interning sketch follows after this list.)
[11.] Related work in the Prometheus codebase to unify the labels package, which hopefully will allow/lead to fewer allocs (alloc-free casts etc.). See prometheus/prometheus#6029
[12.] Optimise for the special case of `Querier -> 1x Store Gateway -> 1x S3 bucket`. This way we can essentially do straight streaming without having to merge the results, right? [What about dedup?]
[12.1] Taking [12.] further, look at this case: `Querier -> 1x Store Gateway -> [1x S3 bucket + 1x Prometheus]`. Can we assert that for a lot of queries the data will be either in S3 or in Prometheus, thus allowing optimisation [12.]?
[12.2] Can we take this optimisation further and do smarter query partitioning and merging when required?
[999.] Other. Inventive use of OS/k8s/other products which may make queries better.
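For [1.]/[1.1], a very rough sketch of what a global memory limiter could look like (all names and the budget are hypothetical, and `runtime.ReadMemStats` is stop-the-world, so a real version would sample it far less eagerly):

```go
package limiter

import (
	"errors"
	"runtime"
)

// ErrMemoryExhausted is returned when admitting a query would likely
// push the process over its configured heap budget.
var ErrMemoryExhausted = errors.New("query rejected: memory limit reached")

// MemLimiter gates query admission on current heap usage. This is the
// crudest possible scheme: it only looks at process-wide heap size, not
// at what an individual query will actually allocate.
type MemLimiter struct {
	MaxHeapBytes uint64
}

// Admit returns an error instead of letting the process OOM. Callers
// would check this at the start of each Series/Query call.
func (l *MemLimiter) Admit() error {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // note: stop-the-world, sample sparingly
	if ms.HeapAlloc > l.MaxHeapBytes {
		return ErrMemoryExhausted
	}
	return nil
}
```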
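For [2.], a sketch of a Cortex-style gate: a buffered channel bounds concurrent queries, and a per-query counter aborts once a sample (or chunk, per the comment above) budget is exceeded. Names and limits are illustrative, not the actual Cortex code:

```go
package gate

import (
	"context"
	"errors"
)

// Gate bounds the number of in-flight queries using a buffered channel
// as a counting semaphore.
type Gate struct {
	ch chan struct{}
}

func New(maxConcurrent int) *Gate {
	return &Gate{ch: make(chan struct{}, maxConcurrent)}
}

// Start blocks until a slot is free or the context is cancelled.
func (g *Gate) Start(ctx context.Context) error {
	select {
	case g.ch <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Done releases the slot taken by Start.
func (g *Gate) Done() { <-g.ch }

// ErrTooManySamples aborts a query that exceeded its budget.
var ErrTooManySamples = errors.New("query touched too many samples")

// SampleLimiter is incremented as samples (or chunks) are decoded
// for a single query.
type SampleLimiter struct {
	limit, n int
}

func NewSampleLimiter(limit int) *SampleLimiter { return &SampleLimiter{limit: limit} }

// Add records n more samples and fails once the budget is exceeded.
func (l *SampleLimiter) Add(n int) error {
	l.n += n
	if l.n > l.limit {
		return ErrTooManySamples
	}
	return nil
}
```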
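For [8.] (rejected, but for illustration), chopping a requested range into sub-ranges is mechanical, and is roughly the shape of what frontend range splitting would do too. Boundaries here are naive, not aligned to block boundaries:

```go
package chop

// Range is one time window in milliseconds; adjacent windows share a
// boundary, as in [tmin, T1], [T1, T2], ... [Tn, tmax].
type Range struct {
	Start, End int64
}

// Split chops [tmin, tmax] into windows of at most step ms, which could
// then be requested from Store Gateway one by one and re-assembled in
// Querier before sending to the client.
func Split(tmin, tmax, step int64) []Range {
	if step <= 0 {
		return []Range{{Start: tmin, End: tmax}} // guard against a bad step
	}
	var out []Range
	for start := tmin; start < tmax; start += step {
		end := start + step
		if end > tmax {
			end = tmax
		}
		out = append(out, Range{Start: start, End: end})
	}
	return out
}
```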
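For [10.], a minimal interning sketch close to what could run right after proto unmarshal: identical label name/value strings collapse to one shared backing string. A real version would need eviction or per-request scoping to avoid unbounded growth; the `Label` shape mirrors the storepb name/value pair but is a stand-in here:

```go
package intern

import "sync"

// Pool deduplicates strings so that the thousands of identical label
// names/values coming out of proto unmarshal share one backing string.
type Pool struct {
	mu sync.Mutex
	m  map[string]string
}

func NewPool() *Pool { return &Pool{m: make(map[string]string)} }

// Intern returns the canonical copy of s, storing it on first sight.
func (p *Pool) Intern(s string) string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.m[s]; ok {
		return c
	}
	p.m[s] = s
	return s
}

// Label is a stand-in for the proto label (name/value pair).
type Label struct {
	Name, Value string
}

// InternLabels rewrites a label set in place to use interned strings.
func (p *Pool) InternLabels(lset []Label) {
	for i := range lset {
		lset[i].Name = p.Intern(lset[i].Name)
		lset[i].Value = p.Intern(lset[i].Value)
	}
}
```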