
Proxy: memory bloat caused by fmt.Sprintf #6198

Closed

fpetkovski opened this issue Mar 9, 2023 · 13 comments · Fixed by #6228
Comments

@fpetkovski
Contributor

I was looking at a querier profile and noticed that fmt.Sprintf inside the proxy uses 15% of total querier memory. This querier instance is handling alerting rulers and has a high RPS.

It's probably worth replacing dynamic messages with constant errors.

[memory profile image]
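For illustration, a minimal sketch of what "constant errors" could look like (the package, variable, and function names below are made up, not actual Thanos identifiers): sentinel errors are allocated once at package initialization, so returning them avoids the per-call fmt.Sprintf formatting.

package proxy

import "errors"

// Illustrative sentinel errors: allocated once, so returning them costs
// nothing per call, unlike building a fresh message with fmt.Sprintf.
var (
	errStoreUnavailable = errors.New("store unavailable")
	errStoreFiltered    = errors.New("store filtered out: external labels do not match")
)

// checkStore is a hypothetical helper showing the pattern: return a
// predeclared error instead of formatting a dynamic message.
func checkStore(healthy, matchesLabels bool) error {
	if !healthy {
		return errStoreUnavailable
	}
	if !matchesLabels {
		return errStoreFiltered
	}
	return nil
}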

@fpetkovski
Contributor Author

The CPU profile looks even worse: fmt.Sprintf accounts for ~30% of CPU usage.

[CPU profile image]

@douglascamata
Contributor

@fpetkovski was the Querier at debug logging level?

The memory profile in the first picture suggests to me that most of the memory used by fmt.Sprintf comes from endpointRef.String, which is called at:

storeDebugMsgs = append(storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, reason))
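For context on where the cost comes from, a self-contained sketch (the endpoint type and its fields are invented for illustration, not the real endpointRef): fmt.Sprintf evaluates its arguments eagerly, so passing st calls its String() method and formats every labelset even if the resulting message is never logged.

package main

import (
	"fmt"
	"strings"
)

// endpoint stands in for endpointRef; the fields are invented for illustration.
type endpoint struct {
	addr      string
	labelSets []string
}

// String mimics an expensive Stringer: it joins every labelset into one
// string, allocating proportionally to the number of labelsets.
func (e endpoint) String() string {
	return fmt.Sprintf("Addr: %s LabelSets: %s", e.addr, strings.Join(e.labelSets, " "))
}

func main() {
	st := endpoint{addr: "10.0.0.1:10901", labelSets: []string{`{tenant="a"}`, `{tenant="b"}`}}

	var storeDebugMsgs []string
	// The append below formats st (calling String above) unconditionally,
	// regardless of whether debug logging is enabled downstream.
	storeDebugMsgs = append(storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, "external labels do not match"))
	fmt.Println(storeDebugMsgs[0])
}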

@bwplotka
Member

Yeah, good finding! We should definitely only append store debug messages when debug level is enabled. I wish the logger gave us this info 🙃

However, I don't expect it to affect bigger queries; the CPU overhead is only significant for no-op queries, which is kind of fine since that's not the hot path. Still, it's low-hanging fruit. 👍🏽

@fpetkovski
Contributor Author

fpetkovski commented Mar 16, 2023

Yeah, I think this is because we generate debug messages even without debug mode. We use alerting scoped per tenant, so we have lots of excluded stores for alerts.

@douglascamata
Contributor

Another option might be to change endpointRef.String() and remove the labels from it, since this is where the allocations ultimately come from.

@jacobbaungard
Contributor

Is printing the labelsets useful for debugging purposes, or is it sufficient to know the store addresses? If the addresses are enough, perhaps the easiest fix is to remove the printing of the labelsets, as @douglascamata suggests above.

It is probably quite a bit of work to conditionally generate the debug messages, given that we can't easily get the current log level from go-kit/log.
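For reference, a small sketch of why the level is hard to recover with go-kit/log: the level filter wraps the logger and drops records at Log() time, but the wrapped logger exposes no way to query which level it allows, so callers have to remember the configured level themselves (the debugEnabled variable below is ours, not a go-kit API).

package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	// The filter decides at Log() time whether a record is kept.
	logger := log.NewLogfmtLogger(os.Stderr)
	logger = level.NewFilter(logger, level.AllowInfo())

	// This debug record is dropped by the filter, but its arguments were
	// already evaluated before Log() was called.
	level.Debug(logger).Log("msg", "dropped, yet already formatted")

	// Practical workaround: remember the configured level separately.
	debugEnabled := false // e.g. set from the parsed --log.level flag
	if debugEnabled {
		level.Debug(logger).Log("msg", "expensive detail goes here")
	}
}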

@matej-g
Collaborator

matej-g commented Mar 17, 2023

I think labels can be useful to know why a store has been filtered out. On the other hand, for the debug message about the start of the fan-out, I think the address might suffice? In that case, we could move gathering the store info to after the if len(stores) == 0 check and, in other places, print only Addr().
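A rough sketch of that restructuring (storeRef, Addr, LabelSets, and the helper names are stand-ins, not the actual proxy code): build the labelset-heavy description only on the "no store matched" path, and keep the ordinary fan-out message to addresses only.

package proxy

import (
	"fmt"
	"strings"
)

// storeRef is a stand-in for the real endpoint reference type.
type storeRef interface {
	Addr() string
	LabelSets() string
}

// describeStores builds the labelset-heavy description; call it only on the
// "no store matched the query" path, where the detail is actually needed.
func describeStores(stores []storeRef) string {
	parts := make([]string, 0, len(stores))
	for _, st := range stores {
		parts = append(parts, fmt.Sprintf("%s %s", st.Addr(), st.LabelSets()))
	}
	return strings.Join(parts, "; ")
}

// fanoutMsg is the cheap variant for the ordinary debug message: addresses only.
func fanoutMsg(stores []storeRef) string {
	addrs := make([]string, 0, len(stores))
	for _, st := range stores {
		addrs = append(addrs, st.Addr())
	}
	return "fanning out to stores: " + strings.Join(addrs, ",")
}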

@douglascamata
Contributor

I think to know why a store has been filtered out, labels can be useful.

@matej-g but realistically speaking, did you ever find yourself using that information? I'm just trying to get us to think about whether we believe it's useful or we know it's useful. Even considering the size of the debug log lines this will generate, it's a lot of work just to read through them...

@matej-g
Collaborator

matej-g commented Mar 17, 2023

If you want my personal empirical experience, it was useful for me 😄; there were a couple of times when I could not understand why exactly a store was filtered out and only found out after looking at the labels. That said, this might not be universal for all users. It's not a big deal to drop it, include only the address, and use that to look up the labels. I was just wondering whether we could retain that info and sidestep this performance issue at the same time, but either solution is fine with me 🙂.

@douglascamata
Contributor

@matej-g I see. As mentioned by others, if we could check the log level, there could be a "base" log at info level without the labels, while the debug level could include the labels.

@fpetkovski
Contributor Author

I think we should not generate debug messages at all if the debug log level is not enabled. We can propagate this as a separate flag, e.g. generateDebugMessages = log.level == "debug".
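A minimal sketch of that flag, assuming the component already knows the parsed --log.level string (type and method names below are illustrative, not the actual proxy code):

package proxy

import "fmt"

// ProxyStore is a stand-in for the real proxy; it carries the precomputed
// flag so hot paths can skip building debug messages entirely.
type ProxyStore struct {
	debugLogging   bool
	storeDebugMsgs []string
}

// NewProxyStore derives the flag once from the configured log level,
// i.e. generateDebugMessages := logLevel == "debug".
func NewProxyStore(logLevel string) *ProxyStore {
	return &ProxyStore{debugLogging: logLevel == "debug"}
}

// recordFiltered only pays the fmt.Sprintf cost when debug logging is on.
func (p *ProxyStore) recordFiltered(st fmt.Stringer, reason string) {
	if !p.debugLogging {
		return
	}
	p.storeDebugMsgs = append(p.storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, reason))
}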

jacobbaungard added a commit to jacobbaungard/thanos that referenced this issue Mar 20, 2023
In debug mode we log information about which endpoints we skip and
query, including the labels of each endpoint. These debug messages can
use significant memory, especially in larger setups with a large number
of labelsets.

Previously we would generate the debug messages even when not in debug
mode. With this commit, we only generate the debug messages when debug
logging is enabled.

This fixes: thanos-io#6198

Signed-off-by: Jacob Baungard Hansen <jachanse@redhat.com>
@jacobbaungard
Contributor

I attempted to create a fix for this.

I didn't manage to generate a profile that looks similar to yours, though, @fpetkovski. If you have an environment where it is easy to do so, it might be worth sanity-checking that the PR actually solves the memory bloat reported here.

@fpetkovski
Contributor Author

Hi Jacob, thanks for the PR 👍 I will deploy this today or tomorrow and take another profile.

jacobbaungard added a commit to jacobbaungard/thanos that referenced this issue Mar 22, 2023