
Proxy: memory bloat caused by fmt.Sprintf #6198

Closed

fpetkovski opened this issue Mar 9, 2023 · 13 comments · Fixed by #6228
Comments

@fpetkovski
Contributor

I was looking at a querier profile and noticed that fmt.Sprintf inside the proxy uses 15% of total querier memory. This querier instance is handling alerting rulers and has a high RPS.

It's probably worth replacing dynamic messages with constant errors.

[memory profile image]
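For illustration, a minimal sketch of what "constant errors" could look like (the package, variable, and function names below are made up, not actual Thanos identifiers): sentinel errors are allocated once at package initialization, so returning them avoids the per-call fmt.Sprintf formatting.

package proxy

import "errors"

// Illustrative sentinel errors: allocated once, so returning them costs
// nothing per call, unlike building a fresh message with fmt.Sprintf.
var (
	errStoreUnavailable = errors.New("store unavailable")
	errStoreFiltered    = errors.New("store filtered out: external labels do not match")
)

// checkStore is a hypothetical helper showing the pattern: return a
// predeclared error instead of formatting a dynamic message.
func checkStore(healthy, matchesLabels bool) error {
	if !healthy {
		return errStoreUnavailable
	}
	if !matchesLabels {
		return errStoreFiltered
	}
	return nil
}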

@fpetkovski
Contributor Author

The CPU profile looks even worse: fmt.Sprintf accounts for ~30% of CPU usage.

[CPU profile image]

@douglascamata
Contributor

@fpetkovski was the Querier at debug logging level?

The memory profile in the first picture suggests to me that most of the memory used by fmt.Sprintf comes from endpointRef.String, which is called at:

storeDebugMsgs = append(storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, reason))
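For context on where the cost comes from, a self-contained sketch (the endpoint type and its fields are invented for illustration, not the real endpointRef): fmt.Sprintf evaluates its arguments eagerly, so passing st calls its String() method and formats every labelset even if the resulting message is never logged.

package main

import (
	"fmt"
	"strings"
)

// endpoint stands in for endpointRef; the fields are invented for illustration.
type endpoint struct {
	addr      string
	labelSets []string
}

// String mimics an expensive Stringer: it joins every labelset into one
// string, allocating proportionally to the number of labelsets.
func (e endpoint) String() string {
	return fmt.Sprintf("Addr: %s LabelSets: %s", e.addr, strings.Join(e.labelSets, " "))
}

func main() {
	st := endpoint{addr: "10.0.0.1:10901", labelSets: []string{`{tenant="a"}`, `{tenant="b"}`}}

	var storeDebugMsgs []string
	// The append below formats st (calling String above) unconditionally,
	// regardless of whether debug logging is enabled downstream.
	storeDebugMsgs = append(storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, "external labels do not match"))
	fmt.Println(storeDebugMsgs[0])
}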

@bwplotka
Member

Yeah, good finding! We should definitely only append store debug messages when debug level is enabled. I wish the logger gave us this info 🙃

However, I don't expect it to affect bigger queries; the CPU overhead is only significant for no-op queries, which is kind of fine since that's not the hot path. Still, it's low-hanging fruit. 👍🏽

@fpetkovski
Contributor Author

fpetkovski commented Mar 16, 2023

Yeah, I think this is because we generate debug messages even without debug mode. We use alerting scoped per tenant, so we have lots of excluded stores for alerts.

@douglascamata
Contributor

Another option might be to change endpointRef.String() and remove the labels from it, since this is where the allocations ultimately come from.

@jacobbaungard
Contributor

Is printing the labelsets useful for debugging purposes, or is it sufficient to know the store addresses? If the addresses are enough, perhaps the easiest fix is to remove the printing of the labelsets, as @douglascamata suggests above.

It is probably quite a bit of work to conditionally generate the debug messages, given that we can't easily get the current log level from go-kit/log.
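For reference, a small sketch of why the level is hard to recover with go-kit/log: the level filter wraps the logger and drops records at Log() time, but the wrapped logger exposes no way to query which level it allows, so callers have to remember the configured level themselves (the debugEnabled variable below is ours, not a go-kit API).

package main

import (
	"os"

	"github.com/go-kit/log"
	"github.com/go-kit/log/level"
)

func main() {
	// The filter decides at Log() time whether a record is kept.
	logger := log.NewLogfmtLogger(os.Stderr)
	logger = level.NewFilter(logger, level.AllowInfo())

	// This debug record is dropped by the filter, but its arguments were
	// already evaluated before Log() was called.
	level.Debug(logger).Log("msg", "dropped, yet already formatted")

	// Practical workaround: remember the configured level separately.
	debugEnabled := false // e.g. set from the parsed --log.level flag
	if debugEnabled {
		level.Debug(logger).Log("msg", "expensive detail goes here")
	}
}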

@matej-g
Collaborator

matej-g commented Mar 17, 2023

I think labels can be useful to know why a store has been filtered out. On the other hand, for the debug message about the start of the fan-out, I think the address might suffice? In that case, we could move gathering the store info to after the if len(stores) == 0 check and, in other places, print only Addr().
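A rough sketch of that restructuring (storeRef, Addr, LabelSets, and the helper names are stand-ins, not the actual proxy code): build the labelset-heavy description only on the "no store matched" path, and keep the ordinary fan-out message to addresses only.

package proxy

import (
	"fmt"
	"strings"
)

// storeRef is a stand-in for the real endpoint reference type.
type storeRef interface {
	Addr() string
	LabelSets() string
}

// describeStores builds the labelset-heavy description; call it only on the
// "no store matched the query" path, where the detail is actually needed.
func describeStores(stores []storeRef) string {
	parts := make([]string, 0, len(stores))
	for _, st := range stores {
		parts = append(parts, fmt.Sprintf("%s %s", st.Addr(), st.LabelSets()))
	}
	return strings.Join(parts, "; ")
}

// fanoutMsg is the cheap variant for the ordinary debug message: addresses only.
func fanoutMsg(stores []storeRef) string {
	addrs := make([]string, 0, len(stores))
	for _, st := range stores {
		addrs = append(addrs, st.Addr())
	}
	return "fanning out to stores: " + strings.Join(addrs, ",")
}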

@douglascamata
Contributor

I think to know why a store has been filtered out, labels can be useful.

@matej-g but realistically speaking, did you ever find yourself using that information? I'm just trying to get us to think about whether we believe it's useful or we know it's useful. Even considering the size of the debug log lines this will generate, it's a lot of work just to read through them...

@matej-g
Collaborator

matej-g commented Mar 17, 2023

If you want my personal empirical experience, it was useful for me 😄; there were a couple of times when I could not understand why exactly a store was filtered out and only found out after looking at the labels. That said, this might not be universal for all users. It's not a big deal to drop it, include only the address, and use that to look up the labels. I was just wondering whether we could retain that info and sidestep this performance issue at the same time, but either solution is fine with me 🙂.

@douglascamata
Contributor

@matej-g I see. As mentioned by others, if we could check the log level, there could be a "base" log at info level without the labels, while the debug level could include the labels.

@fpetkovski
Contributor Author

I think we should not generate debug messages at all if the debug log level is not enabled. We can propagate this as a separate flag, e.g. generateDebugMessages = log.level == "debug".
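A minimal sketch of that flag, assuming the component already knows the parsed --log.level string (type and method names below are illustrative, not the actual proxy code):

package proxy

import "fmt"

// ProxyStore is a stand-in for the real proxy; it carries the precomputed
// flag so hot paths can skip building debug messages entirely.
type ProxyStore struct {
	debugLogging   bool
	storeDebugMsgs []string
}

// NewProxyStore derives the flag once from the configured log level,
// i.e. generateDebugMessages := logLevel == "debug".
func NewProxyStore(logLevel string) *ProxyStore {
	return &ProxyStore{debugLogging: logLevel == "debug"}
}

// recordFiltered only pays the fmt.Sprintf cost when debug logging is on.
func (p *ProxyStore) recordFiltered(st fmt.Stringer, reason string) {
	if !p.debugLogging {
		return
	}
	p.storeDebugMsgs = append(p.storeDebugMsgs, fmt.Sprintf("store %s filtered out: %v", st, reason))
}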

jacobbaungard added a commit to jacobbaungard/thanos that referenced this issue Mar 20, 2023
In debug mode we log information about which endpoints we skip and
query, including the labels of each endpoint. These debug messages can
use significant memory, especially in larger setups with a large number
of labelsets.

Previously we would generate the debug messages even when not in debug
mode. With this commit, we only generate the debug messages when debug
logging is enabled.

This fixes: thanos-io#6198

Signed-off-by: Jacob Baungard Hansen <jachanse@redhat.com>
@jacobbaungard
Contributor

I attempted to create a fix for this.

I didn't manage to generate a profile that looks similar to yours, though, @fpetkovski. If you have an environment where it is easy to do so, it might be worth sanity-checking that the PR actually solves the memory bloat reported here.

@fpetkovski
Contributor Author

Hi Jacob, thanks for the PR 👍 I will deploy this today or tomorrow and take another profile.

jacobbaungard added a commit to jacobbaungard/thanos that referenced this issue Mar 22, 2023