🎁 Continuous CPU profiling #396
Merged
I am running:

I was expecting these memory pointers to be de-referenced:

I grabbed the pprof & the beam.smp and ran them locally through https://github.com/google/pprof:

Does the Parca output look right to you @brancz?
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Makes experimenting "from scratch" easier.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
With parca-dev/parca-agent#132
So far, so good @brancz. Time for some Erlang JIT-fu 🥋
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Unbounded Parca server memory growth is the primary concern. Currently the server will crash with OOM, but that is OK until the profile retention feature is finalized.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Force-pushed from 03e30f0 to c524402
All context is in #396. Just in time for Christmas 🎄
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Why did we do this?
How did we do it?
We installed the Parca agent first, and the server next, in our Kubernetes production cluster. This is what that looks like:
Next, we port-forward port 7070 to the Parca server and load the UI:
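The port-forward step can be done with kubectl; a sketch, assuming the server is exposed as a service named parca in the parca namespace (these names are illustrative, not from our manifests):

```shell
# Forward local port 7070 to the Parca server, then open http://localhost:7070
kubectl -n parca port-forward service/parca 7070:7070
```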
We select the CPU Samples profile, hit Search, and see all workloads that use CPU across the K8s cluster:
Let's click on the process with the most samples (this will be the one that uses the most CPU):
Great! We notice that this is the Parca server itself, and we instantly see that the widest span is actually symbolizing DWARF information. Is this a good thing or a bad thing? I think that it does what it's supposed to, but what I really want to do is compare a high number of samples, like this peak, to a low number of samples and try to understand what is different.
I hold SHIFT, click on the container="parca" label, then click the Compare button. Next, I click on a point in the graph with fewer samples on the left-hand side, and a point with a high number of samples on the right-hand side. This is what I see:

All new spans on the right-hand side are shown in red, which represents the extra work that the CPU needs to do. I quickly spot that memory garbage collection accounts for most of the new samples. My understanding is that this is due to the Parca server writing the CPU samples that it tracks to its storage, which happens just before garbage collection. This makes sense, and is exactly what I would expect to see when a process has "work" to do:
That is already helpful: if I had to optimise something, those functions use the most CPU, so that's where I would start. Now remember, some of this may be necessary - after all, processes are meant to do work and use CPU - but maybe they are not as efficient as they could be, so I would go through that code and see what could be optimised or removed (if anything!).
The last thing that I want to mention regarding the Compare view is that:
Lastly, if I wanted to see CPU profiles across everything for the last day (this may use a lot of memory!), I select Last day from the drop-down and click the Merge button. This is what I see:
You will spot postgres, calico, and almost half of all CPU spans in parca.

The memory addresses on the left-hand side in the screenshot above are actually for the Erlang VM, which cannot currently be symbolized. This is a present in the sense that it's a missing feature and we are making the wider community aware of it. It would be amazing if we could understand what actually happens in the Erlang VM (the changelog.com app runtime - it's a Phoenix app, fwiw) from a CPU perspective with eBPF & pprof (which is what Parca uses under the hood). If we were to use Parca to look at it today (v0.6.1), this is what we would see:

To analyse a profile locally, click on Download pprof, uncompress the file, and then run the following commands:
```shell
perf report -i /tmp/perf.data

# For time spent in each function without its children:
perf report --no-children -i /tmp/perf.data
```
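If you have the standalone pprof tool from https://github.com/google/pprof installed, the downloaded profile can also be inspected with it directly; a sketch, where the filename is illustrative:

```shell
# Text summary of the hottest functions (profile.pb.gz is an illustrative filename)
pprof -top profile.pb.gz

# Interactive web UI (flame graph, call graph) on a local port
pprof -http=localhost:8080 profile.pb.gz
```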
Anything else to add?
By default, Parca server deploy does not have any CPU or memory limits.
In the current implementation - v0.6.1 - the Parca server will OOM and restart when it reaches the memory limit. When this happens, all CPU samples will be lost. This is OK for us, even in production, and we simply patch the K8s deployment with CPU & memory limits.

What happens next?
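A minimal sketch of such a patch, assuming a deployment named parca in the parca namespace; the names and limit values are illustrative, not taken from our cluster:

```shell
# Hypothetical CPU & memory limits; adjust to your workload
kubectl -n parca patch deployment parca --type=json -p='[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/resources",
   "value": {"limits": {"cpu": "2", "memory": "4Gi"}}}
]'
```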
Should I read the comments below?
If you are curious about the steps that went into this, have a look; otherwise, all the important details are above the fold, meaning in this top-level comment. This is it for now, have a great Christmas 2021 🎄
As for the "one more thing" moment, I actually want to mention three things:
https://github.com/pyrra-dev/pyrra & https://demo.pyrra.dev/
https://github.com/parca-dev/parca-agent/blob/0327839ff8d58042c27f7588e61c73cba4b5b3b3/Dockerfile
Notice how Parca pins the apt sources, and then installs the packages with no versions, since the sources pin them. This is the best way to ensure that anyone can reproduce the builds (to the byte!) that Parca distributes. That is a big vote of trust & confidence @brancz!
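A minimal sketch of that pattern, assuming Debian's snapshot archive; the timestamp and package name are illustrative, not the ones from the Parca Dockerfile:

```shell
# Pin apt to a fixed point-in-time snapshot (illustrative timestamp)
echo "deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20211220T000000Z bullseye main" \
  > /etc/apt/sources.list
apt-get update

# No version constraints needed here: the snapshot already fixes every package version
apt-get install -y --no-install-recommends libelf-dev
```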
Agent & server versions on the website are always in sync. The website is re-generated in Vercel after a release gets created: