
🎁 Continuous CPU profiling #396

Merged
gerhard merged 10 commits into master from parca-experiment on Dec 22, 2021

Conversation

@gerhard (Member) commented Nov 17, 2021

Why did we do this?

  • To understand how our K8s worker nodes use CPU
  • To quickly generate and compare CPU profiles
  • To look at the CPU profile from a longer-term perspective (hours)

How did we do it?

We installed the Parca agent first, and the server next, in our Kubernetes production cluster.
This is what that looks like:

image
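For reference, the install itself is just applying the upstream Kubernetes manifests. A rough sketch of what that looks like (the namespace, manifest URLs and versions below are assumptions; the canonical, up-to-date commands are at https://www.parca.dev/docs/kubernetes):

# namespace for both components (an assumption, adjust to taste)
kubectl create namespace parca

# Parca agent: eBPF-based profiler, runs as a DaemonSet on every worker node
kubectl apply -f https://github.com/parca-dev/parca-agent/releases/download/v0.2.0/kubernetes-manifest.yaml

# Parca server: stores & symbolizes the profiles, serves the UI on port 7070
kubectl apply -f https://github.com/parca-dev/parca/releases/download/v0.6.1/kubernetes-manifest.yaml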

Next, we port-forward port 7070 to the Parca server and load the UI:

image
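The port-forward itself is plain kubectl; the namespace and service name below are assumptions, adjust them to wherever Parca is running:

kubectl -n parca port-forward service/parca 7070:7070
# then open http://localhost:7070 in the browser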

We select the CPU Samples profile, hit Search, and see all workloads that use CPU across the K8s cluster:

image

Let's click on the process which has the most samples (this will be the one that uses the most CPU):

image

Great! We notice that this is the Parca server itself, and we instantly see that the widest span is actually the symbolization of DWARF information. Is this a good thing or a bad thing? It does seem to be doing what it's supposed to, but what I really want to do is compare a high number of samples, like this peak, to a low number of samples and try to understand what is different.

I hold SHIFT, click on the container="parca" label, then click the Compare button. Next, I click on a point in the graph with fewer samples on the left-hand side, and a point with a higher number of samples on the right-hand side. This is what I see:

image

ASIDE: I know, this looks a lot like Christmas 🎄

All new spans on the right-hand side are shown in red, which represents the extra work that the CPU needs to do. I quickly spot that memory garbage collection accounts for most of the new samples. My understanding is that this is due to the Parca server writing the CPU samples that it tracks to its storage, which happens just before garbage collection. This makes sense, and is exactly what I would expect to see when a process has "work" to do:

image

image

That is already helpful: those functions use the most CPU, so if I had to optimise something, that's where I would start. Now remember, some of this may be necessary - after all, processes are meant to do work and use CPU - but maybe they are not as efficient as they could be, so I would go through that code and see what could be optimised or removed (if anything!).

The last thing that I want to mention regarding the Compare view is that:

  • 💚 Green means less CPU activity
  • 💙 Blue means the same amount of CPU activity
  • ❤️ Red means more CPU activity

Lastly, if I wanted to see CPU profiles across everything for, say, the last day (this may use a lot of memory!), I select Last day from the drop-down and click the Merge button. This is what I see:

image

You will spot postgres, calico, and that almost half of all CPU spans are in parca. The memory addresses on the left-hand side of the screenshot above actually belong to the Erlang VM, which cannot currently be symbolized. This is a present in the sense that it's a missing feature and we are making the wider community aware of it. It would be amazing if we could understand what actually happens in the Erlang VM (the changelog.com app runtime - it's a Phoenix app fwiw) from a CPU perspective with eBPF & pprof (which is what Parca uses under the hood). If we were to use Parca to look at it today (v0.6.1), this is what we would see:

image

To analyse a profile locally, click Download pprof, uncompress the downloaded file, and then run the following commands:

# For time spent in each function, including its children:
perf report -i /tmp/perf.data

# For time spent in each function without its children:
perf report --no-children -i /tmp/perf.data
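If you prefer the pprof toolchain over perf, the same download can also be inspected with go tool pprof (this assumes a local Go toolchain; the file name is just an example):

# interactive flame graph & top views in the browser
go tool pprof -http=:8080 profile.pb.gz

# or a quick terminal summary of the hottest functions
go tool pprof -top profile.pb.gz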

Anything else to add?

By default, the Parca server deployment does not have any CPU or memory limits.

In the current implementation - v0.6.1 - the Parca server will OOM and restart when it reaches the memory limit. When this happens, all CPU samples are lost. This is OK for us, even in production, and we simply patch the K8s deployment with CPU & memory limits.
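The patch itself is nothing fancy; a sketch of the kind of command we use (namespace, deployment name and the actual values are assumptions, size them for your cluster):

kubectl -n parca set resources deployment/parca \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=4Gi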

What happens next?

Should I read the comments below?

If you are curious about the steps that went into this, have a look; otherwise, all the important details are above the fold, meaning in this top-level comment. This is it for now - have a great Christmas 2021 🎄


As for the "one more thing" moment, I actually want to mention three things:

  1. https://github.com/pyrra-dev/pyrra & https://demo.pyrra.dev/

  2. https://github.com/parca-dev/parca-agent/blob/0327839ff8d58042c27f7588e61c73cba4b5b3b3/Dockerfile

Notice how Parca pins the apt sources and then installs the packages with no versions, since the sources pin them (a minimal sketch of the idea follows after this list). This is the best way to ensure that anyone can reproduce the builds (to the byte!) that Parca distributes. That is a big vote of trust & confidence @brancz!

  3. https://www.parca.dev/docs/kubernetes

Agent & server versions on the website are always in sync; the site is re-generated in Vercel after a release gets created.
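Back to point 2, here is a minimal sketch of the snapshot-pinning idea (this is not the actual parca-agent Dockerfile, just the general mechanism, with an example snapshot date and example packages):

# inside a Dockerfile RUN step: point apt at a dated, immutable snapshot
echo "deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20211117T000000Z bullseye main" > /etc/apt/sources.list
apt-get update
# no versions on the packages: the dated snapshot already pins every one of them
apt-get install -y --no-install-recommends clang llvm make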

@gerhard (Member, Author) commented Nov 18, 2021

I am running ghcr.io/parca-dev/parca-agent:v0.2.0-45-gdfa59a1 and I have confirmed that ERL_FLAGS="+S 1 +JPperf true" has been passed through correctly:

image
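For completeness, verifying that the flags reached the running container is a one-liner (namespace and deployment name are placeholders for the Phoenix app's deployment):

kubectl -n <namespace> exec deploy/<app> -- env | grep ERL_FLAGS
# expected output: ERL_FLAGS=+S 1 +JPperf true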

I was expecting these memory pointers to be de-referenced:
image

I grabbed the pprof profile & the beam.smp binary and ran them locally through https://github.com/google/pprof:
image
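Running them through pprof locally looks roughly like this (file names are examples; pprof is either the standalone binary from https://github.com/google/pprof or go tool pprof):

# pass the beam.smp binary alongside the profile so pprof can try to resolve the addresses
pprof -http=:8080 ./beam.smp profile.pb.gz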

Does the Parca output look right to you @brancz?

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Makes experimenting "from scratch" easier

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
With parca-dev/parca-agent#132

So far, so good @brancz. Time for some Erlang JIT-fu 🥋

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Unbounded parca server memory growth is the primary concern. Currently
the server will crash with OOM, but that is OK until the profiles
retention feature is finalized.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard gerhard changed the title Experiment with Parca in Kubernetes worker nodes 🎁 Visualise CPU profiles on K8s worker nodes Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise CPU profiles on K8s worker nodes 🎁 Visualise CPU profiles on K8s worker nodes with Parca Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise CPU profiles on K8s worker nodes with Parca 🎁 Visualise & compare CPU profile with Parca Dec 17, 2021
@thechangelog thechangelog deleted a comment from ikavgo Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise & compare CPU profile with Parca 🎁 Continuous CPU profiling Dec 18, 2021
@thechangelog thechangelog deleted a comment from deadtrickster Dec 22, 2021
All context is in #396

Just in time for Christmas 🎄

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard gerhard merged commit c2ff074 into master Dec 22, 2021
@gerhard gerhard deleted the parca-experiment branch December 22, 2021 22:46