
🎁 Continuous CPU profiling #396

Merged
gerhard merged 10 commits into master from parca-experiment on Dec 22, 2021

Conversation

@gerhard (Member) commented Nov 17, 2021

Why did we do this?

  • To understand how our K8s worker nodes use CPU
  • To quickly generate and compare CPU profiles
  • To look at the CPU profile from a longer-term perspective (hours)

How did we do it?

We installed the Parca agent first, and the server next, in our Kubernetes production cluster.
This is what that looks like:

image
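For reference, the install itself is just applying the upstream Kubernetes manifests. A rough sketch of what that looks like (the namespace, manifest URLs and versions below are assumptions; the canonical, up-to-date commands are at https://www.parca.dev/docs/kubernetes):

# namespace for both components (an assumption, adjust to taste)
kubectl create namespace parca

# Parca agent: eBPF-based profiler, runs as a DaemonSet on every worker node
kubectl apply -f https://github.com/parca-dev/parca-agent/releases/download/v0.2.0/kubernetes-manifest.yaml

# Parca server: stores & symbolizes the profiles, serves the UI on port 7070
kubectl apply -f https://github.com/parca-dev/parca/releases/download/v0.6.1/kubernetes-manifest.yaml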

Next, we port-forward port 7070 to the Parca server and load the UI:

image
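The port-forward itself is plain kubectl; the namespace and service name below are assumptions, adjust them to wherever Parca is running:

kubectl -n parca port-forward service/parca 7070:7070
# then open http://localhost:7070 in the browser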

We select the CPU Samples profile, hit Search, and see all workloads that use CPU across the K8s cluster:

image

Let's click on the process which has the most samples (this will be the one that uses the most CPU):

image

Great! We notice that this is the Parca server itself, and we instantly see that the widest span is actually the symbolization of DWARF information. Is this a good thing or a bad thing? It does seem to be doing what it's supposed to, but what I really want to do is compare a high number of samples, like this peak, to a low number of samples and try to understand what is different.

I hold SHIFT, click on the container="parca" label, then click the Compare button. Next, I click on a point in the graph with fewer samples on the left-hand side, and a point with a higher number of samples on the right-hand side. This is what I see:

image

ASIDE: I know, this looks a lot like Christmas 🎄

All new spans on the right-hand side are shown in red, which represents the extra work that the CPU needs to do. I quickly spot that memory garbage collection accounts for most of the new samples. My understanding is that this is due to the Parca server writing the CPU samples that it tracks to its storage, which happens just before garbage collection. This makes sense, and is exactly what I would expect to see when a process has "work" to do:

image

image

That is already helpful: those functions use the most CPU, so if I had to optimise something, that's where I would start. Now remember, some of this may be necessary - after all, processes are meant to do work and use CPU - but maybe they are not as efficient as they could be, so I would go through that code and see what could be optimised or removed (if anything!).

The last thing that I want to mention regarding the Compare view is that:

  • 💚 Green means less CPU activity
  • 💙 Blue means the same amount of CPU activity
  • ❤️ Red means more CPU activity

Lastly, if I wanted to see CPU profiles across everything for, say, the last day (this may use a lot of memory!), I select Last day from the drop-down and click the Merge button. This is what I see:

image

You will spot postgres, calico, and that almost half of all CPU spans are in parca. The memory addresses on the left-hand side of the screenshot above actually belong to the Erlang VM, which cannot currently be symbolized. This is a present in the sense that it's a missing feature and we are making the wider community aware of it. It would be amazing if we could understand what actually happens in the Erlang VM (the changelog.com app runtime - it's a Phoenix app fwiw) from a CPU perspective with eBPF & pprof (which is what Parca uses under the hood). If we were to use Parca to look at it today (v0.6.1), this is what we would see:

image

To analyse a profile locally, click Download pprof, uncompress the downloaded file, and then run the following commands:

# For time spent in each function, including its children:
perf report -i /tmp/perf.data

# For time spent in each function without its children:
perf report --no-children -i /tmp/perf.data
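If you prefer the pprof toolchain over perf, the same download can also be inspected with go tool pprof (this assumes a local Go toolchain; the file name is just an example):

# interactive flame graph & top views in the browser
go tool pprof -http=:8080 profile.pb.gz

# or a quick terminal summary of the hottest functions
go tool pprof -top profile.pb.gz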

Anything else to add?

By default, the Parca server deployment does not have any CPU or memory limits.

In the current implementation - v0.6.1 - the Parca server will OOM and restart when it reaches the memory limit. When this happens, all CPU samples are lost. This is OK for us, even in production, and we simply patch the K8s deployment with CPU & memory limits.
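The patch itself is nothing fancy; a sketch of the kind of command we use (namespace, deployment name and the actual values are assumptions, size them for your cluster):

kubectl -n parca set resources deployment/parca \
  --requests=cpu=500m,memory=1Gi \
  --limits=cpu=2,memory=4Gi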

What happens next?

Should I read the comments below?

If you are curious about the steps that went into this, have a look; otherwise, all the important details are above the fold, meaning in this top-level comment. This is it for now - have a great Christmas 2021 🎄


As for the "one more thing" moment, I actually want to mention three things:

  1. https://github.com/pyrra-dev/pyrra & https://demo.pyrra.dev/

  2. https://github.com/parca-dev/parca-agent/blob/0327839ff8d58042c27f7588e61c73cba4b5b3b3/Dockerfile

Notice how Parca pins the apt sources and then installs the packages with no versions, since the sources pin them (a minimal sketch of the idea follows after this list). This is the best way to ensure that anyone can reproduce the builds (to the byte!) that Parca distributes. That is a big vote of trust & confidence @brancz!

  3. https://www.parca.dev/docs/kubernetes

Agent & server versions on the website are always in sync; the site is re-generated in Vercel after a release gets created.
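Back to point 2, here is a minimal sketch of the snapshot-pinning idea (this is not the actual parca-agent Dockerfile, just the general mechanism, with an example snapshot date and example packages):

# inside a Dockerfile RUN step: point apt at a dated, immutable snapshot
echo "deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/20211117T000000Z bullseye main" > /etc/apt/sources.list
apt-get update
# no versions on the packages: the dated snapshot already pins every one of them
apt-get install -y --no-install-recommends clang llvm make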

@gerhard (Member, Author) commented Nov 18, 2021

I am running ghcr.io/parca-dev/parca-agent:v0.2.0-45-gdfa59a1 and I have confirmed that ERL_FLAGS="+S 1 +JPperf true" has been passed through correctly:

image
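For completeness, verifying that the flags reached the running container is a one-liner (namespace and deployment name are placeholders for the Phoenix app's deployment):

kubectl -n <namespace> exec deploy/<app> -- env | grep ERL_FLAGS
# expected output: ERL_FLAGS=+S 1 +JPperf true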

I was expecting these memory pointers to be de-referenced:
image

I grabbed the pprof profile & the beam.smp binary and ran them locally through https://github.com/google/pprof:
image
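Running them through pprof locally looks roughly like this (file names are examples; pprof is either the standalone binary from https://github.com/google/pprof or go tool pprof):

# pass the beam.smp binary alongside the profile so pprof can try to resolve the addresses
pprof -http=:8080 ./beam.smp profile.pb.gz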

Does the Parca output look right to you @brancz?

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Makes experimenting "from scratch" easier

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
With parca-dev/parca-agent#132

So far, so good @brancz. Time for some Erlang JIT-fu 🥋

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Unbounded parca server memory growth is the primary concern. Currently
the server will crash with OOM, but that is OK until the profiles
retention feature is finalized.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard gerhard changed the title Experiment with Parca in Kubernetes worker nodes 🎁 Visualise CPU profiles on K8s worker nodes Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise CPU profiles on K8s worker nodes 🎁 Visualise CPU profiles on K8s worker nodes with Parca Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise CPU profiles on K8s worker nodes with Parca 🎁 Visualise & compare CPU profile with Parca Dec 17, 2021
@thechangelog thechangelog deleted a comment from ikavgo Dec 17, 2021
@gerhard gerhard changed the title 🎁 Visualise & compare CPU profile with Parca 🎁 Continuous CPU profiling Dec 18, 2021
@thechangelog thechangelog deleted a comment from deadtrickster Dec 22, 2021
All context is in #396

Just in time for Christmas 🎄

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard gerhard merged commit c2ff074 into master Dec 22, 2021
@gerhard gerhard deleted the parca-experiment branch December 22, 2021 22:46