Zephyr instrumentation subsystem #57373
Comments
Hi @gromero! We appreciate you submitting your first issue for our open-source project. 🌟 Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙
💯
Perfect - this is exactly what I was looking for! Thanks for putting this together @gromero and @microbuilder
This is looking great!
@gromero Any update on getting this over the finish line?
Is it ready to merge?
@TaiJuWu @microbuilder Unfortunately, not yet. I need to remove the variable-length support in the ring buffer (use the Zephyr API instead) and test it on a VM, as requested by @carlocaione in Prague. I didn't give up; it's just been hard to find the cycles I need to address it. Otherwise, I deem it done.
TL;DR:
New Zephyr subsystem for profiling and tracing, proof of concept: #57371
CLI tool demo:
https://asciinema.org/a/581092
Perfetto output example:
Introduction
This RFC proposes a new Zephyr subsystem named `instrumentation`, meant for profiling and tracing. The new `instrumentation` subsys leverages the compiler code instrumentation feature available in compilers like GCC and LLVM, enabled via the flag `-finstrument-functions`, to collect profile data, function entry/exit traces, and thread scheduling info in a non-intrusive way.
Problem description
Most profiling and tracing features in Zephyr today either depend on proprietary and expensive tooling or have significant platform limitations. For instance, Segger's SystemView requires J-Link, and gprof only supports a few platforms, often emulated ones like native_posix and native_posix_64, with no timestamp data associated with events. Also, although the `tracing` subsys has numerous hooks available in its APIs, these hooks are static and so can't be easily set to trace arbitrary functions in kernel or application code.
By leveraging the compiler code instrumentation feature, it is possible to have a platform-, tooling-, and arch-neutral tracing and profiling subsystem that allows collecting traces at arbitrary code locations, selected both at compile time and at runtime, while using the very same mechanism to collect profiling data for the traced code.
Once the traces and profiling data are collected, various transport layers can be used to transfer the data from the target to the host, similarly to the transport backends available in the `tracing` subsys (UART, USB, etc.). Data can then be viewed on the host using a CLI tool, which can display traces and profiling statistics in the text terminal or export them to various formats so the data can be visualized in GUI tools, like Perfetto Trace Viewer. The same CLI tool allows setting the functions to be traced at runtime (they can also be set via Kconfig, at compile time).
Proposed change
Implement a new Zephyr subsystem named `instrumentation` and an associated CLI tool that allows setting, collecting, displaying, and exporting tracing and profiling data captured by this new subsystem.
Detailed RFC
Compilers like GCC and LLVM can generate instrumentation calls for entry and exit to functions if the flag `-finstrument-functions` is passed to the compiler. When this feature is enabled, just after function entry and just before function exit, the following functions, respectively, are called with the address of the current function and its call site:
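These are the hooks documented by GCC for `-finstrument-functions`. The prototypes below are the documented interface; the bodies are only an illustrative sketch (a real handler would record events rather than print and count depth):

```c
#include <stdio.h>

/* Hooks the compiler emits calls to when -finstrument-functions is
 * enabled (documented GCC/Clang behavior). They must not be
 * instrumented themselves, or every call would recurse. */
void __cyg_profile_func_enter(void *callee, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *callee, void *call_site)
    __attribute__((no_instrument_function));

int call_depth;  /* illustrative only: track call nesting */

void __cyg_profile_func_enter(void *callee, void *call_site)
{
    call_depth++;
    printf("enter %p (called from %p)\n", callee, call_site);
}

void __cyg_profile_func_exit(void *callee, void *call_site)
{
    printf("exit  %p (called from %p)\n", callee, call_site);
    call_depth--;
}
```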
The instrumentation is also done for functions expanded inline in other functions. A function may be given the attribute `no_instrument_function`, in which case this instrumentation is not done for it. It's also possible to exclude from instrumentation all functions in a compilation unit (source file) by setting `-finstrument-functions-exclude-file-list=file,file,...`, or to exclude functions by name by setting `-finstrument-functions-exclude-function-list=sym,sym,...`.
All these options take effect at compile time, so the `instrumentation` subsys provides yet another way to disable the instrumentation at runtime: calling `instr_disable()`, and `instr_enable()` to enable it back. This is an interesting mechanism because it allows using any function inside the instrumentation code and avoids having to exclude that function from instrumentation at compile time, which would prevent tracing and profiling that function outside the instrumentation code, i.e. elsewhere in kernel or application code.
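A minimal sketch of that runtime switch (the `instr_enable()`/`instr_disable()` names come from this RFC; the flag, the counter, and the handler shape are assumptions for illustration):

```c
#include <stdbool.h>

volatile bool instr_enabled;  /* runtime on/off switch */
int events_handled;           /* illustrative: count of handled events */

void instr_enable(void)  { instr_enabled = true; }
void instr_disable(void) { instr_enabled = false; }

/* Called from __cyg_profile_func_enter/exit(). */
void instr_event_handler(void *callee, void *call_site, bool is_entry)
{
    (void)callee; (void)call_site; (void)is_entry;

    if (!instr_enabled) {
        return;  /* disabled at runtime: do nothing */
    }
    /* Disable while handling the event so instrumented helpers called
     * from here (strncpy, etc.) don't recurse into this handler. */
    instr_disable();
    events_handled++;  /* ... collect trace/profile data here ... */
    instr_enable();
}
```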
Because the compiler instruments the entry and exit to functions, it's possible to have a single handler, called by `__cyg_profile_func_enter()` and `__cyg_profile_func_exit()`, that collects profiling and tracing data for the whole system. `instr_event_handler()` is this handler.
Conceptually, the only events that can trigger the handler are function entries and exits, which provide good function trace information. However, once the handler inspects the context and the name of the function (callee), it can promote these events to others. For instance, if it's possible to get the context when a function is called, a function entry event with additional context (CPU ID, priority, thread name, etc.) is generated; another example is that if the callee is part of the thread switching code, the event can be promoted to a thread switched in/out scheduler event (e.g. `z_thread_mark_switched_in/out()`). Other events can be easily added the same way.
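The promotion step could look roughly like this (a sketch; the real handler would compare against the actual addresses of kernel symbols such as `z_thread_mark_switched_in()`, which are passed in here only to keep the example self-contained):

```c
/* Event kinds named in this RFC; values are illustrative. */
enum promoted_event {
    FUNC_ENTRY,
    THREAD_SWITCHED_IN,
    THREAD_SWITCHED_OUT,
};

/* Promote a plain function-entry event when the callee is a known
 * scheduler hook. */
enum promoted_event promote_event(const void *callee,
                                  const void *switched_in_hook,
                                  const void *switched_out_hook)
{
    if (callee == switched_in_hook) {
        return THREAD_SWITCHED_IN;
    }
    if (callee == switched_out_hook) {
        return THREAD_SWITCHED_OUT;
    }
    return FUNC_ENTRY;  /* default: keep the generic entry event */
}
```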
The `tracing` subsys needs to be enabled (`CONFIG_TRACING=y`) to give access to hooks like `z_thread_mark_switched_in/out()`.
The `instrumentation` subsys relies upon a monotonic timer which is initialized to zero at subsys initialization. Different platforms can set the timer backend to the most suitable timer source. For example, on some Arm Cortex-M parts the DWT (Data Watchpoint and Trace) unit is available and `DWT_CYCCNT` can be used as a monotonic clock source. This timer is used by the handler to acquire a timestamp for each event processed.
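For instance, a timestamp backend might normalize a raw cycle counter (e.g. `DWT_CYCCNT`) so the clock reads zero at subsys initialization. The function names below are assumptions for illustration, not the actual API:

```c
#include <stdint.h>

static uint64_t ts_base;  /* raw counter value captured at init */

/* Record the epoch so timestamps start at zero. `raw_now` would come
 * from the platform backend, e.g. DWT_CYCCNT on Cortex-M. */
void instr_timer_init(uint64_t raw_now)
{
    ts_base = raw_now;
}

/* Convert a raw counter read into a monotonic, zero-based timestamp. */
uint64_t instr_timestamp(uint64_t raw_now)
{
    return raw_now - ts_base;
}
```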
The handler collects profile data for each function in an aggregated way, where the total amount of time spent in the function (delta) is calculated per entry/exit pair and accumulated. The accumulators are kept in an array. Similarly to the way a function stack frame is created and destroyed in the function's prologue and epilogue, a timestamp (S0) is pushed to an alternative stack when a function is called, and another timestamp (S1) is obtained just before the function returns. The delta is calculated as `delta = S1 - S0` and is then accumulated per function. The depth of the timestamp stack can be configured (e.g. to 16), and the number of functions for which delta can be accumulated can also be configured (e.g. 256 as the size of the accumulator array).
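The mechanism above can be sketched as follows (an illustrative model: the function names, the slot-mapping hash, and the overflow handling are assumptions, not the actual Zephyr implementation):

```c
#include <stdint.h>

#define TS_STACK_DEPTH 16   /* configurable timestamp stack depth */
#define NUM_FUNCS      256  /* configurable accumulator array size */

struct func_profile {
    void *func;            /* instrumented function address */
    uint64_t total_delta;  /* accumulated time spent in the function */
};

static uint64_t ts_stack[TS_STACK_DEPTH];
static int ts_top;
struct func_profile accumulators[NUM_FUNCS];

void profile_on_entry(uint64_t now)
{
    if (ts_top < TS_STACK_DEPTH) {
        ts_stack[ts_top] = now;  /* push S0 */
    }
    ts_top++;  /* still track depth when the stack overflows */
}

void profile_on_exit(void *func, uint64_t now)
{
    ts_top--;
    if (ts_top < TS_STACK_DEPTH) {
        uint64_t delta = now - ts_stack[ts_top];  /* delta = S1 - S0 */
        /* trivial hash to pick an accumulator slot */
        int slot = (int)(((uintptr_t)func >> 2) % NUM_FUNCS);

        accumulators[slot].func = func;
        accumulators[slot].total_delta += delta;
    }
}
```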
The handler also collects traces, notably function entry/exit events but also scheduling events. The events are recorded into a buffer of configurable size, which can be configured to behave as a ring buffer or as a fixed buffer, depending on the use case.
The ring buffer API used by the instrumentation subsys to keep the traces is similar to the Zephyr Ring Buffers API but allows putting variable-size elements (event records) and can also work in overwrite mode. In overwrite mode, when the buffer fills up, newer events overwrite the older ones. The buffer can also be configured in fixed mode; in this mode, once the buffer fills up, no more traces are recorded. The ring buffer API consists of:
When the instrumentation handler needs to write a record, it calls `put_item_claim` and informs it of the type of record it wants to write. If there is enough contiguous space (the buffer doesn't wrap), `put_item_claim` returns a pointer that is used to write the record directly to the buffer. If the buffer wraps, the instrumentation handler first needs to fill in a record structure and pass it to `put_item_wrapping`, which takes care of copying the record to the buffer, wrapping data accordingly to the start of the buffer.
When the instrumentation handler needs to read a record, usually for dumping it to the host, the ring buffer `get_*` functions are used instead; they work similarly to the `put_*` ones. One additional function is available for convenience, though: `get_item`. `get_item` returns the next available item at the tail of the ring buffer, until there are no more items to get from the ring buffer.
Event records are defined as follows, where the header is essentially used to
specify the event type:
The following basic event types are proposed:
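The exact definitions live in the PoC (#57371); as an illustrative reconstruction based only on the events named in this RFC (function entry/exit, entry with context, thread switched in/out), a record could look like:

```c
#include <stdint.h>

/* Illustrative reconstruction, not the actual PoC definitions. */
enum instr_event_type {
    INSTR_EVENT_FUNC_ENTRY,      /* plain function entry */
    INSTR_EVENT_FUNC_ENTRY_CTX,  /* entry with context (CPU ID, thread, ...) */
    INSTR_EVENT_FUNC_EXIT,       /* function exit */
    INSTR_EVENT_SCHED_IN,        /* thread switched in */
    INSTR_EVENT_SCHED_OUT,       /* thread switched out */
};

struct instr_header {
    uint8_t type;  /* one of enum instr_event_type */
};

struct instr_record {
    struct instr_header hdr;
    uint64_t timestamp;  /* from the monotonic timer */
    void *callee;        /* instrumented function address */
    void *call_site;     /* caller address */
};
```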
Subsystem initialization and activation
Because the compiler can instrument any function, the instrumentation handler can be called quite early in the boot process, even as soon as the first kernel function is called. This can be problematic because RAM might not be properly initialized yet.
One way to guarantee the instrumentation subsystem has the proper context to be initialized is to check whether an initialized C variable in RAM already holds a magic number; in other words, check an in-RAM value against a magic value in flash memory. This check is done by `instr_fundamentals_initialized()` and guarantees essential Zephyr RAM initialization functions were executed (like `z_early_memset` & friends).
During instrumentation initialization the tracing buffer is initialized. Also, the `trigger` and `stopper` functions are set accordingly. The `trigger` and `stopper` function addresses control when, after the subsys is enabled, the tracing and profiling will start and stop, respectively. The `trigger` and `stopper` addresses are set to a default value at compile time based on Kconfig, but since they can be set at runtime and need to survive between reboots, they can also be set from RAM, via the `retained_mem` driver. Hence, if values for `trigger` and `stopper` are available from retained memory, they will be set accordingly, overwriting the compile-time defaults.
After proper initialization the instrumentation subsystem will be enabled but turned off. In the instrumentation subsys terminology, `enabled` means that tracing and profiling can be performed at any time but will start only when instrumentation is turned on, and will stop when it is turned off. Thus there are two stages to effectively activate tracing and profiling. This allows disabling the instrumentation in the instrumentation code itself at runtime, e.g. in critical sections, avoiding infinite recursion (instrumentation subsys functions, like the handler, are excluded at compile time from instrumentation but can call other functions, like `strncpy`, which are instrumented, leading to infinite recursion).
Tracing and profiling are turned on when the `trigger` function is called and are turned off when the `stopper` function returns. This allows fine-grained control over the code locations where tracing and profiling will be active.
Transport
Much like the `tracing` subsys, the `instrumentation` subsys might support various backends for transporting the tracing and profiling data collected in the buffers. The focus must be on popular, low-cost, and widely available interfaces, like UART and USB.
The transport controller supports the following commands:
- `status`: Get status of tracing and profiling (enabled / disabled)
- `reboot`: Reboot target
- `ping`: Ping transport controller
- `dump_trace`: Dump traces from buffer
- `dump_profiler`: Dump profile deltas from accumulator array
- `trigger`: Set trigger function
- `stopper`: Set stopper function

The dump commands dump the event records as a binary stream via the transport channel. The binary stream is decoded using the CTF (Common Trace Format) v1.8 and libbabeltrace2.
Currently only the asynchronous transfer mode is considered.
CLI tool
To interface with the transport controller, an easy-to-use CLI tool is proposed, which provides the subcommands and options below:
Changes in Zephyr core
No changes are necessary in the existing Zephyr core code or in the tracing hooks, since the `instrumentation` subsys is called from instrumented code generated by the compiler.
Use cases
The `instrumentation` subsys and its CLI tool, zaru, can be used to trace and profile specific code locations.
The user sets the `trigger` and `stopper` function names, usually both set to `main`. If they are changed at runtime, then `--reboot` must be passed to reboot the target before data is collected at the newly specified location.
Profiling data is always collected when tracing is active.
The collected data can be viewed using the `zaru trace` or `zaru profile` subcommands. The size of the ring buffer can be configured according to the memory constraints of the target: the bigger the buffer, the more traces it's possible to collect.
If the ring buffer is in overwrite mode, the most recent events can be observed. Otherwise, once the buffer gets full, data collection stops, so only the first (and oldest) events are kept recorded.
The tracing data can be exported to various formats, like the Event Trace Format,
that can be imported by Perfetto Trace Viewer.
Dependencies
The `instrumentation` subsys depends on the `tracing` subsys, i.e. `CONFIG_TRACING=y`, for the thread scheduling hooks. It also depends on the `retained_mem` driver for retaining the `trigger` and `stopper` addresses between reboots, i.e. `CONFIG_RETAINED_MEM=y` and `CONFIG_RETAINED_MEM_MUTEXES=n`. And it also depends on the reboot capability (`CONFIG_REBOOT=y`).
The CLI tool, `zaru`, depends on the Python 3 Babeltrace 2 module (bt2) for decoding the binary stream (event records) via the Common Trace Format's Trace Stream Description Language (TSDL).
Concerns and Unresolved Questions
- Recursive calls can produce wrong delta values because the number of recursive calls may exceed the number of slots in the timestamp stack (usually 16 slots). Maybe some heuristics can be used to avoid pushing further timestamps in recursion.
- If the number of traced functions exceeds the number of functions for which delta can be accumulated, some kind of heuristic can be used to discard the least relevant ones, for instance, functions with low delta values.
- Event records can be compacted further by using a lookup table and just recording the lookup index.
- CTF 1.8 can't parameterize the event record type, so it's difficult to represent 32-bit vs 64-bit binary formats in a single `metadata` file. The same happens for expressing big-endian vs little-endian formats. Hence multiple `metadata` files would exist in CTF 1.8. This problem is solved in CTF 2.0, but currently there isn't a library implementation for it (it's a WIP).
- The `trigger` and `stopper` mechanism can be enhanced, for instance, to be activated only a certain number of times, or only when the callee is called from specific callers.
- The tooling could help with buffer sizing, for example: if the trace buffer isn't big enough to capture all the events for the selected location, print how much memory would be necessary to capture all the events, so the user can adjust the buffer size based on this estimation.