trtt edited this page Sep 2, 2015 · 9 revisions

Improving profiling capabilities of glretrace.

Currently, glretrace offers the following profiling options: --pcpu, --pgpu, --pmem and --ppd. That already covers a lot, but supporting further metrics seems reasonable, and that is what this patchset aims to do.

The following items describe what is included in the patchset:

  • add support for metric sources (backends) in glretrace via metric abstraction system b45c308 87a0777
  • add GL_AMD_performance_monitor support a5702de 578aea8
  • add GL_INTEL_performance_query support 13d23ab da28aa4
  • add a separate backend, called "opengl", for the metrics available through the old options (the name perhaps does not fit memory profiling, the old --pmem, very well) b90d800 e1dee91

Metric abstraction system

The idea is to have a general interface that can be used to leverage almost any metric collection system. The proposed interface is the abstract class MetricBackend. It is described here:

Basically, it is a simple class that defines several functions (among them beginQuery/endQuery) for marking profiling boundaries (frames, draw calls, all calls), selecting and querying the available metrics, and retrieving profiling results. Supporting a new backend is as simple as implementing these functions for it and hiding the backend-specific code there. When a backend communicates with the outer world, metrics are referred to by the structure Metric (which is defined in the same file). Metrics have some parameters and are divided into several groups.
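
To make the shape of such an interface concrete, here is a minimal sketch. It is not the MetricBackend declaration from the patchset: only the names MetricBackend, Metric, beginQuery/endQuery, generatePasses, beginPass and endPass come from the text. The QueryBoundary enum, enableMetric(), numResults(), all signatures and the toy CountingBackend are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical boundary kinds; the text names frames, all calls and draw calls.
enum class QueryBoundary { Frame, Call, DrawCall };

// Simplified stand-in for the Metric structure: a metric belongs to a group.
struct Metric {
    unsigned groupId;
    unsigned id;
    std::string name;
};

// Sketch of an abstract backend interface (all signatures are assumptions).
class MetricBackend {
public:
    virtual ~MetricBackend() = default;
    virtual bool enableMetric(const Metric& m, QueryBoundary b) = 0;
    virtual unsigned generatePasses() = 0;
    virtual void beginPass() = 0;
    virtual void endPass() = 0;
    virtual void beginQuery(QueryBoundary b) = 0;
    virtual void endQuery(QueryBoundary b) = 0;
    // Number of completed queries recorded for a boundary.
    virtual std::size_t numResults(QueryBoundary b) const = 0;
};

// Toy backend: it samples nothing and only counts completed queries.
class CountingBackend : public MetricBackend {
    std::map<QueryBoundary, std::vector<Metric>> enabled_;
    std::map<QueryBoundary, std::size_t> completed_;
    std::map<QueryBoundary, bool> open_;
public:
    bool enableMetric(const Metric& m, QueryBoundary b) override {
        enabled_[b].push_back(m);
        return true;
    }
    unsigned generatePasses() override { return enabled_.empty() ? 0 : 1; }
    void beginPass() override {}
    void endPass() override {
        // Every query opened during the pass must have been closed.
        for (const auto& kv : open_) assert(!kv.second);
    }
    void beginQuery(QueryBoundary b) override {
        // Queries are only opened for boundaries that have enabled metrics.
        if (enabled_.count(b)) open_[b] = true;
    }
    void endQuery(QueryBoundary b) override {
        auto it = open_.find(b);
        if (it != open_.end() && it->second) {
            it->second = false;
            ++completed_[b];
        }
    }
    std::size_t numResults(QueryBoundary b) const override {
        auto it = completed_.find(b);
        return it == completed_.end() ? 0 : it->second;
    }
};
```

In this sketch, code calling the backend only ever sees the abstract class, so a real backend could replace CountingBackend without changes on the caller's side.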

What should be noted:

  1. The system is quite flexible. Different metrics can be enabled for different boundaries, so you can have Metric #1 taken for frames and Metric #2 for draw calls.

  2. Not all metrics can be profiled at the same time, so profiling may have to be done in several passes. Backends are expected to generate and handle these passes (via generatePasses(), beginPass(), endPass()). For apitrace this also means that the trace has to be retraced several times.

  3. Metric data is expected to be stored in the backends, and each backend takes care of its own data. A backend can be queried for the data. There is no unified storage.
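
The multi-pass idea from item 2 can be sketched as follows. This is an illustration only: planPasses(), runAllPasses() and the fixed per-pass limit are hypothetical, and a real backend would have to respect whatever constraints its hardware counters impose rather than a simple count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A pass is the set of metric names sampled together in one retrace run.
using Pass = std::vector<std::string>;

// Greedily pack the requested metrics into passes of at most maxPerPass
// metrics each (maxPerPass is a hypothetical limit, not a real API value).
std::vector<Pass> planPasses(const std::vector<std::string>& metrics,
                             std::size_t maxPerPass) {
    assert(maxPerPass > 0);
    std::vector<Pass> passes;
    for (const std::string& m : metrics) {
        if (passes.empty() || passes.back().size() == maxPerPass)
            passes.emplace_back();  // current pass is full, start a new one
        passes.back().push_back(m);
    }
    return passes;
}

// Replay the trace once per pass. retraceOnce stands in for a full retrace
// run and receives the metrics that are active during that pass.
template <typename RetraceFn>
std::size_t runAllPasses(const std::vector<Pass>& passes, RetraceFn retraceOnce) {
    std::size_t runs = 0;
    for (const Pass& p : passes) {
        // A backend's beginPass() would be called here.
        retraceOnce(p);
        // A backend's endPass() would be called here.
        ++runs;
    }
    return runs;
}
```

With five requested metrics and a limit of two per pass, this plan yields three passes, so the trace would be replayed three times.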

Changes in glretrace

  • runs in several passes 136411b

  • wherever possible, all new code is placed in a new file, metric_helper.cpp (which is just an extension of glretrace_main.cpp and uses the same namespace and header).

    This includes the getBackend() function, which looks up backends by name string (so it basically links the metric abstraction system with glretrace). It also includes all the code needed to handle the new CLI options.

  • new option --list-metrics 7f9933f

    Lists all backends with metrics that are available.

    Output includes all the information in the Metric structure. Looks a bit ugly, but there is no other way to get this data.


    Requires specification of the trace file -- it is needed to look at the type of the context created (different metrics might be available depending on that).

  • new options --pframes, --pcalls, --pdrawcalls 87a0777

    These are the actual options for profiling.

    --pcalls is used for profiling all calls. This option exists mainly for profiling CPU times.

    The expected argument syntax is the same for all three options:

    --pframes="backend1: [metric_group, metric_id], metric_name, ...; backend2: ..."

    For example: --pframes="GL_INTEL_performance_query: Aggregated Core Array Active, [1,4], GPU Busy; opengl: Pixels Drawn" --pcalls="opengl: CPU Start" --pdrawcalls="GL_AMD_performance_monitor: [0,2]"

    For a given boundary and a given backend, specify the list of metrics to be profiled, either by name or by id (together with the group id).

    Profiling is done sequentially for all enabled backends. Calls to backend profiling functions are done in the same places where the current (--pcpu, --pgpu, ...) profiling is done.

  • new option --gen-passes bd29efe

    Just in case you are wondering how many passes it takes to get all the metrics you want. Simply outputs the number of passes.

  • new profiling output generation 5769ff2 260a97b

    New class for handling profiling output -- MetricWriter.

    Why is a new one needed? Firstly, because the new metrics have to be queried from the backends and printed out. Secondly, the current code (with --pgpu etc.) simply passes information about the profiled call to the output after every call. With the new metric abstraction system the output is generated at the end of the whole run (more about this below, in the part about memory consumption), so all the information about a call (name string, id, shader program) has to be available at that point. (It is an open question whether this information is needed at all; it is included just so that the new and old outputs are similar.) The simplest solution is to store this information in memory, in the new class.

    The output consists of three TSV (tab-separated) tables of metrics: one each for frames, calls and draw calls.
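
The argument syntax of --pframes, --pcalls and --pdrawcalls described in the list above could be parsed roughly as follows. This is a sketch, not the parser from metric_helper.cpp: MetricRef, BackendSpec and parseProfilingArg() are hypothetical names, and the treatment of malformed input (silently skipped, or an exception from std::stoul for non-numeric ids) is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct MetricRef {
    bool byId = false;        // true for "[group, id]", false for a name
    unsigned group = 0, id = 0;
    std::string name;
};

struct BackendSpec {
    std::string backend;
    std::vector<MetricRef> metrics;
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    assert(b <= e);
    return s.substr(b, e - b);
}

// Split on sep, ignoring separators that appear inside [...].
static std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> out;
    std::string cur;
    int depth = 0;
    for (char c : s) {
        if (c == '[') ++depth;
        if (c == ']') --depth;
        if (c == sep && depth == 0) { out.push_back(cur); cur.clear(); }
        else cur += c;
    }
    out.push_back(cur);
    return out;
}

// Parse "backend1: [group, id], metric name, ...; backend2: ...".
std::vector<BackendSpec> parseProfilingArg(const std::string& arg) {
    std::vector<BackendSpec> specs;
    for (const std::string& part : split(arg, ';')) {
        std::size_t colon = part.find(':');
        if (colon == std::string::npos) continue;  // no backend name: skip
        BackendSpec spec;
        spec.backend = trim(part.substr(0, colon));
        for (const std::string& raw : split(part.substr(colon + 1), ',')) {
            std::string item = trim(raw);
            if (item.empty()) continue;
            MetricRef ref;
            if (item.front() == '[' && item.back() == ']') {
                std::vector<std::string> ids =
                    split(item.substr(1, item.size() - 2), ',');
                if (ids.size() != 2) continue;     // expect exactly two ids
                ref.byId = true;
                ref.group = static_cast<unsigned>(std::stoul(trim(ids[0])));
                ref.id = static_cast<unsigned>(std::stoul(trim(ids[1])));
            } else {
                ref.name = item;  // names may contain spaces
            }
            spec.metrics.push_back(ref);
        }
        specs.push_back(spec);
    }
    return specs;
}
```

Applied to the --pframes value from the example above, this yields two backend entries: GL_INTEL_performance_query with two named metrics and one [1,4] reference, and opengl with the single metric Pixels Drawn.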


Problematic places

  1. Memory consumption

With the new metric abstraction system it is not possible to pass data to the output after each call, as is done currently. Data is aggregated from different passes, different backends and different boundaries, so it is only possible to generate the output at the end of the run.
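
The consequence for the writer can be sketched like this: rows are buffered during the run and a tab-separated table is emitted only at the end. BufferedTable and Row are hypothetical names, and the column layout (id, name, then one column per metric) is an assumption rather than the actual MetricWriter output format.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Row {
    unsigned id;                 // call or frame number
    std::string name;            // e.g. the call name
    std::vector<double> values;  // one value per metric column
};

// Keeps all rows in memory until the end of the run, then writes one table.
class BufferedTable {
    std::vector<std::string> columns_;
    std::vector<Row> rows_;
public:
    explicit BufferedTable(std::vector<std::string> columns)
        : columns_(std::move(columns)) {}

    // Called while retracing; nothing is written yet.
    void addRow(Row r) {
        assert(r.values.size() == columns_.size());
        rows_.push_back(std::move(r));
    }

    // Called once at the end of the run: header line, then one line per row.
    void write(std::ostream& os) const {
        os << "id\tname";
        for (const std::string& c : columns_) os << '\t' << c;
        os << '\n';
        for (const Row& r : rows_) {
            os << r.id << '\t' << r.name;
            for (double v : r.values) os << '\t' << v;
            os << '\n';
        }
    }

    std::string str() const {
        std::ostringstream os;
        write(os);
        return os.str();
    }
};
```

Because every row stays in rows_ until write() is called, memory use in this sketch grows with the number of profiled calls.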

Each backend has its own storage, and MetricWriter stores some information as well. This data is not freed until the end of the run and can take quite a lot of memory. For example, with a 3 GB trace file, profiling CPU times takes 1 GB of memory for profiling data (comparable to the memory used for blobs, which is also 1 GB). The main use case is probably profiling draw calls or frames, though, which fortunately takes less memory.

All this data occupying RAM cannot simply be dumped to a temporary file. For that to happen there would have to be a unified interface for all storage, which is incompatible with the design where each backend uses its own storage (a design that is quite convenient, considering how differently the backends obtain their data).

There is a kind of workaround for the problem. I wrote (a2856ef) a simple custom allocator that allocates data in a growing file and mmaps it, so the memory is backed by the file and the actual RAM pages can be freed if needed. This allocator can be used with STL containers. It is used by MetricWriter and is passed to the backends (ca8139c), which can use it where appropriate.

I have written this optimization only for Unix-like systems (an implementation for Windows does not seem very difficult either). Its usefulness is also quite limited on 32-bit systems, which have only a 4 GB address space.
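
The idea of a file-backed allocator can be sketched as below for POSIX systems. This is not the allocator from a2856ef: BackingFile, MmapAllocator and the temporary file path are hypothetical, each allocation simply gets its own page-aligned region at the end of the file, and file space is never reused after deallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>
#include <stdlib.h>    // mkstemp
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // ftruncate, unlink, close, sysconf

// Growing temporary file that backs every allocation (POSIX only).
class BackingFile {
    int fd_;
    off_t size_ = 0;
    long page_;
public:
    BackingFile() : page_(sysconf(_SC_PAGESIZE)) {
        assert(page_ > 0);
        // Hypothetical location; a RAM-backed /tmp would defeat the purpose.
        char path[] = "/tmp/metric-store-XXXXXX";
        fd_ = mkstemp(path);
        if (fd_ < 0) throw std::bad_alloc();
        unlink(path);  // the file vanishes once the descriptor is closed
    }
    ~BackingFile() { close(fd_); }
    BackingFile(const BackingFile&) = delete;
    BackingFile& operator=(const BackingFile&) = delete;

    // Round up to whole pages (at least one) so file offsets stay aligned.
    std::size_t roundUp(std::size_t bytes) const {
        std::size_t p = static_cast<std::size_t>(page_);
        if (bytes == 0) bytes = 1;
        return (bytes + p - 1) / p * p;
    }
    // Grow the file and map the new region into memory.
    void* map(std::size_t bytes) {
        std::size_t len = roundUp(bytes);
        off_t offset = size_;
        if (ftruncate(fd_, size_ + static_cast<off_t>(len)) != 0)
            throw std::bad_alloc();
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd_, offset);
        if (p == MAP_FAILED) throw std::bad_alloc();
        size_ += static_cast<off_t>(len);
        return p;
    }
    void unmap(void* p, std::size_t bytes) { munmap(p, roundUp(bytes)); }
};

// Minimal C++11 allocator that forwards to a shared BackingFile.
template <typename T>
struct MmapAllocator {
    using value_type = T;
    BackingFile* file;
    explicit MmapAllocator(BackingFile& f) : file(&f) {}
    template <typename U>
    MmapAllocator(const MmapAllocator<U>& o) : file(o.file) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(file->map(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) { file->unmap(p, n * sizeof(T)); }
};
template <typename T, typename U>
bool operator==(const MmapAllocator<T>& a, const MmapAllocator<U>& b) {
    return a.file == b.file;
}
template <typename T, typename U>
bool operator!=(const MmapAllocator<T>& a, const MmapAllocator<U>& b) {
    return !(a == b);
}
```

A container then takes the allocator as its second template argument, for example std::vector<int, MmapAllocator<int>> v(alloc). The BackingFile must outlive every container that uses it.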

Probably this allocator could also be used for blobs in apitrace since they take some memory.

  2. Context switching

In OpenGL, a context switch is performed by calling MakeCurrent. Context switches are handled in commit 1d0bf77, which adds two functions to the interface, pausePass() and continuePass(), as well as basic switching support in the implemented backends (AMD_perfmon, INTEL_perfquery, opengl). This is not final, though; some problems remain:

  • Metrics are initially chosen for the first context encountered in the trace file, so other contexts might not support all of the chosen metrics. The "opengl" backend, for example, does not handle this properly.

  • A frame can be interrupted by a context switch. If that frame is being profiled, it is not clear what should happen. In AMD_perfmon and INTEL_perfquery no data is saved for such a frame; the "opengl" backend saves the part that was profiled before the context switch (for no particular reason, it was just simpler to do).
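
The behaviour described for AMD_perfmon and INTEL_perfquery, where data for a frame interrupted by a context switch is not saved, can be sketched as a small state machine. FrameSampler and its members are hypothetical; only pausePass() and continuePass() are names taken from the text, and what they do in the actual backends is not shown here.

```cpp
#include <cassert>
#include <vector>

// Tracks which frames were profiled without being interrupted.
class FrameSampler {
    bool inFrame_ = false;
    bool interrupted_ = false;
    bool paused_ = false;
    unsigned frame_ = 0;
    std::vector<unsigned> saved_;  // numbers of frames whose data was kept
public:
    void beginFrame() {
        inFrame_ = true;
        interrupted_ = false;
    }
    void endFrame() {
        // Keep the frame only if no context switch happened inside it.
        if (inFrame_ && !interrupted_) saved_.push_back(frame_);
        inFrame_ = false;
        ++frame_;
    }
    // Called before the current context is switched away.
    void pausePass() {
        paused_ = true;
        if (inFrame_) interrupted_ = true;
    }
    // Called after switching back to a profiled context.
    void continuePass() {
        assert(paused_);  // continuing only makes sense after a pause
        paused_ = false;
    }
    const std::vector<unsigned>& savedFrames() const { return saved_; }
};
```

Keeping the partial data instead, as the "opengl" backend does, would amount to saving the frame in endFrame() regardless of interrupted_.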
