# Example: Memory Tracing for Detailed Dynamic Memory Estimation Using ETISS

When measuring the RAM usage of a program, we can distinguish between two classes:
- Static RAM usage (known after compilation/linking)
- Dynamic RAM usage (e.g. maximum heap/stack utilization)

The following example shows how to use the `trace` feature of the `etiss_pulpino` target to measure the dynamic RAM usage, in addition to the static usage, for a simple benchmark.

## Supported components

**Models:** Any (`sine_model` used below)

**Frontends:** Any (`tflite` used below)

**Frameworks/Backends:** Any (`tvmaotplus` and `tvmrt` used below)

**Platforms/Targets:** `etiss_pulpino` only

## Prerequisites

Set up MLonMCU as usual, i.e. initialize an environment and install all required dependencies. Feel free to use the following minimal `environment.yml.j2` template:

```yaml
---
TODO
```

Do not forget to set your `MLONMCU_HOME` environment variable first if not using the default location!

## Usage

*Warning:* Memory tracing writes a log of every single memory access to disk. This can drastically slow down execution and write a lot of data to your disk. (For larger models this might exceed 10 GB per inference!)
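
Given the amount of data involved, it may be worth checking the free disk space before enabling the feature. The following is a minimal sketch using only the Python standard library. The 10 GiB threshold is taken from the warning above. The checked path (`"."`) and the helper name `has_enough_space` are placeholders and assumptions, since the location of the trace files is not stated here.

```python
import shutil

# Assumed threshold, based on the "10 GB per inference" figure in the warning above.
REQUIRED_BYTES = 10 * 1024**3


def has_enough_space(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has at least `required` free bytes."""
    return shutil.disk_usage(path).free >= required


# "." is a placeholder; point this at the directory where the traces are written.
if not has_enough_space("."):
    print("Warning: less than 10 GiB free, memory tracing may fill up the disk.")
```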

### A) Command Line Interface

As an example, let's compare `tvmaotplus` (MicroTVM's lightweight Ahead-of-Time runtime) with `tvmrt` (MicroTVM's legacy graph runtime).

To use the `trace` feature, just add `--feature trace` to the command line:

In [1]:
!mlonmcu flow run sine_model --backend tvmaotplus --backend tvmrt --target etiss_pulpino --feature trace

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
INFO - [session-270]  Processing stage LOAD
INFO - [session-270]  Processing stage BUILD
INFO - [session-270]  Processing stage COMPILE
INFO - [session-270]  Processing stage RUN
INFO - All runs completed successfuly!
INFO - Postprocessing session report
INFO - [session-270] Done processing runs
INFO - Report:
   Session  Run       Model Frontend Framework Backend Platform         Target  Cycles  MIPS  Total ROM  Total RAM  ROM read-only  ROM code  ROM misc  RAM data  RAM zero-init data  RAM stack  RAM heap Features                                             Config Postprocesses Comment
0      270    0  sine_model   tflite       tvm  tvmaot     mlif  etiss_pulpino    1522     0      56662       4513           4432     52086       144      2485                 268        720      1040  [trace]  {'tflite.use_inout_data': False, 'tflite.visua...            []       -


By using the `filter_cols` postprocess, we can strip all unneeded information from the benchmark report to make it a bit more readable:

In [2]:
!mlonmcu flow run sine_model --backend tvmaotplus --backend tvmrt --target etiss_pulpino --feature trace \
    --postprocess filter_cols --config filter_cols.keep="Backend,Total RAM,RAM data,RAM zero-init data,RAM stack,RAM heap"

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
INFO - [session-271]  Processing stage LOAD
INFO - [session-271]  Processing stage BUILD
INFO - [session-271]  Processing stage COMPILE
INFO - [session-271]  Processing stage RUN
INFO - [session-271]  Processing stage POSTPROCESS
INFO - All runs completed successfuly!
INFO - Postprocessing session report
value 'Backend,Total RAM,RAM data,RAM zero-init data,RAM stack,RAM heap'
value 'Backend,Total RAM,RAM data,RAM zero-init data,RAM stack,RAM heap'
value 'Backend,Total RAM,RAM data,RAM zero-init data,RAM stack,RAM heap'
INFO - [session-271] Done processing runs
INFO - Report:
      Backend  Total RAM  RAM data  RAM zero-init data  RAM stack  RAM heap
0  tvmaotplus       4497      2493                 244        720      1040
1       tvmrt     141397      2501              132256       5604      1036


The report shows that, for this simple benchmark, the `tvmrt` backend uses approx. 8 times more stack than `tvmaotplus` (5604 vs. 720). However, this is probably negligible compared to the total RAM usage of `tvmrt` in this scenario (141397), which is dominated by the zero-initialized data (132256).
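
The arithmetic behind this comparison can be reproduced with a few lines of Python. The sketch below uses the numbers copied from the report above and groups them into a static part (`RAM data` + `RAM zero-init data`) and a dynamic part (`RAM stack` + `RAM heap`). The dictionary layout and the helper `split_ram` are only illustrative and not part of MLonMCU.

```python
# Values copied from the filtered report above.
report = {
    "tvmaotplus": {"total": 4497, "data": 2493, "zero_init": 244, "stack": 720, "heap": 1040},
    "tvmrt": {"total": 141397, "data": 2501, "zero_init": 132256, "stack": 5604, "heap": 1036},
}


def split_ram(row):
    """Return (static, dynamic) RAM usage for one report row."""
    static = row["data"] + row["zero_init"]
    dynamic = row["stack"] + row["heap"]
    return static, dynamic


for backend, row in report.items():
    static, dynamic = split_ram(row)
    # For both rows, the four components add up to the reported total.
    assert static + dynamic == row["total"]
    share = dynamic / row["total"]
    print(f"{backend}: static={static}, dynamic={dynamic}, dynamic share={share:.1%}")

# Stack usage of tvmrt relative to tvmaotplus.
stack_ratio = report["tvmrt"]["stack"] / report["tvmaotplus"]["stack"]
print(f"stack ratio tvmrt/tvmaotplus: {stack_ratio:.2f}")
```

For these two runs, `Total RAM` equals the sum of the four listed components, so the dynamic columns added by the `trace` feature directly extend the static estimate.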

### B) Python Scripting

TODO: Use pandas instead of the `filter_cols` postprocess to reduce the report to the relevant columns.
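
As a rough illustration of the pandas-based alternative mentioned above, the sketch below keeps the same columns as the `filter_cols` call in section A. The DataFrame is a hand-built stand-in for the report, with values copied from the CLI output above plus a `Target` column to have something to drop. It is an assumption that `report.df` from a real session uses the same column names.

```python
import pandas as pd

# Hypothetical stand-in for `report.df`; values copied from the CLI report above.
df = pd.DataFrame(
    {
        "Backend": ["tvmaotplus", "tvmrt"],
        "Target": ["etiss_pulpino", "etiss_pulpino"],
        "Total RAM": [4497, 141397],
        "RAM data": [2493, 2501],
        "RAM zero-init data": [244, 132256],
        "RAM stack": [720, 5604],
        "RAM heap": [1040, 1036],
    }
)

# Same column list as passed to `filter_cols.keep` on the command line.
KEEP = ["Backend", "Total RAM", "RAM data", "RAM zero-init data", "RAM stack", "RAM heap"]

# Selecting a list of column labels returns a new DataFrame with only those columns.
filtered = df[KEEP]
print(filtered.to_string())
```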

In [3]:
from tempfile import TemporaryDirectory
from pathlib import Path
import pandas as pd

from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage

Benchmark Configuration

In [4]:
FRONTEND = "tflite"
MODEL = "sine_model"
BACKEND = "tvmaotplus"
PLATFORM = "mlif"
TARGET = "etiss_pulpino"
FEATURES = ["log_instrs"]
CONFIG = {"log_instrs.to_file": True}
POSTPROCESSES = ["analyse_instructions"]

Initialize and run a single benchmark

In [5]:
with MlonMcuContext() as context:
    session = context.create_session()
    run = session.create_run(config=CONFIG)
    run.add_features_by_name(FEATURES, context=context)
    run.add_frontend_by_name(FRONTEND, context=context)
    run.add_model_by_name(MODEL, context=context)
    run.add_backend_by_name(BACKEND, context=context)
    run.add_platform_by_name(PLATFORM, context=context)
    run.add_target_by_name(TARGET, context=context)
    run.add_postprocesses_by_name(POSTPROCESSES)
    session.process_runs(context=context)
    report = session.get_reports()
report.df

INFO - Loading environment cache from file
INFO - Successfully initialized cache
INFO - Loading extensions.py (User)
INFO - [session-241] Processing all stages
ERROR - 'builtin_function_or_method' object is not subscriptable
Traceback (most recent call last):
  File "/var/tmp/ga87puy/mlonmcu/mlonmcu/venv/lib/python3.8/site-packages/mlonmcu-0.3.0.dev0-py3.8.egg/mlonmcu/session/run.py", line 782, in process
    func()
  File "/var/tmp/ga87puy/mlonmcu/mlonmcu/venv/lib/python3.8/site-packages/mlonmcu-0.3.0.dev0-py3.8.egg/mlonmcu/session/run.py", line 549, in postprocess
    artifacts = postprocess.post_run(temp_report, self.artifacts)
  File "/var/tmp/ga87puy/mlonmcu/mlonmcu/venv/lib/python3.8/site-packages/mlonmcu-0.3.0.dev0-py3.8.egg/mlonmcu/session/run.py", line 507, in artifacts
    itertools.chain([subs[subs.keys[0]] for stage, subs in self.artifacts_per_stage.items()])
  File "/var/tmp/ga87puy/mlonmcu/mlonmcu/venv/lib/python3.8/site-packages/mlonmcu-0.3.0.dev0-py3.8.egg/mlonmcu/sessi

Unnamed: 0,Session,Run,Model,Frontend,Framework,Backend,Platform,Target,Total ROM,Total RAM,ROM read-only,ROM code,ROM misc,RAM data,RAM zero-init data,Features,Config,Postprocesses,Comment,Failing
0,241,0,sine_model,tflite,tvm,tvmaotplus,mlif,etiss_pulpino,56100,2737,4280,51676,144,2493,244,[log_instrs],"{'tflite.use_inout_data': False, 'tflite.visua...",[analyse_instructions],-,True
