In [1]:
from IPython.display import Code

# Example: Per-Layer Model Benchmarking

When investigating why a given model does not perform as expected, or when implementing optimizations for specific types of layers, it is often useful to consider the runtime of individual layers instead of the end-to-end execution time.

MLonMCU currently supports two approaches for per-layer benchmarking:
1. Using the `split_layers` feature of the `tflite` frontend
2. Using the profiling feature provided by the `tvm` and `microtvm` (WIP) platforms

Both use-cases are explained briefly in the rest of this notebook.

## 1. Splitting TFLite Models into individual layers

### Supported components

**Models:** Any (`resnet` used below)

**Frontends:** `tflite` only

**Frameworks/Backends:** Any (`tvmaotplus` used below)

**Platforms/Targets:** Any (`etiss` used below)

**Features:** The `split_layers` feature of the `tflite` frontend needs to be enabled

### Prerequisites

If not done already, set up a virtual Python environment and install the required packages into it (see `requirements.txt`).

In [2]:
Code(filename="requirements.txt")

Set up MLonMCU as usual, i.e. initialize an environment and install all required dependencies. Feel free to use the following minimal `environment.yml.j2` template:

In [None]:
Code(filename="environment.yml.j2")

Do not forget to set your `MLONMCU_HOME` environment variable first if not using the default location!
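
Before starting a longer benchmark it can help to confirm which environment will be picked up. The following is a minimal sketch using only the Python standard library; `resolve_mlonmcu_home` is a hypothetical helper and not part of MLonMCU. It only inspects the `MLONMCU_HOME` variable and makes no assumption about where the default location is.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_mlonmcu_home(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the directory named by MLONMCU_HOME, or None if it is unset.

    Hypothetical helper for illustration; MLonMCU performs its own lookup.
    """
    value = env.get("MLONMCU_HOME")
    if not value:
        # Variable missing or empty: the default location would be used.
        return None
    home = Path(value).expanduser()
    if not home.is_dir():
        raise FileNotFoundError(f"MLONMCU_HOME points to a missing directory: {home}")
    return home


home = resolve_mlonmcu_home()
print("MLONMCU_HOME:", home if home is not None else "not set (default location is used)")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.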

### Usage

The following examples demonstrate the `split_layers` feature recently added to MLonMCU.

#### A) Command Line Interface

First define a simple benchmark of a single model/backend/target combination:

In [None]:
!mlonmcu flow run resnet --backend tvmaotplus --target etiss_pulpino

Now let's enable the `split_layers` feature:

In [None]:
!mlonmcu flow run resnet --backend tvmaotplus --target etiss_pulpino -f split_layers

The resulting report should contain the original benchmark results (for the whole model) in the first row. The remaining 16 rows correspond to the individual layers found in the `resnet.tflite` model. The layer index can be found in the 'Sub' column. The per-layer cycle counts should roughly sum up to the total cycle count measured in the first row.
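
The "roughly sums up" expectation can be checked with a few lines of pandas. This is a sketch under assumptions: `compare_layers_to_total` is a hypothetical helper, the whole-model row is identified by an empty 'Sub' value, and the metric column name is passed in explicitly because it depends on the target (here `Total Instructions`, one of the columns used later in this notebook). The numbers are synthetic and only there to make the example runnable.

```python
import pandas as pd


def compare_layers_to_total(df: pd.DataFrame, metric: str, sub_col: str = "Sub") -> dict:
    """Compare the summed per-layer metric against the whole-model row.

    Assumes the whole-model row has a missing value in `sub_col`.
    """
    full_rows = df[df[sub_col].isna()]
    layer_rows = df[df[sub_col].notna()]
    if len(full_rows) != 1:
        raise ValueError(f"expected exactly one whole-model row, found {len(full_rows)}")
    total = float(full_rows[metric].iloc[0])
    layer_sum = float(layer_rows[metric].sum())
    return {"total": total, "layer_sum": layer_sum, "ratio": layer_sum / total}


# Synthetic numbers for illustration only, not measured results.
demo = pd.DataFrame(
    {
        "Sub": [None, 0, 1, 2],
        "Total Instructions": [1000, 300, 450, 240],
    }
)
result = compare_layers_to_total(demo, "Total Instructions")
print(result)
```

A ratio close to 1 indicates that the split runs account for nearly all of the whole-model cost; a larger gap is worth a closer look at the individual layers.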

#### B) Python Scripting

Some imports

In [None]:
from tempfile import TemporaryDirectory
from pathlib import Path
import pandas as pd

from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage

Benchmark Configuration

In [None]:
FRONTEND = "tflite"
MODEL = "resnet"
BACKEND = "tvmaotplus"
PLATFORM = "mlif"
TARGET = "etiss_pulpino"
FEATURES = ["split_layers"]
CONFIG = {"filter_cols.keep": ["Sub", "Total Instructions", "Total ROM", "Total RAM"]}
POSTPROCESSES = ["filter_cols"]

Initialize and run a single benchmark

In [None]:
with MlonMcuContext() as context:
    with context.create_session() as session:
        run = session.create_run(config=CONFIG)
        run.add_features_by_name(FEATURES, context=context)
        run.add_frontend_by_name(FRONTEND, context=context)
        run.add_model_by_name(MODEL, context=context)
        run.add_backend_by_name(BACKEND, context=context)
        run.add_platform_by_name(PLATFORM, context=context)
        run.add_target_by_name(TARGET, context=context)
        run.add_postprocesses_by_name(POSTPROCESSES)
        session.process_runs(context=context)
        report = session.get_reports()
assert "Failing" not in report.df.columns
report.df

Stripping out all common data, we get this:

In [None]:
df = report.df
df.fillna("full", inplace=True)
df.set_index("Sub", inplace=True)
df

## 2. Using ~~(Micro)~~TVM's profiling functionality

Instead of splitting the model layer-wise before optimization, this approach uses the functionality of TVM's graph runtime to benchmark the individual functions contained in the model graph. These functions do not necessarily map directly to a single layer in the original model, because operator fusion is performed automatically by TVM's compilation pipeline.

### Supported components

**Models:** Any (`resnet` used below)

**Frontends:** Any frontend supported by TVM (`tflite` used below)

**Frameworks/Backends:** TVM: `tvmllvm` ~~MicroTVM: `tvmrt`~~

**Platforms/Targets:** TVM: `tvm_cpu` ~~MicroTVM: Any~~

**Features:** The `tvm_profile` feature needs to be enabled

Let's only consider the `tvm_cpu` target here until profiling on MicroTVM is officially supported by upstream TVM. Hence we are profiling on the host CPU here, not on an MCU or simulator.

### Prerequisites

If not done already, set up a virtual Python environment and install the required packages into it (see `requirements.txt`).

In [None]:
Code(filename="requirements.txt")

Set up MLonMCU as usual, i.e. initialize an environment and install all required dependencies. Feel free to use the following minimal `environment.yml.j2` template:

In [None]:
Code(filename="environment.yml.j2")

Do not forget to set your `MLONMCU_HOME` environment variable first if not using the default location!

### Usage

The following examples demonstrate the `tvm_profile` feature of the TVM and MicroTVM platforms.

#### A) Command Line Interface

First define a simple benchmark of a single model/backend/target combination:

In [None]:
!python -m mlonmcu.cli.main flow run resnet -b tvmllvm -t tvm_cpu

To enable TVM's profiling feature, just add `-f tvm_profile` to the command line:

In [None]:
!python -m mlonmcu.cli.main flow run resnet -b tvmllvm -t tvm_cpu -f tvm_profile

Since TVM uses quite long function names, the report might not be very readable. As a last step, let's try to improve that using the `filter_cols` postprocess:

In [None]:
!python -m mlonmcu.cli.main flow run resnet -b tvmllvm -t tvm_cpu -f tvm_profile \
        --postprocess filter_cols -c filter_cols.keep="Model,Sub,Runtime [s]"
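
The effect of `filter_cols.keep` on the report can be pictured with plain pandas. The sketch below assumes that the postprocess simply keeps the listed columns and drops everything else; `keep_columns` is a hypothetical helper, not MLonMCU code, and the rows (including the `func_a`/`func_b` names) are synthetic placeholders.

```python
import pandas as pd


def keep_columns(df: pd.DataFrame, keep: list) -> pd.DataFrame:
    """Return a copy of df restricted to the columns in `keep` that exist."""
    present = [col for col in keep if col in df.columns]
    return df[present].copy()


# Synthetic report rows for illustration only, not measured results.
demo = pd.DataFrame(
    {
        "Session": [0, 0, 0],
        "Run": [0, 0, 0],
        "Model": ["resnet", "resnet", "resnet"],
        "Backend": ["tvmllvm", "tvmllvm", "tvmllvm"],
        "Sub": [None, "func_a", "func_b"],
        "Runtime [s]": [0.010, 0.004, 0.006],
    }
)
filtered = keep_columns(demo, ["Model", "Sub", "Runtime [s]"])
print(filtered)
```

With fewer columns in the table, long entries in the 'Sub' column have more room and the report is easier to scan.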

#### B) Python Scripting

Some imports

In [None]:
from tempfile import TemporaryDirectory
from pathlib import Path
import pandas as pd

from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage

Benchmark Configuration

In [None]:
FRONTEND = "tflite"
MODEL = "resnet"
BACKEND = "tvmllvm"
PLATFORM = "tvm"
TARGET = "tvm_cpu"
FEATURES = ["tvm_profile"]
CONFIG = {}
POSTPROCESSES = []

Initialize and run a single benchmark

In [None]:
with MlonMcuContext() as context:
    with context.create_session() as session:
        run = session.create_run(config=CONFIG)
        run.add_features_by_name(FEATURES, context=context)
        run.add_frontend_by_name(FRONTEND, context=context)
        run.add_model_by_name(MODEL, context=context)
        run.add_platform_by_name(PLATFORM, context=context)
        run.add_backend_by_name(BACKEND, context=context)
        run.add_target_by_name(TARGET, context=context)
        run.add_postprocesses_by_name(POSTPROCESSES)
        session.process_runs(context=context)
        report = session.get_reports()
assert "Failing" not in report.df.columns
report.df

After stripping it down to the essential data:

In [None]:
df = report.df
df.drop(["Session", "Run", "Frontend", "Model", "Framework", "Backend", "Platform", "Target", "Config", "Features", "Postprocesses", "Comment"], axis=1, inplace=True)
df.fillna("full", inplace=True)
df.set_index("Sub", inplace=True)
df
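
Once the table is indexed by 'Sub', a natural next step is to rank the profiled functions by their share of the runtime. The following is a minimal sketch, assuming the report has a `Runtime [s]` column (as used in the CLI example) and that the whole-model row carries the label `full` after `fillna`. `runtime_shares` is a hypothetical helper and the values are synthetic.

```python
import pandas as pd


def runtime_shares(df: pd.DataFrame, metric: str = "Runtime [s]", full_label: str = "full") -> pd.DataFrame:
    """Return per-function runtimes with their percentage share, largest first."""
    # Drop the whole-model row so only per-function rows remain.
    parts = df.drop(index=full_label, errors="ignore")
    # Non-numeric entries (e.g. placeholders written by fillna) become NaN.
    values = pd.to_numeric(parts[metric], errors="coerce")
    shares = (values / values.sum() * 100).round(1)
    return pd.DataFrame({metric: values, "Share [%]": shares}).sort_values(
        metric, ascending=False
    )


# Synthetic values for illustration only, not measured results.
demo = pd.DataFrame(
    {"Runtime [s]": [0.010, 0.002, 0.005, 0.003]},
    index=pd.Index(["full", "func_a", "func_b", "func_c"], name="Sub"),
)
shares = runtime_shares(demo)
print(shares)
```

Calling `runtime_shares(df)` on the table above would list the most expensive functions first, which is usually where optimization effort pays off.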