In [1]:
from IPython.display import Code

# Example: Compare MIPS of RISC-V Instruction Set Simulators

Typically, MLonMCU is used to benchmark TinyML workloads on real hardware or simulators. However, its flexibility also allows some interesting experiments not directly related to embedded ML. In the following, the performance of several RISC-V ISA simulators is compared using the MLonMCU command line and Python API.

## Supported components

**Models:** Any (`resnet` and `toycar` used below)

**Frontends:** Any (`tflite` used below)

**Frameworks/Backends:** Any (`tvmaot` used below)

**Platforms/Targets:** `etiss`, `spike`, `ovpsim` (`etiss` and `spike` used below)

## Prerequisites

If not done already, set up a virtual Python environment and install the required packages into it (see `requirements.txt`).

In [2]:
Code(filename="requirements.txt")

Set up MLonMCU as usual, i.e. initialize an environment and install all required dependencies. Feel free to use the following minimal `environment.yml.j2` template:

In [3]:
Code(filename="environment.yml.j2")

Do not forget to set your `MLONMCU_HOME` environment variable first if not using the default location!

## Usage

If supported by the selected target, the measured MIPS of the simulation is part of the report printed/returned by MLonMCU. The following shows how to filter out unwanted additional information and how to increase the accuracy of the MIPS value.

### A) Command Line Interface

Let's start with an example benchmark of two models using two different RISC-V simulators:

In [4]:
!mlonmcu flow run resnet toycar --backend tvmaot --target etiss_pulpino --target spike -c run.export_optional=1

INFO - Loading environment cache from file
INFO - Successfully initialized cache


INFO -  Processing stage LOAD


INFO -  Processing stage BUILD


INFO -  Processing stage COMPILE


INFO -  Processing stage RUN


INFO - All runs completed successfuly!
INFO - Postprocessing session report
INFO - Done processing runs


INFO - Report:
   Session  Run   Model Frontend Framework Backend Platform         Target  Setup Cycles  Setup Instructions  Run Cycles  Run Instructions  Total Cycles  Total Instructions  Setup CPI  Run CPI  Total CPI  Simulated Instructions        MIPS  Total ROM  Total RAM  ROM read-only  ROM code  ROM misc  RAM data  RAM zero-init data  Validation  Run Stage Time [s]  Compile Stage Time [s]  Workspace Size [B]  Build Stage Time [s]  Load Stage Time [s] Features                                             Config Postprocesses Comment
0        0    0  resnet   tflite       tvm  tvmaot     mlif  etiss_pulpino            64                  64    81653242          81653242      81653506            81653506        1.0      1.0        1.0                81760000   68.163300     228266     124824         162192     65930       144      2484              122340        True            1.352831                1.636709              119120              7.254674             0.000229       [] 

The MIPS value can be found in the column right after `Simulated Instructions`. (The reported cycles are actually instruction counts in this case, hence the CPI of 1.0.) However, there is a lot of additional information we want to filter out next. This can be achieved using the `filter_cols` postprocess.
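MIPS stands for million instructions per second, so the reported value is the number of simulated instructions divided by the simulation time in seconds and by 10^6. As a rough sanity check, the sketch below inverts this definition for the `resnet`/`etiss_pulpino` row above to obtain the simulation time implied by the reported value. Which time base MLonMCU actually uses is an assumption here; the implied time of about 1.2 s is somewhat below the reported `Run Stage Time [s]` of about 1.35 s, which would fit a stage time that also includes work outside the simulation itself.

```python
# Values copied from the resnet/etiss_pulpino row of the report above.
simulated_instructions = 81_760_000
reported_mips = 68.163300
run_stage_time_s = 1.352831


def mips(instructions: int, seconds: float) -> float:
    """Million instructions per second."""
    return instructions / seconds / 1e6


# Invert the definition to get the simulation time implied by the reported MIPS.
implied_time_s = simulated_instructions / (reported_mips * 1e6)
print(f"implied simulation time: {implied_time_s:.3f} s")

# For comparison: the MIPS obtained if the whole RUN stage time were the time base.
print(f"MIPS based on run stage time: {mips(simulated_instructions, run_stage_time_s):.1f}")
```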

In [5]:
!mlonmcu flow run resnet toycar --backend tvmaot --target etiss_pulpino --target spike --postprocess filter_cols --config filter_cols.keep="Model,Target,MIPS" -c run.export_optional=1

INFO - Loading environment cache from file
INFO - Successfully initialized cache


INFO - [session-1]  Processing stage LOAD


INFO - [session-1]  Processing stage BUILD


INFO - [session-1]  Processing stage COMPILE


INFO - [session-1]  Processing stage RUN


INFO - [session-1]  Processing stage POSTPROCESS


INFO - All runs completed successfuly!
INFO - Postprocessing session report
INFO - [session-1] Done processing runs
INFO - Report:
    Model         Target        MIPS
0  resnet  etiss_pulpino   68.471200
1  resnet          spike  222.242024
2  toycar  etiss_pulpino    2.644330
3  toycar          spike   59.961392
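Conceptually, `filter_cols.keep` acts like a column selection on the report table. Assuming the report is available as a pandas DataFrame (as with `report.df` in the Python API section), a comparable selection is sketched below. The small stand-in DataFrame only carries a few of the real report columns, with values copied from the outputs above; whether `filter_cols` does anything beyond selecting and ordering the listed columns is not covered here.

```python
import pandas as pd

# Stand-in for a full report: a few of its columns, values copied from the outputs above.
report = pd.DataFrame({
    "Model": ["resnet", "resnet", "toycar", "toycar"],
    "Target": ["etiss_pulpino", "spike", "etiss_pulpino", "spike"],
    "Backend": ["tvmaot"] * 4,
    "Platform": ["mlif"] * 4,
    "MIPS": [68.471200, 222.242024, 2.644330, 59.961392],
})

keep = "Model,Target,MIPS".split(",")  # same string as passed to filter_cols.keep
filtered = report[keep]  # keeps only the listed columns, in the listed order
print(filtered)
```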


That looks much cleaner! However, the numbers seem quite low, especially for the smaller `toycar` (MLPerf Tiny Anomaly Detection) model. Let's see whether the MIPS increase when running more than a single inference. We are using the `benchmark` feature for this.

*Hint*: Since each model/target pair now executes about 60 times as many inferences as before, the following cell will likely need a few minutes to execute.
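Each `--config-gen` flag adds one configuration variant, and these are combined with every given model and target, which is why the report below has 2 × 2 × 3 = 12 rows. The sketch below only reproduces this bookkeeping with `itertools.product` (it does not call MLonMCU). It also shows where the "60 times" estimate comes from: 1 + 10 + 50 = 61 inferences per model/target pair instead of a single one.

```python
from itertools import product

models = ["resnet", "toycar"]
targets = ["etiss_pulpino", "spike"]
num_runs_values = [1, 10, 50]  # one value per --config-gen flag

# Each combination corresponds to one run, i.e. one row in the report.
runs = list(product(models, targets, num_runs_values))
print(len(runs))  # 12

# Inferences executed per model/target pair across all three variants.
inferences_per_pair = sum(num_runs_values)
print(inferences_per_pair)  # 61
```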

In [6]:
!mlonmcu flow run resnet toycar --backend tvmaot --target etiss_pulpino --target spike --postprocess config2cols --postprocess filter_cols --config filter_cols.keep="Model,Target,MIPS,config_benchmark.num_runs" --feature benchmark --config-gen benchmark.num_runs=1 --config-gen benchmark.num_runs=10 --config-gen benchmark.num_runs=50 -c run.export_optional=1

INFO - Loading environment cache from file
INFO - Successfully initialized cache


INFO - [session-2]  Processing stage LOAD


INFO - [session-2]  Processing stage BUILD


INFO - [session-2]  Processing stage COMPILE


INFO - [session-2]  Processing stage RUN


INFO - [session-2]  Processing stage POSTPROCESS


INFO - All runs completed successfuly!
INFO - Postprocessing session report


INFO - [session-2] Done processing runs
INFO - Report:
     Model         Target        MIPS config_benchmark.num_runs
0   resnet  etiss_pulpino   68.036500                         1
1   resnet          spike  223.966865                         1
2   resnet  etiss_pulpino   99.258600                        10
3   resnet          spike  232.817709                        10
4   resnet  etiss_pulpino  102.322000                        50
5   resnet          spike  236.055999                        50
6   toycar  etiss_pulpino    2.648910                         1
7   toycar          spike   60.328431                         1
8   toycar  etiss_pulpino   21.536200                        10
9   toycar          spike  160.189544                        10
10  toycar  etiss_pulpino   62.784500                        50
11  toycar          spike  233.067141                        50


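This looks more promising: the MIPS grow with the number of runs, most drastically for `toycar` on ETISS (from about 2.6 to 62.8). This experiment shows that MIPS measured on short-running simulations are not representative, likely because a fixed startup overhead then dominates the measurement. Also, Spike seems to be more than twice as fast as ETISS.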
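To put a number on "more than twice as fast", the ratio of both targets' MIPS can be computed per model. The pandas sketch below uses the `num_runs=50` rows copied from the report above, where the measurements appear most stable.

```python
import pandas as pd

# MIPS values copied from the report above (benchmark.num_runs=50 rows).
mips_50 = pd.DataFrame({
    "Model": ["resnet", "resnet", "toycar", "toycar"],
    "Target": ["etiss_pulpino", "spike", "etiss_pulpino", "spike"],
    "MIPS": [102.322000, 236.055999, 62.784500, 233.067141],
})

# One row per model, one column per target.
table = mips_50.pivot(index="Model", columns="Target", values="MIPS")
table["spike/etiss"] = table["spike"] / table["etiss_pulpino"]
print(table.round(2))
```

This yields a speedup of roughly 2.3× for `resnet` and 3.7× for `toycar`.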

### B) Python Scripting

First, some imports:

In [7]:
from tempfile import TemporaryDirectory
from pathlib import Path
import pandas as pd

from mlonmcu.context.context import MlonMcuContext
from mlonmcu.session.run import RunStage

Benchmark configuration:

In [8]:
FRONTEND = "tflite"
MODELS = ["resnet", "toycar"]
BACKEND = "tvmaot"
PLATFORM = "mlif"
TARGETS = ["etiss_pulpino", "spike"]
POSTPROCESSES = ["config2cols", "filter_cols"]
FEATURES = ["benchmark"]
CONFIG = {
    "filter_cols.keep": ["Model", "Target", "MIPS", "config_benchmark.num_runs"],
    "run.export_optional": True,
}
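The keys of `CONFIG` are the same dotted `component.option` names passed via `--config`/`-c` in the CLI examples above, with a Python list standing in for the comma-separated string and `True` for `1`. To make this correspondence explicit, the sketch below renders the dict as CLI arguments. `to_cli_config_args` is a hypothetical helper written for illustration only, not part of MLonMCU, and the dict is repeated so the block runs on its own.

```python
CONFIG = {
    "filter_cols.keep": ["Model", "Target", "MIPS", "config_benchmark.num_runs"],
    "run.export_optional": True,
}


def to_cli_config_args(config):
    """Hypothetical helper: render a config dict as `-c key=value` CLI arguments."""
    args = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = int(value)  # the CLI examples above pass True as 1
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # lists become comma-separated strings
        args += ["-c", f"{key}={value}"]
    return args


print(" ".join(to_cli_config_args(CONFIG)))
```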

Initialize the MLonMCU context and run all benchmarks (one run per model, target and `benchmark.num_runs` value) in a single session:

In [9]:
def add_benchmark_run(session, context, model, target, num_runs):
    """Create one run for the given model/target with `benchmark.num_runs` set."""
    cfg = CONFIG.copy()
    cfg["benchmark.num_runs"] = num_runs
    run = session.create_run(config=cfg)
    run.add_frontend_by_name(FRONTEND, context=context)
    run.add_features_by_name(FEATURES, context=context)
    run.add_model_by_name(model, context=context)
    run.add_backend_by_name(BACKEND, context=context)
    run.add_platform_by_name(PLATFORM, context=context)
    run.add_target_by_name(target, context=context)
    run.add_postprocesses_by_name(POSTPROCESSES)


with MlonMcuContext() as context:
    with context.create_session() as session:
        for model in MODELS:
            for target in TARGETS:
                for num_runs in [1, 10]:  # 50 omitted to cut down runtime
                    add_benchmark_run(session, context, model, target, num_runs)
        session.process_runs(context=context)
        report = session.get_reports()
assert not session.failing
report.df

INFO - Loading environment cache from file


INFO - Successfully initialized cache


INFO - [session-3] Processing all stages


INFO - All runs completed successfuly!


INFO - Postprocessing session report


INFO - [session-3] Done processing runs


    Model         Target        MIPS  config_benchmark.num_runs
0  resnet  etiss_pulpino     68.4654                          1
1  resnet  etiss_pulpino      99.361                         10
2  resnet          spike  219.865623                          1
3  resnet          spike  235.690099                         10
4  toycar  etiss_pulpino     2.65147                          1
5  toycar  etiss_pulpino     21.6002                         10
6  toycar          spike   58.500239                          1
7  toycar          spike    162.6548                         10
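Since `report.df` appears to be a regular pandas DataFrame, the results can be post-processed further in Python. The sketch below rebuilds the table above from the printed values (in the notebook you would use `report.df` directly) and computes how much the measured MIPS grow from 1 to 10 benchmark runs per model/target pair. The cast to `int` is a precaution, since it is an assumption whether `config2cols` stores these values as numbers or strings.

```python
import pandas as pd

# Values copied from the report.df output above.
df = pd.DataFrame({
    "Model": ["resnet"] * 4 + ["toycar"] * 4,
    "Target": ["etiss_pulpino", "etiss_pulpino", "spike", "spike"] * 2,
    "MIPS": [68.4654, 99.361, 219.865623, 235.690099,
             2.65147, 21.6002, 58.500239, 162.6548],
    "config_benchmark.num_runs": [1, 10, 1, 10] * 2,
})
df = df.astype({"config_benchmark.num_runs": int})

# Rows: model/target pair, columns: number of benchmark runs.
gain = df.pivot_table(index=["Model", "Target"],
                      columns="config_benchmark.num_runs", values="MIPS")
gain["10x/1x"] = gain[10] / gain[1]
print(gain.round(2))
```

The gain is largest for the short-running `toycar` on ETISS (about 8×), in line with the observation from the command-line experiment that short simulations underestimate the MIPS.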
