# **Model evaluation**

There are 1112 models available for download through the timm package, and many more on GitHub, Hugging Face and other sources. In some cases the repositories do not report a model's performance metrics correctly (I found a bug in one of the papers while writing this notebook).

The goal of this notebook is to provide a single place to evaluate models. This should help select models that are similar in computational cost but differ in architecture.

## Metrics

The following metrics are compared:

**FLOPs** (Floating Point Operations) counts how many floating-point operations the model performs. It is not the same as FLOPS (floating-point operations per second), which measures hardware throughput.

**MACs** (Multiply-Accumulate operations) counts how many multiply-and-accumulate steps are performed. One MAC is one multiplication followed by one addition, so FLOPs are roughly twice the MACs.

Both are commonly used to evaluate neural networks. Neither is perfect, but both give an estimate of the computational requirements.
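To make the FLOPs/MACs relationship concrete, here is a minimal sketch that counts the MACs of a single 2D convolution by hand. The helper names `conv2d_macs` and `macs_to_flops` are hypothetical, bias terms are ignored, and the factor of 2 is the usual approximation (one multiply plus one add per MAC), not necessarily what calflops computes internally.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """MACs of one 2D convolution with a square kernel (bias ignored)."""
    # every output element needs kernel * kernel * (c_in / groups) multiply-accumulates
    macs_per_output = kernel * kernel * (c_in // groups)
    return macs_per_output * c_out * h_out * w_out


def macs_to_flops(macs):
    # one MAC = one multiplication + one addition, so FLOPs ~ 2 x MACs
    return 2 * macs


# stem convolution of a MobileNetV2-style network: 3 -> 32 channels,
# 3x3 kernel, stride 2, so a 224x224 input gives a 112x112 output
macs = conv2d_macs(c_in=3, c_out=32, kernel=3, h_out=112, w_out=112)
print(macs, macs_to_flops(macs))  # 10838016 21676032
```

The same roughly 2:1 ratio shows up in the results table further down (for example 0.61 GFLOPs vs 0.3 GMACs for mobilenetv2_100 at 224 px).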


**Parameter count** is a metric everyone knows well. Unfortunately, there is no simple correlation between the number of parameters and a model's accuracy. There is, however, a correlation with inference time.
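Parameter counts can also be derived layer by layer. A minimal sketch for convolution and fully connected layers; the helper names are hypothetical and only standard weight-plus-bias layers are covered (no batch norm or other layer types).

```python
def conv2d_params(c_in, c_out, kernel, groups=1, bias=True):
    """Learnable parameters of a 2D convolution with a square kernel."""
    weights = kernel * kernel * (c_in // groups) * c_out
    return weights + (c_out if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# a 1280 -> 1000 classifier head, the size used by MobileNetV2 on ImageNet
print(linear_params(1280, 1000))  # 1281000
```

The head alone accounts for roughly 1.28 M of the 3.5 M parameters logged for mobilenetv2_100. Unlike MACs, which scale with the output height and width, none of these formulas depend on the input resolution, which is why `params` is identical for the 224 and 299 px rows of the results table while `flops` and `macs` grow.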

**Inference time** is the time it takes the model to make a prediction. Two methods are covered: the first measures inference in PyTorch on the CPU, the second predicts the latency of the exported ONNX model on a simulated mobile device with nn-meter.

**Train time** is the last metric. Because this notebook targets CPU execution, it is only measured roughly here, on a Colab L4 GPU.

# Installation

The first step is to install the necessary packages and do the basic imports.
We install the following:
- **calflops**, a package to compute FLOPs, MACs and parameter counts.
- **timm**, the source of the models (installed from GitHub to get MobileNetV4).
- **pandas**, to store the results.
- **nn-meter**, a Microsoft tool that predicts the latency of an ONNX model on mobile devices.
- **onnx** and **onnxscript**, required to export the models to ONNX.

Once everything is installed, we can evaluate the models.

In [1]:
!pip install calflops -q

In [2]:
#!pip install timm -q

# use github timm for mobilenet v4
!pip install git+https://github.com/rwightman/pytorch-image-models.git

Collecting git+https://github.com/rwightman/pytorch-image-models.git
  Cloning https://github.com/rwightman/pytorch-image-models.git to /tmp/pip-req-build-xxkvli5n
  Running command git clone --filter=blob:none --quiet https://github.com/rwightman/pytorch-image-models.git /tmp/pip-req-build-xxkvli5n
  Resolved https://github.com/rwightman/pytorch-image-models.git to commit 20fe56bd9072af61d9f5404ce8b08e24ff10a807
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [3]:
!pip install pandas -q

# Evaluate computational metrics



The timm package is the only source of models used here. Models are evaluated one at a time, keyed by their timm model name. The results are written to a CSV file: the FLOPs-related metrics are always stored, other measurements (inference time, for example) are stored when their cell is run.

## Helpers

**getModel** loads a pretrained timm model and returns it together with its input shape.

**LogBook** is the class used to instantiate the **log** that holds the experiment data.

WARNING: access to Google Drive is required (the log is stored there).

In [4]:
# use default 299 size of the DF20 dataset
import timm

def getModel(model_name, input_size=(3, 299, 299)):
  """Load a pretrained timm model; return it with its (C, H, W) input size."""
  model = timm.create_model(model_name, pretrained=True)
  size = model.pretrained_cfg['input_size']  # the model's native input size
  if input_size is not None:
    # override with our own input size
    size = input_size
  return model, size

In [7]:
# open or create a df for the results
import math
from pathlib import Path

import pandas as pd

class LogBook:
  # metric columns; (model_name, input_size) is the index
  COLUMNS = ['flops', 'macs', 'params', 'flops_back', 'macs_back',
             'inference_ms', 'train_b32ps']

  def __init__(self, file_path='model_results.csv'):
    self.file_path = file_path
    if not Path(file_path).exists():
      print('Create file')
      self.df = pd.DataFrame(columns=['model_name', 'input_size'] + self.COLUMNS)
    else:
      print('Read file')
      self.df = pd.read_csv(file_path)
    # (model_name, input_size) is the unique key of every entry
    self.df.set_index(['model_name', 'input_size'], inplace=True)

  def save(self):
    self.df.to_csv(self.file_path, index=True)
    print('Saved!')

  def new_entry(self, model_name, input_size):
    # create an empty row (all metrics NaN) for the given key
    self.add(model_name, input_size, **{col: math.nan for col in self.COLUMNS})

  def add(self, model_name, input_size, **kwargs):
    # set only the metrics passed as keyword arguments, e.g. add(name, 299, flops=1.5);
    # the row is created on the first write and overwritten on later writes
    for key, value in kwargs.items():
      self.df.at[(model_name, input_size), key] = value
    self.save()



In [6]:
# mount GDrive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [8]:
# make the log
log = LogBook(file_path='/content/drive/MyDrive/Fungi/model_results.csv')
#log.add('bruum22', 299, flops=27.233, params=14.2)
#log.df.head()

Create file


# Experiment setup

Results from all runs are stored in a CSV file, conveniently located on Google Drive. The keys of the table are the **model name** and the **input size** (a single edge length, in pixels).

The following parameters are recorded:
- GFLOPs
- GMACs
- MParams
- GFLOPs with backprop
- GMACs with backprop
- inference time in ms on CPU, for a single image
- training throughput on GPU (L4 on Colab): batches of 32 images processed per second
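The CSV written by `LogBook.save()` holds one row per (model_name, input_size) key, and pandas writes metrics that were not measured yet as empty cells. If the log needs to be read outside this notebook without pandas, a minimal sketch using only the standard `csv` module could look like this; `load_log` is a hypothetical helper and the sample rows are copied from the preview table in this notebook.

```python
import csv
import io


def load_log(text):
    """Parse LogBook CSV text into {(model_name, input_size): {metric: float or None}}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row.pop("model_name"), int(row.pop("input_size")))
        # empty cells (metric not measured yet) become None
        rows[key] = {name: float(value) if value else None for name, value in row.items()}
    return rows


sample = """model_name,input_size,flops,macs,params,flops_back,macs_back,inference_ms,train_b32ps
mobilenetv2_100.ra_in1k,224,0.61,0.3,3.5,1.84,0.9,17.29,15.69
mobilenetv2_100.ra_in1k,299,1.15,0.56,3.5,3.46,1.69,24.05,7.56
"""

log_rows = load_log(sample)
print(log_rows[("mobilenetv2_100.ra_in1k", 299)]["inference_ms"])  # 24.05
```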


In [41]:
# preview experiments
log.df.sort_index(level=[0,1], ascending=True).head()
#log.df.sort_index(level=[1,0], ascending=True).head()

| model_name | input_size | flops | macs | params | flops_back | macs_back | inference_ms | train_b32ps |
|---|---|---|---|---|---|---|---|---|
| efficientnet_b0.ra_in1k | 224 | 0.79 | 0.39 | 5.29 | 2.37 | 1.16 | 22.93 | 11.39 |
| efficientnet_b0.ra_in1k | 299 | 1.5 | 0.73 | 5.29 | 4.49 | 2.19 | 31.29 | 5.67 |
| mobilenetv2_100.ra_in1k | 224 | 0.61 | 0.3 | 3.5 | 1.84 | 0.9 | 17.29 | 15.69 |
| mobilenetv2_100.ra_in1k | 299 | 1.15 | 0.56 | 3.5 | 3.46 | 1.69 | 24.05 | 7.56 |
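The stated goal of this notebook is to find models of similar computational cost but different architecture. Once a few entries are logged, that comparison can be sketched as a simple filter on one metric. `similar_cost` and the relative-tolerance rule are hypothetical choices, and the values are the `flops` column of the table above.

```python
# flops (GFLOPs) per (model_name, input_size), copied from the preview table
gflops = {
    ("efficientnet_b0.ra_in1k", 224): 0.79,
    ("efficientnet_b0.ra_in1k", 299): 1.5,
    ("mobilenetv2_100.ra_in1k", 224): 0.61,
    ("mobilenetv2_100.ra_in1k", 299): 1.15,
}


def similar_cost(costs, ref_key, tolerance=0.25):
    """Keys whose cost is within +/- tolerance (relative) of the reference entry."""
    ref = costs[ref_key]
    return sorted(
        key for key, cost in costs.items()
        if key != ref_key and abs(cost - ref) <= tolerance * ref
    )


print(similar_cost(gflops, ("efficientnet_b0.ra_in1k", 299)))
# [('mobilenetv2_100.ra_in1k', 299)]
```

With a 25 % tolerance, mobilenetv2_100 at 299 px (1.15 GFLOPs) is the only entry close enough to efficientnet_b0 at 299 px (1.5 GFLOPs), so those two could be compared on accuracy at roughly equal cost.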


An experiment is set up by specifying the model name and its input shape. Together they form the key of the dataframe, and log entries are created automatically. There can be only one entry per key (name, size): re-running a measurement overwrites the stored value.

In [31]:
# place the model name here...
#model_name = "efficientnet_b0.ra_in1k"
model_name = "mobilenetv2_100.ra_in1k"

# define the input size as a (C, H, W) tuple with H = W, e.g. (3, 299, 299)
# if None, the model's default input size is used. Not all models support a custom input size.
input_image_size = (3, 299, 299)

# no explicit new entry is needed: log.add() creates the row on the first write

# Computation cost tests

This section calculates the FLOPs, MACs and MParams for the model.

The computational cost is calculated with the calflops package, first for the forward pass only and then including backpropagation.
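calflops prints that it assumes backpropagation costs 2 times the forward pass, so the forward+backward figures are 3 times the forward ones. A minimal check of that relation against the calflops output for mobilenetv2_100 at 299 px (the helper name `with_backprop` is hypothetical):

```python
def with_backprop(forward_cost, bwd_factor=2.0):
    """Forward + backward cost when the backward pass costs bwd_factor x forward."""
    return forward_cost * (1 + bwd_factor)


fwd_macs = 564.41e6  # "fwd MACs: 564.41 MMACs" from the calflops output
print(round(with_backprop(fwd_macs) / 1e9, 2))  # 1.69, matches "fwd+bwd MACs: 1.69 GMACs"
```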

In [32]:
import timm
from calflops import calculate_flops

In [33]:
batch_size = 1

model, input_shape = getModel(model_name, input_image_size)
input_shape = (batch_size, *input_shape)    # include batch size (picture count)
#print(input_shape)

flops, macs, params = calculate_flops(model=model,
                                      input_shape=input_shape,
                                      output_as_string=False,
                                      include_backPropagation=False)

flops = round(flops/10**9,2)   # convert to GFLOPs
macs = round(macs/10**9,2)     # convert to GMACs
params = round(params/10**6,2) # convert to millions

print(f"FLOPs: {flops}, MACs: {macs}, MParams: {params}")
log.add(model_name, input_shape[-1], flops=flops, macs=macs, params=params)

flops, macs, params = calculate_flops(model=model,
                                      input_shape=input_shape,
                                      output_as_string=False,
                                      include_backPropagation=True)

flops = round(flops/10**9,2)   # convert to GFLOPs
macs = round(macs/10**9,2)     # convert to GMACs
params = round(params/10**6,2) # convert to millions

print(f"+backprop FLOPs: {flops}, MACs: {macs}, MParams: {params}")
log.add(model_name, input_shape[-1], flops_back=flops, macs_back=macs)



------------------------------------- Calculate Flops Results -------------------------------------
Notations:
number of parameters (Params), number of multiply-accumulate operations(MACs),
number of floating-point operations (FLOPs), floating-point operations per second (FLOPS),
fwd FLOPs (model forward propagation FLOPs), bwd FLOPs (model backward propagation FLOPs),
default model backpropagation takes 2.00 times as much computation as forward propagation.

Total Training Params:                                                  3.5 M   
fwd MACs:                                                               564.41 MMACs
fwd FLOPs:                                                              1.15 GFLOPS
fwd+bwd MACs:                                                           1.69 GMACs
fwd+bwd FLOPs:                                                          3.46 GFLOPS

-------------------------------- Detailed Calculated FLOPs Results --------------------------------
Each module cacul

# Measure inference on device

This section predicts the inference latency on a simulated device using Microsoft's nn-meter package, to check how the model would perform on such a device.

In [None]:
!pip install nn-meter

Collecting nn-meter
  Downloading nn_meter-2.0-py3-none-any.whl (132 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/132.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━[0m [32m92.2/132.5 kB[0m [31m2.6 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m132.5/132.5 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
Collecting jsonlines (from nn-meter)
  Downloading jsonlines-4.0.0-py3-none-any.whl (8.7 kB)
Installing collected packages: jsonlines, nn-meter
Successfully installed jsonlines-4.0.0 nn-meter-2.0


In [None]:
!nn-meter --list-predictors

(nn-Meter) Supported latency predictors:
(nn-Meter) [Predictor] cortexA76cpu_tflite21: version=1.0
(nn-Meter) [Predictor] adreno640gpu_tflite21: version=1.0
(nn-Meter) [Predictor] adreno630gpu_tflite21: version=1.0
(nn-Meter) [Predictor] myriadvpu_openvino2019r2: version=1.0


In [None]:
# packages needed to export the model to ONNX
!pip install onnx
!pip install onnxscript

Collecting onnx
  Downloading onnx-1.16.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.9 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m15.9/15.9 MB[0m [31m42.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: onnx
Successfully installed onnx-1.16.1
Collecting onnxscript
  Downloading onnxscript-0.1.0.dev20240620-py3-none-any.whl (632 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m632.4/632.4 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: onnxscript
Successfully installed onnxscript-0.1.0.dev20240620


In [None]:
import torch.onnx

#Function to Convert to ONNX
def Convert_ONNX(model, input_size):

    # set the model to inference mode
    model.eval()

    # Let's create a dummy input tensor
    dummy_input = torch.randn(*input_size, requires_grad=True)

    # Export the model
    torch.onnx.export(model,         # model being run
         dummy_input,       # model input (or a tuple for multiple inputs)
         "ImageClassifier.onnx",       # where to save the model
         export_params=True,  # store the trained parameter weights inside the model file
         opset_version=10,    # the ONNX version to export the model to
         do_constant_folding=True,  # whether to execute constant folding for optimization
         input_names = ['modelInput'],   # the model's input names
         output_names = ['modelOutput'], # the model's output names
         dynamic_axes={'modelInput' : {0 : 'batch_size'},    # variable length axes
                                'modelOutput' : {0 : 'batch_size'}})
    print(" ")
    print('Model has been converted to ONNX')

In [None]:
import torch
#onnx_program = torch.onnx.dynamo_export(model, input_shape)
print(input_shape)
Convert_ONNX(model, input_shape)

(1, 3, 224, 224)




 
Model has been converted to ONNX


Predict the latency of the exported ONNX model with the `cortexA76cpu_tflite21` predictor.

In [None]:
!nn-meter predict --predictor cortexA76cpu_tflite21 --predictor-version 1.0 --onnx ImageClassifier.onnx

(nn-Meter) checking local kernel predictors at /root/.nn_meter/data/predictor/cortexA76cpu_tflite21
(nn-Meter) Download from https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/cortexA76cpu_tflite21.zip ...
100% 376M/376M [00:17<00:00, 21.4MiB/s]
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/se.pkl
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/conv-bn-relu.pkl
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/add.pkl
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/avgpool.pkl
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/addrelu.pkl
(nn-Meter) load predictor /root/.nn_meter/data/predictor/cortexA76cpu_tflite21/channelshuffle.pkl
(nn-Me

# Inference benchmark in PyTorch

A simple inference benchmark to measure the prediction speed on CPU in pure PyTorch.

The warmup approach is based on: https://medium.com/@sgurwinderr/pytorch-model-benchmarking-obtaining-accurate-results-by-accounting-for-warmup-5cc40ed59a34
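The cell below follows the warmup-then-measure pattern from the article above. The same idea as a reusable helper, sketched with only the standard library; `benchmark` is a hypothetical name, and reporting the median next to the mean is an addition (the median is less sensitive to occasional slow runs caused by other processes).

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=100):
    """Run fn() `warmup` times untimed, then `runs` timed calls; return (mean_ms, median_ms)."""
    for _ in range(warmup):
        fn()  # warmup: lazy initialization, caches and memory allocation happen here
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        times_ms.append((time.perf_counter() - start) * 1000)
    return statistics.mean(times_ms), statistics.median(times_ms)


mean_ms, median_ms = benchmark(lambda: sum(range(100_000)), warmup=3, runs=20)
print(f"mean {mean_ms:.3f} ms, median {median_ms:.3f} ms")
```

With the notebook's model, the call would be `benchmark(lambda: model(input_data))` inside a `torch.no_grad()` block.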

In [34]:
import time
import torch

num_inference = 100
total_inference_time = 0
input_data = torch.randn(*input_shape)  # random dummy input with the benchmark shape
#input_data=input_data.to("xpu") # Using Intel GPU

#model=model.to("xpu") # Using Intel GPU
#model = ipex.optimize(model) # Using Intel GPU (needs: import intel_extension_for_pytorch as ipex)
model.eval()

with torch.no_grad():
    for _ in range(3):  # 3 warmup iterations, not timed
        _ = model(input_data)

    for _ in range(num_inference):
        start_time = time.perf_counter()  # monotonic, high-resolution timer
        _ = model(input_data)
        total_inference_time += time.perf_counter() - start_time

average_inference_time = total_inference_time / num_inference
print(average_inference_time * 1000, 'ms')

avg_inf_time_ms = round(average_inference_time * 1000, 2)
log.add(model_name, input_shape[-1], inference_ms=avg_inf_time_ms)

24.048349857330322 ms
Saved!


# Train time benchmark

Benchmark training steps (forward pass, backpropagation and parameter update) on random dummy data.

Because of issues with input sizes and run times on the CPU, Colab's L4 GPU is used as the reference device.
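The logged `train_b32ps` value is a throughput in batches of 32 images per second. A small sketch (the helper `train_throughput` is hypothetical) converting it into time per training step and images per second, using the 7.56 batches/s measured for mobilenetv2_100 at 299 px:

```python
def train_throughput(batches_per_second, batch_size=32):
    """Convert batches/s into (ms per training step, images per second)."""
    ms_per_step = 1000.0 / batches_per_second
    images_per_second = batches_per_second * batch_size
    return ms_per_step, images_per_second


ms_per_step, images_per_second = train_throughput(7.56)
print(f"{ms_per_step:.1f} ms/step, {images_per_second:.2f} images/s")  # 132.3 ms/step, 241.92 images/s
```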

In [35]:
import torch
import time

def execute_train(model, data, labels, epochs):
  """Return the average time (s) of one training step on a single batch.

  `epochs` is the number of timed steps; every step reuses the same batch.
  """
  if not torch.cuda.is_available():
    raise RuntimeError("No GPU for training.")
  device = "cuda"

  # move stuff to GPU
  model = model.to(device)
  data, labels = data.to(device), labels.to(device)

  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  loss_function = torch.nn.CrossEntropyLoss()

  def train_step():
    outputs = model(data)                  # forward pass
    loss = loss_function(outputs, labels)  # calculate the loss
    optimizer.zero_grad()
    loss.backward()                        # backward pass
    optimizer.step()                       # update the parameters

  for _ in range(5):  # 5 warmup steps, not timed
    train_step()

  total_train_time = 0
  for _ in range(epochs):
    torch.cuda.synchronize()  # CUDA runs asynchronously: wait for queued work before timing
    start_time = time.perf_counter()
    train_step()
    torch.cuda.synchronize()  # make sure this step has finished on the GPU
    total_train_time += time.perf_counter() - start_time
  return total_train_time / epochs


In [36]:
epochs = 10

batch_size = 32
model, shape = getModel(model_name, input_image_size)
input_shape = (batch_size, *shape)
input_data = torch.randn(*input_shape)  # random dummy images
labels = torch.randint(0, model.num_classes, (batch_size,))  # random labels in [0, num_classes)

model.train()

#print(input_data.shape)
#print(labels.shape)
avg_train_time = execute_train(model, input_data, labels, epochs)
bps = round(1/avg_train_time,2)
print(f"Batches per second: {bps}")

log.add(model_name, input_shape[-1], train_b32ps=bps)


Batches per second: 7.56
Saved!
