
# Evaluation metrics for the Cityscapes dataset
In this Jupyter Notebook we show how to compute and visualize different evaluation metrics for the Cityscapes pixel-level semantic segmentation task.

In [1]:
!pip install cityscapesscripts

Collecting cityscapesscripts
  Downloading cityscapesScripts-2.2.0-py3-none-any.whl (472 kB)
Collecting pyquaternion
  Downloading pyquaternion-0.9.9-py3-none-any.whl (14 kB)
Collecting coloredlogs
  Downloading coloredlogs-15.0-py2.py3-none-any.whl (45 kB)
Collecting humanfriendly>=9.1
  Downloading humanfriendly-9.1-py2.py3-none-any.whl (86 kB)
Installing collected packages: pyquaternion, humanfriendly, coloredlogs, cityscapesscripts
Successfully installed cityscapesscripts-2.2.0 coloredlogs-15.0 humanfriendly-9.1 pyquaternion-0.9.9


## Imports

In [1]:
import os
import torch
import zipfile
import numpy as np

import sklearn.metrics
from importlib import reload
from os.path import join as pjoin
from loaders.cityscapes import cityscapesDataset  # dataset class from this project's loaders package
from torchvision import transforms

In [2]:
local_path = "./Cityscapes"  # root folder of the dataset
pckgs_names = ["gtFine_trainvaltest.zip", "leftImg8bit_trainvaltest.zip"]  # downloaded archives
dir_names = ["gtFine", "leftImg8bit"]  # corresponding dataset folders
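The archive and folder names above suggest that the dataset is expected as extracted folders under `local_path`. Below is a minimal sketch for extracting the archives only when needed. It assumes that each archive sits in `local_path` and unpacks into a top-level folder named after the matching entry of `dir_names`. This is our assumption, not something this notebook states, and `extract_if_missing` is a hypothetical helper:

```python
import os
import zipfile


def extract_if_missing(local_path, pckgs_names, dir_names):
    """Hypothetical helper: unpack each archive unless its folder already exists.

    Assumes archive `pckgs_names[i]` sits in `local_path` and unpacks into a
    top-level folder `dir_names[i]`.
    """
    for pckg, dir_name in zip(pckgs_names, dir_names):
        if os.path.isdir(os.path.join(local_path, dir_name)):
            continue  # already extracted, nothing to do
        with zipfile.ZipFile(os.path.join(local_path, pckg)) as archive:
            archive.extractall(local_path)


# Usage with the variables defined above:
# extract_if_missing(local_path, pckgs_names, dir_names)
```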

## Load the dataset
We assume that the *Cityscapes* dataset has already been downloaded to the folder *local_path*. If this is not the case, first check the notebook [Exploring the Cityscapes dataset](./cityscapes_dataset.ipynb).

In [3]:
training_data = cityscapesDataset(local_path, split="train")

Annotations files processed


In [4]:
labels = training_data.label_ids()
labels

[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 255]
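Besides the 19 evaluation classes (ids 0 to 18), the list contains -1 and 255. In the Cityscapes label definitions these ids mark classes that are not evaluated: 255 is the ignore value, and -1 is assigned to the license-plate class. A common convention is to drop the pixels whose ground-truth label is not an evaluation class before computing any metric. The sketch below does this with NumPy. `drop_ignored` is a hypothetical helper, and we do not know whether `evaluation.EvaluationReport` applies the same filtering internally:

```python
import numpy as np

# Class ids evaluated in the Cityscapes benchmark (0..18).
EVAL_CLASSES = np.arange(19)


def drop_ignored(y_true, y_pred, classes=EVAL_CLASSES):
    """Hypothetical helper: keep only pixels whose ground truth is an evaluation class.

    Pixels labelled -1 or 255 (or any other id outside `classes`) are removed
    from both arrays, so they count neither as hits nor as errors.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    keep = np.isin(y_true, classes)
    return y_true[keep], y_pred[keep]


# Toy example: the pixels labelled 255 and -1 are discarded.
t, p = drop_ignored([0, 255, 2, -1, 18], [0, 1, 2, 2, 5])
print(t, p)  # [ 0  2 18] [0 2 5]
```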

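To illustrate the metrics without a trained model, we use the label masks of the first ten training images as ground truth. As stand-in predictions, we use the label masks of ten other training images (indices 15 to 24).

In [5]:
y_true = [training_data[i][1] for i in range(10)]  # label masks of images 0-9
y_true = torch.flatten(torch.cat(y_true))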
y_true.size()

torch.Size([1310720])

In [6]:
y_pred = [training_data[i][1] for i in range(15, 25)]  # stand-in predictions: label masks of images 15-24
y_pred = torch.flatten(torch.cat(y_pred))
y_pred.size()

torch.Size([1310720])
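With a trained model instead of stand-in masks, the predicted class of each pixel is usually the argmax over the model's class scores, flattened the same way as the ground truth. The following is a minimal sketch. It assumes a hypothetical `model` that returns logits of shape (N, C, H, W) and is already on `device`, and a `loader` that yields `(images, labels)` batches. `collect_predictions` is a hypothetical helper:

```python
import torch


@torch.no_grad()
def collect_predictions(model, loader, device="cpu"):
    """Hypothetical helper: flattened ground truth and predicted labels.

    Assumes `model(images)` returns class scores of shape (N, C, H, W)
    and `loader` yields `(images, labels)` with labels of shape (N, H, W).
    """
    model.eval()
    ys_true, ys_pred = [], []
    for images, labels in loader:
        logits = model(images.to(device))
        ys_pred.append(logits.argmax(dim=1).flatten().cpu())  # most likely class per pixel
        ys_true.append(labels.flatten())
    return torch.cat(ys_true), torch.cat(ys_pred)


# Usage, mirroring the cells above:
# y_true, y_pred = collect_predictions(model, loader)
```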

## Evaluation report

In [17]:
import importlib
import evaluation
importlib.reload(evaluation)

<module 'evaluation' from '/project/simple-imageseg/evaluation.py'>

An evaluation report can be created either from a dataloader and a model that makes the predictions, or directly from the ground-truth values and the predictions. Here we use the latter and restrict the evaluation to the 19 classes with ids 0 to 18:

In [18]:
eval_report = evaluation.EvaluationReport.from_predictions(y_true.numpy(), y_pred.numpy(), labels=list(range(19)))

We can obtain the metrics for a single target class in a one-vs-rest way, treating the pixels of that class as positives and all other pixels as negatives.
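For a target class, let $TP$, $FP$, $FN$ and $TN$ be the numbers of true-positive, false-positive, false-negative and true-negative pixels. Assuming the report uses the standard definitions (we have not inspected `evaluation.py`), the metrics are:

$$
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{sensitivity} &= \frac{TP}{TP + FN}, &
\text{specificity} &= \frac{TN}{TN + FP},\\
\text{Dice} &= \frac{2\,TP}{2\,TP + FP + FN}, &
\text{Jaccard} &= \frac{TP}{TP + FP + FN}, &
\text{precision} &= \frac{TP}{TP + FP},
\end{aligned}
$$

$$
F_1 = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}} = \frac{2\,TP}{2\,TP + FP + FN} = \text{Dice},
\qquad
\text{Jaccard} = \frac{\text{Dice}}{2 - \text{Dice}}.
$$

The output of the next cell is consistent with the last two identities. Dice and F1 are both 0.5455, and 0.5455 / (2 - 0.5455) ≈ 0.375, which is the reported Jaccard similarity.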

In [19]:
target_class = 2
metrics2 = eval_report.get_metrics(pos_label=target_class)
metrics2

{'accuracy': 0.7463,
 'sensitivity': 0.6746,
 'specificity': 0.7672,
 'dice_coeff': 0.5455,
 'jaccard_sim': 0.375,
 'f1_score': 0.5455}
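To cross-check such numbers, the one-vs-rest metrics can be recomputed from scratch with NumPy. This is a minimal sketch based on the standard confusion-matrix definitions. `one_vs_rest_metrics` is a hypothetical helper, not the implementation in `evaluation.py`:

```python
import numpy as np


def one_vs_rest_metrics(y_true, y_pred, pos_label):
    """Hypothetical helper: binary metrics for one class vs. all the others."""
    t = np.asarray(y_true).ravel() == pos_label  # ground-truth mask of the class
    p = np.asarray(y_pred).ravel() == pos_label  # predicted mask of the class
    tp = np.sum(t & p)
    tn = np.sum(~t & ~p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    dice = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": (tp + tn) / t.size,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice_coeff": dice,
        "jaccard_sim": tp / (tp + fp + fn),
        "f1_score": dice,  # F1 and Dice coincide for binary masks
    }


# Toy example (made-up labels), class 2 as the positive class.
m = one_vs_rest_metrics([2, 2, 2, 1, 0, 0, 1, 2], [2, 2, 1, 1, 2, 2, 0, 2], pos_label=2)
print({k: round(float(v), 4) for k, v in m.items()})
# {'accuracy': 0.625, 'sensitivity': 0.75, 'specificity': 0.5, 'dice_coeff': 0.6667, 'jaccard_sim': 0.5, 'f1_score': 0.6667}
```

In the notebook, it could be called as `one_vs_rest_metrics(y_true.numpy(), y_pred.numpy(), pos_label=target_class)` and compared with `metrics2`. If a denominator is zero (for example, when the class never occurs), NumPy returns `nan` with a runtime warning.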

The number of decimal places in the results can be changed through the `decimal_places` attribute:

In [20]:
eval_report.decimal_places = 2
eval_report.get_metrics(pos_label=target_class)

{'accuracy': 0.75,
 'sensitivity': 0.67,
 'specificity': 0.77,
 'dice_coeff': 0.55,
 'jaccard_sim': 0.38,
 'f1_score': 0.55}

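The metrics of any other class are obtained in the same way, e.g. for class 1:

In [21]:
eval_report.get_metrics(pos_label=1)

{'accuracy': 0.89,
 'sensitivity': 0.42,
 'specificity': 0.91,
 'dice_coeff': 0.25,
 'jaccard_sim': 0.14,
 'f1_score': 0.25}

We can also average the metrics over all classes. With `average="macro"`, every class contributes equally:

In [22]: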
eval_report.get_metrics(average="macro")

{'accuracy': 0.95,
 'sensitivity': 0.11,
 'specificity': 0.97,
 'dice_coeff': 0.1,
 'jaccard_sim': 0.06,
 'f1_score': 0.1}
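The macro-averaged accuracy (0.95) is much higher than the macro-averaged sensitivity (0.11). For each class, most pixels are true negatives, which inflate the one-vs-rest accuracy but do not enter sensitivity. Below is a minimal sketch of a macro average. It assumes that "macro" means the plain mean of the per-class one-vs-rest scores over the requested labels, which is the usual scikit-learn meaning. `macro_dice_jaccard` is a hypothetical helper, and skipping classes absent from both arrays is our own choice:

```python
import numpy as np


def macro_dice_jaccard(y_true, y_pred, labels):
    """Hypothetical helper: unweighted mean of per-class Dice and Jaccard scores."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    dice, jaccard = [], []
    for c in labels:
        t, p = y_true == c, y_pred == c
        tp = np.sum(t & p)
        fp_fn = np.sum(t ^ p)  # pixels where exactly one of the two masks has class c
        if tp + fp_fn == 0:
            continue  # class absent from both arrays: skipped (our choice)
        dice.append(2 * tp / (2 * tp + fp_fn))
        jaccard.append(tp / (tp + fp_fn))
    return {"dice_coeff": float(np.mean(dice)), "jaccard_sim": float(np.mean(jaccard))}


# Toy example (made-up labels): per-class Dice is 0.5, 0.8 and 2/3.
print(macro_dice_jaccard([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], labels=range(3)))
```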

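We can also assign a weight to each class and compute a weighted average. Here the weights are random, for illustration only:

In [27]:
eval_report.weights = np.random.rand(19)  # one arbitrary weight per class; unseeded, so results vary between runs
eval_report.decimal_places = 5
eval_report.get_metrics(average="weighted")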

{'accuracy': 0.94654,
 'sensitivity': 0.13149,
 'specificity': 0.96806,
 'dice_coeff': 0.10474,
 'jaccard_sim': 0.06738,
 'f1_score': 0.10474}
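Assuming the weighted average is the usual normalized one, $\sum_c w_c m_c / \sum_c w_c$, it corresponds to `np.average` with `weights`. A common non-random choice of weights, used by scikit-learn's `average="weighted"`, is the support of each class, i.e. its number of ground-truth pixels. The per-class scores and weights below are made-up toy values:

```python
import numpy as np

# Made-up per-class Dice scores for three classes (toy values).
per_class_dice = np.array([0.5, 0.8, 2 / 3])

# 1) Arbitrary weights, in the spirit of eval_report.weights above.
weights = np.array([1.0, 3.0, 0.0])
weighted = np.average(per_class_dice, weights=weights)  # sum(w * m) / sum(w)
print(round(float(weighted), 4))  # 0.725

# 2) Support weights: number of ground-truth pixels of each class.
y_true = np.array([0, 0, 1, 1, 1, 2])
support = np.bincount(y_true, minlength=3)  # [2, 3, 1]
weighted_by_support = np.average(per_class_dice, weights=support)
print(round(float(weighted_by_support), 4))  # 0.6778
```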

Instead of computing all the metrics, we can select only the ones we need:

In [28]:
eval_report.get_metrics(metrics=['accuracy', 'dice_coeff'], average="macro")

{'accuracy': 0.94756, 'dice_coeff': 0.099}

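We can also compute a single metric directly. Called without arguments, `accuracy()` and `f1_score()` return, up to rounding, the same values reported above for class 1 (`pos_label=1`):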

In [29]:
eval_report.accuracy()

0.8929

In [30]:
eval_report.f1_score()

0.25125
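`sklearn.metrics` is imported at the top of the notebook but is not used in this section. It can serve as an independent cross-check of the one-vs-rest values: binarize both arrays for the target class, then take sensitivity as the recall of the positive class and specificity as the recall of the negative class. Here is a minimal sketch. `sklearn_one_vs_rest` is a hypothetical helper, and the toy arrays are made up:

```python
import numpy as np
from sklearn import metrics


def sklearn_one_vs_rest(y_true, y_pred, pos_label):
    """Hypothetical helper: one-vs-rest metrics computed with scikit-learn."""
    # 1 where the pixel belongs to the target class, 0 elsewhere.
    t = (np.asarray(y_true).ravel() == pos_label).astype(int)
    p = (np.asarray(y_pred).ravel() == pos_label).astype(int)
    return {
        "accuracy": metrics.accuracy_score(t, p),
        "sensitivity": metrics.recall_score(t, p, pos_label=1),  # recall of the class
        "specificity": metrics.recall_score(t, p, pos_label=0),  # recall of "the rest"
        "jaccard_sim": metrics.jaccard_score(t, p, pos_label=1),
        "f1_score": metrics.f1_score(t, p, pos_label=1),
    }


# Toy example (made-up labels), class 2 as the positive class.
scores = sklearn_one_vs_rest([2, 2, 2, 1, 0, 0, 1, 2], [2, 2, 1, 1, 2, 2, 0, 2], pos_label=2)
print({k: round(float(v), 4) for k, v in scores.items()})
# {'accuracy': 0.625, 'sensitivity': 0.75, 'specificity': 0.5, 'jaccard_sim': 0.5, 'f1_score': 0.6667}
```

In the notebook, `sklearn_one_vs_rest(y_true.numpy(), y_pred.numpy(), pos_label=1)` would be expected to reproduce the class-1 values above, provided the report follows the standard definitions.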