
# Evaluation metrics for the Cityscapes dataset
In this Jupyter Notebook we show how to compute and inspect different evaluation metrics for the Cityscapes pixel-level semantic segmentation task.

In [None]:
!pip install cityscapesscripts;

## Imports

In [1]:
import os
import torch
import zipfile
import numpy as np

from sklearn.metrics import accuracy_score
from importlib import reload
from os.path import join as pjoin
from datasets.cityscapes import cityscapesDataset
from torchvision import transforms

In [2]:
local_path = "./Cityscapes"
pckgs_names = ["gtFine_trainvaltest.zip","leftImg8bit_trainvaltest.zip"]
dir_names = ["gtFine", "leftImg8bit"]

### Load the dataset
We assume that the dataset *Cityscapes* has already been downloaded into the folder *local_path*. If this is not the case, first run the notebook [Exploring the Cityscapes dataset](./cityscapes_dataset.ipynb).

In [3]:
training_data = cityscapesDataset(local_path, split="train")

Annotations files processed


In [4]:
labels = training_data.label_ids()
labels

[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 255]

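To illustrate the metrics without a trained model, we use the ground-truth masks of the first ten training images as `y_true` and, as stand-in "predictions", the masks of images 5 to 14 as `y_pred`. Each set of masks is concatenated and flattened into a single vector of pixel labels.

In [5]: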
y_true = [training_data[i][1] for i in range(0,10)]
y_true = torch.flatten(torch.cat(y_true))
y_true.size()

torch.Size([1310720])

In [6]:
y_pred = [training_data[i][1] for i in range(5,15)]
y_pred = torch.flatten(torch.cat(y_pred))
y_pred.size()

torch.Size([1310720])

In [7]:
labels = np.unique(y_true.numpy())  # evaluate only the label ids that occur in the ground truth

## Evaluation report

In [30]:
import importlib
import evaluation
importlib.reload(evaluation)

<module 'evaluation' from '/project/simple-imageseg/evaluation.py'>

An evaluation report can be created either from a dataloader and a model that makes the predictions, or directly from the ground-truth values and the predictions. Here we use the latter:

In [31]:
eval_report = evaluation.EvaluationReport.from_predictions(y_true.numpy(), y_pred.numpy(), labels)
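All the metrics below can be derived from a confusion matrix over the label ids. We have not checked how `EvaluationReport.from_predictions` computes it internally; the following is a minimal NumPy sketch of one common approach, counting (true, predicted) pairs with `np.bincount`. The helpers `confusion_matrix_np` and `one_vs_rest_counts` are hypothetical names, not part of the `evaluation` module, and the toy arrays are made up.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, labels):
    """(K, K) matrix: entry [i, j] counts pixels with true label labels[i]
    and predicted label labels[j]. `labels` must be sorted, as np.unique returns it."""
    labels = np.asarray(labels)
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Drop pixels whose true or predicted label is not in `labels`
    keep = np.isin(y_true, labels) & np.isin(y_pred, labels)
    # Translate label ids (e.g. -1 or 255) into row/column indices 0..K-1
    t = np.searchsorted(labels, y_true[keep])
    p = np.searchsorted(labels, y_pred[keep])
    k = len(labels)
    # Encode each (true, pred) pair as one integer and count occurrences
    return np.bincount(t * k + p, minlength=k * k).reshape(k, k)


def one_vs_rest_counts(cm, i):
    """TP, FP, FN, TN for class index i, treating all other classes as negative."""
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp  # class i predicted as another class
    fp = cm[:, i].sum() - tp  # another class predicted as class i
    tn = cm.sum() - tp - fn - fp
    return int(tp), int(fp), int(fn), int(tn)


# Toy example with made-up labels (names chosen so the notebook's y_true/y_pred are not overwritten)
toy_true = np.array([0, 0, 1, 2, 255, 1])
toy_pred = np.array([0, 1, 1, 2, 0, 255])
toy_labels = np.unique(toy_true)  # [0, 1, 2, 255]
toy_cm = confusion_matrix_np(toy_true, toy_pred, toy_labels)
print(one_vs_rest_counts(toy_cm, 1))  # (1, 1, 1, 3)
```

Encoding each pair as `t * k + p` turns the 2-D counting problem into a single `bincount` call, which stays fast even for the millions of pixels in `y_true`.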

We can obtain the metrics for a single target class in a 'one vs. rest' way, treating all other classes as negatives:

In [22]:
target_class = 1
metrics2 = eval_report.get_metrics(pos_label=target_class)
metrics2

{'accuracy': 0.9274,
 'sensitivity': 0.0812,
 'specificity': 0.9672,
 'dice_coeff': 0.0912,
 'jaccard_sim': 0.0478,
 'precision': 0.1041,
 'recall': 0.0812,
 'f1_score': 0.0912}
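In the 'one vs. rest' setting, pixels of `target_class` are positives and all other pixels are negatives. The sketch below shows how the eight reported values can be obtained from the binary counts, assuming the usual textbook definitions (for example Dice = 2TP / (2TP + FP + FN) and Jaccard = TP / (TP + FP + FN)); it is not taken from the `evaluation` module, and `binary_metrics` is a hypothetical helper.

```python
import numpy as np

def binary_metrics(y_true, y_pred, pos_label):
    """One-vs-rest metrics for `pos_label`, using textbook definitions."""
    t = np.asarray(y_true).ravel() == pos_label  # pixels that truly belong to the class
    p = np.asarray(y_pred).ravel() == pos_label  # pixels predicted as the class
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    tn = np.sum(~t & ~p)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero (e.g. class absent everywhere)
        return float(num / den) if den > 0 else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": recall,                   # same quantity as recall
        "specificity": ratio(tn, tn + fp),
        "dice_coeff": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard_sim": ratio(tp, tp + fp + fn),  # also known as IoU
        "precision": precision,
        "recall": recall,
        "f1_score": ratio(2 * precision * recall, precision + recall),
    }
```

In the notebook it could be called as `binary_metrics(y_true.numpy(), y_pred.numpy(), pos_label=target_class)`; if `EvaluationReport` uses the same definitions, the values should agree with the output above.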

The number of decimal places in the reported values can be changed through the `decimal_places` attribute:

In [23]:
eval_report.decimal_places = 10
eval_report.get_metrics(pos_label=target_class)

{'accuracy': 0.9274479659,
 'sensitivity': 0.0811904272,
 'specificity': 0.9671959811,
 'dice_coeff': 0.0912453761,
 'jaccard_sim': 0.0478036176,
 'precision': 0.1041428446,
 'recall': 0.0811904272,
 'f1_score': 0.0912453761}
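With ten decimal places it is easy to check that `sensitivity` equals `recall` and `dice_coeff` equals `f1_score`. Assuming the module uses the standard definitions in terms of the true/false positives (TP, FP) and negatives (TN, FN) of the target class, this is expected, and the Jaccard index J is a fixed function of the Dice coefficient D:

```latex
\begin{aligned}
\text{sensitivity} = \text{recall} &= \frac{TP}{TP+FN}, \qquad
\text{precision} = \frac{TP}{TP+FP}, \qquad
\text{specificity} = \frac{TN}{TN+FP},\\
F_1 = \frac{2\,\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}
  &= \frac{2\,TP}{2\,TP+FP+FN} = D,\\
J = \frac{TP}{TP+FP+FN} &= \frac{D}{2-D}.
\end{aligned}
```

For example, D = 0.0912453761 gives D / (2 - D) ≈ 0.0478036176, which is the `jaccard_sim` value reported above.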

We can also average the metrics over all classes. With `average="macro"` every class contributes equally, regardless of how many pixels it covers:

In [24]:
eval_report.get_metrics(average="macro")

{'accuracy': 0.944132954,
 'sensitivity': 0.1742704216,
 'specificity': 0.966371336,
 'dice_coeff': 0.1647542206,
 'jaccard_sim': 0.116726126,
 'precision': 0.1658803216,
 'recall': 0.1742704216,
 'f1_score': 0.1647542206}
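A macro average computes a metric separately for each class in `labels` (one vs. rest) and takes the plain mean, so rare classes count as much as frequent ones. For the Jaccard index this is the familiar mean IoU. The sketch below assumes this textbook definition; `macro_jaccard` is a hypothetical name, and the `evaluation` module may differ in details such as how classes absent from both arrays are scored.

```python
import numpy as np

def macro_jaccard(y_true, y_pred, labels):
    """Unweighted mean of the one-vs-rest Jaccard index (IoU) over `labels`."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    scores = []
    for c in labels:
        t = y_true == c
        p = y_pred == c
        inter = np.sum(t & p)  # TP
        union = np.sum(t | p)  # TP + FP + FN
        # Assumption: a class absent from both arrays scores 0.0
        scores.append(inter / union if union > 0 else 0.0)
    # Every class contributes equally, regardless of its pixel count
    return float(np.mean(scores))
```

For instance, with truth `[0, 0, 0, 1]` and prediction `[0, 0, 1, 1]` the per-class IoUs are 2/3 and 1/2, so the macro average is 7/12.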

We can also weight each class to compute a weighted average:

In [25]:
eval_report.get_metrics(average="weighted")

{'accuracy': 0.8473626705,
 'sensitivity': 0.5251301093,
 'specificity': 0.9031826034,
 'dice_coeff': 0.5287310676,
 'jaccard_sim': 0.409557238,
 'precision': 0.5386192145,
 'recall': 0.5251301093,
 'f1_score': 0.5287310676}
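A weighted average multiplies each per-class value by a class weight before summing. A common choice, and the one scikit-learn uses for `average='weighted'`, is the support: the number of ground-truth pixels of each class. The sketch below assumes that choice; we have not verified which weights `EvaluationReport` uses, and `weighted_recall` is a hypothetical helper. With support weights, the weighted recall reduces to the global pixel accuracy, because the sum over classes of (n_c / N) * (TP_c / n_c) equals (sum of TP_c) / N. The weighted recall reported above (0.5251) differs slightly from the global accuracy computed at the end of this notebook (0.5209), which suggests the module weights or filters classes somewhat differently.

```python
import numpy as np

def weighted_recall(y_true, y_pred, labels):
    """Support-weighted average of per-class recall (scikit-learn style 'weighted')."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    recalls, supports = [], []
    for c in labels:
        t = y_true == c
        n_c = np.sum(t)                 # support: ground-truth pixels of class c
        tp = np.sum(t & (y_pred == c))  # correctly labelled pixels of class c
        recalls.append(tp / n_c if n_c > 0 else 0.0)
        supports.append(n_c)
    # np.average normalizes the weights, so they need not sum to one
    return float(np.average(recalls, weights=supports))
```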

There is no need to compute all the metrics. We can select which metrics we want:

In [26]:
eval_report.get_metrics(metrics=['accuracy', 'dice_coeff'], average="macro")

{'accuracy': 0.944132954, 'dice_coeff': 0.1647542206}

Or compute a single metric directly:

In [27]:
eval_report.accuracy(average="macro")

0.944132954

In [28]:
eval_report.f1_score(average="weighted")

0.5287310676

If we want to obtain the global (pixel) accuracy, i.e. the fraction of all pixels whose predicted label matches the ground truth, we can directly use **sklearn.metrics.accuracy_score**:

In [29]:
accuracy_score(y_true.numpy(), y_pred.numpy())

0.5209365844726562
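The global accuracy (0.5209) is much lower than the macro-averaged one-vs-rest accuracy (0.9441) because the two measure different things. Global accuracy counts a pixel as correct only if its predicted label equals its true label. In the one-vs-rest view, every pixel that is neither labelled nor predicted as the current class is a true negative, so with many classes each per-class accuracy is dominated by true negatives. The toy example below uses made-up labels (not Cityscapes data) and the textbook definitions to illustrate the gap.

```python
import numpy as np

# Toy labels with 4 classes (not Cityscapes data); half of the pixels are wrong
toy_true = np.array([0, 1, 2, 3, 0, 1, 2, 3])
toy_pred = np.array([0, 1, 2, 3, 1, 2, 3, 0])

# Global pixel accuracy: fraction of pixels with the correct label
global_acc = float(np.mean(toy_true == toy_pred))

# One-vs-rest accuracy per class: a pixel counts as correct for class c when
# "is c" agrees between truth and prediction, so true negatives count too
per_class = [np.mean((toy_true == c) == (toy_pred == c)) for c in np.unique(toy_true)]
macro_acc = float(np.mean(per_class))

print(global_acc, macro_acc)  # 0.5 0.75
```

Even though half of the pixels are misclassified, each class is "correct" for 6 of the 8 pixels in the one-vs-rest sense, which is why the macro accuracy looks much better than the global one.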