BENCHMARK TEST

This Jupyter notebook compares the masking methods written so far:
- arbitrary rectangles
- arbitrary lines
- arbitrary lines, efficient version (the method under test)

As skimage.draw.disk was suspected to be the most computationally expensive component,
its runtime was also measured in the test.

Because the algorithms rely heavily on randomly generated numbers,
some crucial variables were fixed to keep the comparison fair.
The rectangle method works very differently and therefore
does not have these parameters. It is a simple algorithm, so it is treated as the baseline here.
The runtime of the two line methods is assumed to depend mainly on the number of lines
and the maximal radius of the circles that both methods draw.

Benchmark parameters:
- lines = 10
- maxRad = 10

(Data for other parameter values could be added later.)

Others:
- Picture 1 from my personal test images was used as input
- n_samples=3 was passed as a parameter to __init__()
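
The timing CSV files loaded in cell In [2] are not produced in this notebook. A minimal sketch of how such per-sample timings could be collected is given here. The function `generate_mask`, the seed, and the helper names are hypothetical stand-ins for illustration only; they are not the code that was used for this benchmark.

```python
import csv
import random
import time

N_RUNS = 1000   # number of timed repetitions (one row per run in the CSV)
LINES = 10      # benchmark parameter: number of lines
MAX_RAD = 10    # benchmark parameter: maximal circle radius


def generate_mask(rng, lines=LINES, max_rad=MAX_RAD):
    """Hypothetical stand-in for a mask generator: only draws random line parameters."""
    return [(rng.random(), rng.random(), rng.randint(1, max_rad)) for _ in range(lines)]


def time_runs(func, n_runs=N_RUNS, seed=0):
    """Return the wall-clock time in seconds of each of n_runs calls to func."""
    rng = random.Random(seed)  # fixed seed: every method sees the same random stream
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        func(rng)
        times.append(time.perf_counter() - start)
    return times


def write_times(path, column, times):
    """Write one timing column to a CSV file, without an index column."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        writer.writerows([t] for t in times)


exec_times = time_runs(generate_mask)
```

Calling `write_times("some_method_time.csv", "exec_time", exec_times)` (file and column names are placeholders) would then store the result in a form that `pd.read_csv` can load.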


In [1]:
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
import os

In [2]:
# Load data
eff_lines = pd.read_csv("efficient_lines_time.csv")
lines = pd.read_csv("lines_time.csv")
rectangles = pd.read_csv("rectangles_time.csv")

# Merge data column-wise on the row index
dataFrame = eff_lines.join(lines.join(rectangles))

In [3]:
# Show data
dataFrame

,NEW_exec_time,NEW_disk_time,OLD_exec_time,OLD_disk_time,REC_exec_time
0,0.112405,0.083974,0.124952,0.078482,0.042630
1,0.118061,0.092594,0.145950,0.120183,0.015403
2,0.121361,0.097085,0.102239,0.078916,0.015630
3,0.094955,0.077940,0.129797,0.105395,0.015625
4,0.117725,0.117725,0.153081,0.128688,0.015625
...,...,...,...,...,...
995,0.106305,0.082932,0.130668,0.104241,0.025626
996,0.098829,0.074365,0.104219,0.076951,0.016977
997,0.113415,0.097788,0.149478,0.102844,0.011078
998,0.117603,0.086561,0.162716,0.140434,0.019294


What questions will this benchmark test answer?
- which line mask generator is faster on average?
- what is the proportional gain in speed?
- how much time will be saved on 260000 samples (CelebA)?

Optional:
- which disk drawing code executes faster on average?
- what is the proportional gain in speed?
- what is the average contribution of disk drawing time to execution time in percent?
- plot that visualizes the execution times
- plot that visualizes the disk drawing times

In [13]:
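# Compute the mean exec times (the unpacking relies on the column order of dataFrame)
NEW_mean, NEW_disk_mean, OLD_mean, OLD_disk_mean, REC_mean = dataFrame.mean()

# Mean runtime of the NEW version as a percentage of the OLD version;
# 100 minus this value is the reduction in mean execution time.
runtime_ratio_pct = NEW_mean*100 / OLD_mean

print(f'The mean of execution times of rectangle version is: {REC_mean}.\n')
print(f'The mean of execution times of old version is: {OLD_mean}.\n')
print(f'The mean of execution times of NEW version is: {NEW_mean}.\n')
print(f'The new algorithm is {100-runtime_ratio_pct}% faster.')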

The mean of execution times of rectangle version is: 0.01863044095039367.

The mean of execution times of old version is: 0.12882680130004884.

The mean of execution times of NEW version is: 0.11704541969299315.

The new algorithm is 9.145132447723995% faster.
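
The percentage printed above is the reduction in mean execution time. Expressed as a speedup (old time divided by new time), the same measurements give a slightly different figure. The short calculation here uses the two means from the output above; the variable names are illustrative.

```python
# Means copied from the printed output (seconds per sample).
OLD_MEAN = 0.12882680130004884
NEW_MEAN = 0.11704541969299315

# Share of the old runtime that is saved: 1 - t_new / t_old.
time_reduction_pct = (1 - NEW_MEAN / OLD_MEAN) * 100

# Speedup factor: how many times more samples per second the new version processes.
speedup = OLD_MEAN / NEW_MEAN
throughput_gain_pct = (speedup - 1) * 100

print(f"Time reduction:  {time_reduction_pct:.2f} %")
print(f"Speedup factor:  {speedup:.3f}x")
print(f"Throughput gain: {throughput_gain_pct:.2f} %")
```

The new version needs roughly 9.1 % less time per sample, which corresponds to roughly 10.1 % more samples per unit of time.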


In [20]:
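# Compute the time saved with NEW version
N_BENCHMARK = 1000    # number of timed samples in the benchmark
N_DATASET = 260000    # number of samples in the whole dataset

NEW_sum = dataFrame['NEW_exec_time'].sum()
OLD_sum = dataFrame['OLD_exec_time'].sum()
REC_sum = dataFrame['REC_exec_time'].sum()

# Extrapolate the benchmark totals (in seconds) to the whole dataset
t_per_dataset_OLD = N_DATASET*OLD_sum / N_BENCHMARK
t_per_dataset_NEW = N_DATASET*NEW_sum / N_BENCHMARK
t_per_dataset_REC = N_DATASET*REC_sum / N_BENCHMARK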

print(f'Time for 1000 samples:\n   NEW: {NEW_sum/60} min\n   OLD: {OLD_sum/60} min\n   REC: {REC_sum/60} min\n')
print(f'Time for all 260000 samples:\n   NEW: {t_per_dataset_NEW/60} min\n   OLD: {t_per_dataset_OLD/60} min\n   REC: {t_per_dataset_REC/60} min\n')
print(f'The new algorithm could save {(t_per_dataset_OLD-t_per_dataset_NEW) / 60} minutes on the whole dataset.')

Time for 1000 samples:
   NEW: 1.9507569948832189 min
   OLD: 2.1471133550008137 min
   REC: 0.31050734917322714 min

Time for all 260000 samples:
   NEW: 507.1968186696369 min
   OLD: 558.2494723002116 min
   REC: 80.73191078503905 min

The new algorithm could save 51.05265363057467 minutes on the whole dataset.


In [21]:
# Disk drawing
print(f'The mean of disk drawing times of old version is: {OLD_disk_mean}.\n')
print(f'The mean of disk drawing times of NEW version is: {NEW_disk_mean}.\n')

The mean of disk drawing times of old version is: 0.1017175018787384.

The mean of disk drawing times of NEW version is: 0.09246079540252684.
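
Two of the optional questions can be estimated from the four means printed in this notebook: the proportional gain in disk drawing time and the share of disk drawing in the total execution time. This sketch uses the ratio of means, which is not the same as the mean of the per-sample ratios; the latter would have to be computed from the full dataFrame. The variable names are illustrative.

```python
# Means copied from the printed outputs (seconds per sample).
OLD_MEAN, NEW_MEAN = 0.12882680130004884, 0.11704541969299315
OLD_DISK_MEAN, NEW_DISK_MEAN = 0.1017175018787384, 0.09246079540252684

# Reduction in mean disk drawing time of the NEW version relative to the OLD one.
disk_reduction_pct = (1 - NEW_DISK_MEAN / OLD_DISK_MEAN) * 100

# Share of the mean execution time that is spent on disk drawing.
old_disk_share_pct = OLD_DISK_MEAN / OLD_MEAN * 100
new_disk_share_pct = NEW_DISK_MEAN / NEW_MEAN * 100

print(f"Disk drawing time reduction: {disk_reduction_pct:.1f} %")
print(f"Disk share of exec time (OLD): {old_disk_share_pct:.1f} %")
print(f"Disk share of exec time (NEW): {new_disk_share_pct:.1f} %")
```

By this estimate, disk drawing accounts for roughly 79 % of the mean execution time in both line versions, which is consistent with the suspicion that skimage.draw.disk dominates the runtime.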

