# Benchmarks of Strategies for Selecting Outdated Items
This notebook contains the benchmarks related to the selection strategies for context data, which we report in our paper.
Context data are selected from the processed data and included in the next progressive computation step, such that its result approximates that of a _non-progressive_ computation over the processed data.
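
To make this idea concrete, here is a minimal sketch of a progressive loop that carries context items forward. It is only an illustration: `progressive_batches`, its parameters, and the uniform random sampling of context are assumptions made for this example, not the strategies benchmarked in this notebook.

```python
import random


def progressive_batches(items, chunk_size, n_context, seed=0):
    """Yield one (chunk, context) pair per progressive step.

    Hypothetical helper: context is drawn uniformly at random from the items
    processed in earlier steps, which is only one of many possible strategies.
    """
    rng = random.Random(seed)
    processed = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Sample context from already processed items (none in the first step).
        context = rng.sample(processed, min(n_context, len(processed)))
        yield chunk, context
        processed.extend(chunk)


batches = list(progressive_batches(list(range(10)), chunk_size=4, n_context=2))
for chunk, context in batches:
    print(f"new items: {chunk}, context: {context}")
```

Each step would then run the computation over `chunk + context` rather than over `chunk` alone.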

## Benchmark Configuration

We use the following configuration in our benchmarks:

### Test cases
- full computation over the entire dataset (upper baseline)
- progressive computation without optimization (lower baseline)
- full computation of processed data
- progressive computation using optimization strategies

### Dataset
- NYC taxis dataset (10 million items), stored in a compressed CSV file, loaded with DuckDB

### Variables
- dependent variables: runtime, prediction error
- independent variables: 

## Benchmarks

### Finding an appropriate `chunk size`
The number of items in each chunk dictates the computation time for that chunk: the more items we process, the longer the DOI computation takes.
Therefore, the first consideration in our benchmarks is to find the maximum number of items for which the computation time still feels immediate.
Prior work (see Card et al., 1991) has shown this limit to be about one second.

In the cell below, we try different chunk sizes to find the largest number of items we can pass to the DOI function while keeping the computation under 1s.

In [None]:
from copy import copy
from time import perf_counter

from notebook_test_case import DATA, DOI_CONFIG, PARAMETERS, PATH
from context_item_dropdown_strategy.no_context import NoContext
from outdated_item_dropdown_strategy.no_update import NoUpdate
from storage_strategy.no_storage import NoStorage
from test_case import create_test_case, StrategiesConfiguration

chunk_sizes = [10, 100, 1000, 10000]

for size in chunk_sizes:
  # perf_counter is a monotonic clock intended for measuring short durations
  before = perf_counter()

  # use the no-op strategies, so that only the new items of the chunk are processed
  strategies_config = StrategiesConfiguration(
    "__chunk_size__",
    NoContext(DATA.n_dims, None),
    NoUpdate(DATA.n_dims, None),
    NoStorage(),
  )

  # process a single chunk of the given size
  parameter_config = copy(PARAMETERS)
  parameter_config.chunks = 1
  parameter_config.chunk_size = size

  create_test_case(
    name="__test__",
    strategies=strategies_config,
    data=DATA,
    doi=DOI_CONFIG,
    params=parameter_config,
    path=PATH,
  ).run()

  print(f"{size}: {perf_counter() - before:.3f}s")
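
The loop above prints one timing per chunk size. Picking the largest size that stays within the one-second budget can also be done programmatically. The following is a minimal sketch: `measure_runtime`, `largest_size_within_budget`, and `toy_workload` are hypothetical names introduced for illustration, and the toy workload merely stands in for the actual test case run.

```python
from time import perf_counter


def measure_runtime(compute, size):
    """Return the wall-clock seconds that compute(size) takes."""
    start = perf_counter()
    compute(size)
    return perf_counter() - start


def largest_size_within_budget(timings, budget_s=1.0):
    """Return the largest chunk size whose measured runtime is below budget_s.

    timings maps chunk size -> runtime in seconds. Returns None if no size
    fits the budget.
    """
    fitting = [size for size, seconds in timings.items() if seconds < budget_s]
    return max(fitting, default=None)


def toy_workload(size):
    # Stand-in for the DOI computation (assumption): cost grows with size.
    return sum(i * i for i in range(size))


timings = {size: measure_runtime(toy_workload, size) for size in [10, 100, 1000, 10000]}
best = largest_size_within_budget(timings)
print(f"largest chunk size under 1s: {best}")
```

A single measurement per size is noisy, so repeating each run a few times and using the median would give a more robust choice.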


### Computing the Baselines
#### Baseline 1: Monolithic computation
The ground truth for our strategies is a full computation over the entire dataset without any chunking.
This computation naturally takes a long time to complete, which is why the progressive scenario is so much more effective from a user perspective: we get to see the data much faster.

In the context of the `BenchmarkTestCase` class, the monolithic computation corresponds to running a progression with a single chunk.
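
As a minimal sketch of this correspondence, assume a parameter object with `chunks` and `chunk_size` fields, mirroring the attributes set in the chunk size cell above. The `Parameters` dataclass and the `monolithic` helper are hypothetical stand-ins, not the notebook's actual classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Parameters:
    """Hypothetical stand-in for the benchmark parameters."""
    chunks: int
    chunk_size: int


def monolithic(params: Parameters, total_items: int) -> Parameters:
    """Return a configuration that processes all items in one single chunk."""
    return replace(params, chunks=1, chunk_size=total_items)


progressive = Parameters(chunks=100, chunk_size=1000)
ground_truth = monolithic(
    progressive, total_items=progressive.chunks * progressive.chunk_size
)
print(ground_truth)
```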

#### Baseline 2: Bigger chunks
In addition to the full computation, another important baseline is a computation that does not use any strategies, but instead uses the entire `chunk time` to compute a whole new chunk.
The idea here is to check whether the context and outdated computations are actually valuable, or whether we could just spend all resources on processing new data instead.
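
One way to size such a baseline, assuming that processing cost grows roughly linearly with the number of items, is to give it as many new items per step as a strategy-based step touches in total. The function and its parameters below are hypothetical and only illustrate the arithmetic; the `benchmarks` module may determine the size differently.

```python
def bigger_chunk_size(n_new: int, n_context: int, n_outdated: int) -> int:
    """Chunk size for the 'bigger chunks' baseline under a linear cost model.

    Assumption: every item costs the same to process, so the time a strategy
    spends on context and outdated items could instead pay for that many
    additional new items.
    """
    if min(n_new, n_context, n_outdated) < 0:
        raise ValueError("item counts must be non-negative")
    return n_new + n_context + n_outdated


# A step with 1000 new, 200 context, and 300 outdated items is matched by a
# baseline step over 1500 new items.
print(bigger_chunk_size(1000, 200, 300))
```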

In [None]:
from ipywidgets import Button, Dropdown, Layout, VBox, Output
from benchmarks import MODES, get_all_test_cases, run_test_case_ground_truth, run_test_case_bigger_chunks

test_cases = get_all_test_cases()

presets = list(range(len(test_cases)))
modes = MODES

test_case_dropdown = Dropdown(
  description="Select a test case",
  style={"description_width": "initial"},
  options=presets,
  disabled=len(presets) == 0,
)
mode_dropdown = Dropdown(
  description="Select a mode",
  style={"description_width": "initial"},
  options=modes,
  disabled=len(modes) == 0,
)

start_button = Button(
  description="start",
  tooltip="launches the benchmark with the selected configuration",
  icon="check",
  layout=Layout(margin="20px 0")
)

output = Output()

# the decorator already redirects everything printed in the callback to the Output widget
@output.capture()
def start(button):
  test_case_index = int(test_case_dropdown.value)
  mode = mode_dropdown.value
  print("ground truth:\n###")
  run_test_case_ground_truth(test_case_index, mode)
  print("\nbigger chunks:\n###")
  run_test_case_bigger_chunks(test_case_index, mode)
  print("done")

start_button.on_click(start)
VBox([test_case_dropdown, mode_dropdown, start_button, output])

### Running the test cases

In [None]:
from ipywidgets import Button, Dropdown, Layout, VBox, Output
from IPython.display import display
from benchmarks import MODES, get_all_test_cases, run_test_case

test_cases = get_all_test_cases()

presets = list(range(len(test_cases)))
modes = MODES

start_button = Button(
  description="start",
  tooltip="launches the benchmark with the selected configuration",
  icon="check",
  layout=Layout(margin="20px 0")
)

output = Output()

# the decorator already redirects everything printed in the callback to the Output widget
@output.capture()
def start(button):
  test_case_index = int(test_case_dropdown.value)
  mode = mode_dropdown.value
  run_test_case(test_case_index, mode)
  print("done")

start_button.on_click(start)

try:
  ui = VBox([test_case_dropdown, mode_dropdown, start_button, output])
  display(ui)
except NameError:
  print("Run the baselines cell above first to create the dropdowns (no need to press start).")


### Evaluation

In [None]:
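import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.metrics import jaccard_score, r2_score

rus = RandomUnderSampler(random_state=0)

def evaluate_test_case(test_case: np.ndarray, ground_truth: np.ndarray) -> float:
  # r2_score expects the ground truth first and the approximation second
  # score = jaccard_score(test_case, ground_truth, average="weighted")
  score = r2_score(ground_truth, test_case)
  return score

# results_full, results_context, and results_chunked are expected to hold the results
# of the benchmark runs and must be defined before this cell is run
ground_truth = results_full["doi"]
context_test_case = results_context["doi"]
baseline_test_case = results_chunked["doi"]

# R² of the baseline without strategies, followed by R² of the context strategy
evaluate_test_case(baseline_test_case, ground_truth), evaluate_test_case(context_test_case, ground_truth)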
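
For reference, the coefficient of determination that `r2_score` reports can be sketched in plain Python as `1 - SS_res / SS_tot`. The function below is an illustrative reimplementation for one-dimensional inputs, not the scikit-learn code, and the toy values are made up. It also shows why the argument order (ground truth first) matters: the score is not symmetric.

```python
def r2(y_true, y_pred):
    """Coefficient of determination for two equally long sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far the approximation is from the truth.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: how much the truth varies around its own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant ground truth")
    return 1.0 - ss_res / ss_tot


truth = [0.1, 0.4, 0.6, 0.9]
approx = [0.2, 0.4, 0.5, 0.9]
print(r2(truth, truth))   # a perfect approximation scores 1.0
print(r2(truth, approx))
print(r2(approx, truth))  # swapping the arguments changes the score
```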