# Tabularization Refactoring Experiments

In [1]:
import numpy as np
from darts.utils.data.tabularization import (
    _create_lagged_data,
    create_lagged_training_data,
    create_lagged_prediction_data,
)
from darts.utils.timeseries_generation import linear_timeseries
from tab_experiment_utils import perform_benchmarks, perform_profiling

## Bugs in Current Implementation

The currently implemented `_create_lagged_data` does not correctly handle cases where training/prediction observations can be created for times that are *not* included in all of the provided time series. The easiest way to illustrate what I mean is through an example; suppose we want to create training/prediction data using the following three time series:

In [2]:
# Timeseries to create data from:
target_series = linear_timeseries(start=0, length=6, freq=1)
past_covariates = linear_timeseries(start=0, length=4, freq=1)
future_covariates = linear_timeseries(start=0, length=4, freq=1)
# Lags to use for each timeseries:
lags = [-1]
lags_past_covariates = [-3]
lags_future_covariates = [-3]

For the sake of clarity, let's 'visualise' what each of these timeseries looks like - the values are shown in the top row, and the time index is shown in the bottom row of each array:

In [3]:
print(np.stack([target_series.all_values().squeeze(), target_series.time_index]))

[[0.  0.2 0.4 0.6 0.8 1. ]
 [0.  1.  2.  3.  4.  5. ]]


In [4]:
print(np.stack([past_covariates.all_values().squeeze(), past_covariates.time_index]))

[[0.         0.33333333 0.66666667 1.        ]
 [0.         1.         2.         3.        ]]


In [5]:
print(
    np.stack([future_covariates.all_values().squeeze(), future_covariates.time_index])
)

[[0.         0.33333333 0.66666667 1.        ]
 [0.         1.         2.         3.        ]]


In this scenario, we're actually *able* to create training data for the time point `5`, even though the `past_covariates` and `future_covariates` timeseries only go up to time point `3`. This is because constructing the feature vector for time `5` only requires us to know the values of `past_covariates` and `future_covariates` at lag `-3` relative to time `5` (i.e. their values at time `2`). Indeed, `create_lagged_training_data` (i.e. one of the newly implemented functions) returns:

In [6]:
X, y, times = create_lagged_training_data(
    target_series=target_series,
    output_chunk_length=1,
    past_covariates=past_covariates,
    future_covariates=future_covariates,
    lags=lags,
    lags_past_covariates=lags_past_covariates,
    lags_future_covariates=lags_future_covariates,
    max_samples_per_ts=1,
)
print(f"X = {X[:,:,0]}")
print(f"y = {y[:,:,0]}")
print(f"times = {list(times)}")

X = [[0.8        0.66666667 0.66666667]]
y = [[1.]]
times = [5]


i.e. the time `5` value of `target_series` is used as the `y` label for the 'latest' possible sample. If we use these inputs with the current implementation, however:

In [7]:
X, y, times = _create_lagged_data(
    target_series=target_series,
    output_chunk_length=1,
    past_covariates=past_covariates,
    future_covariates=future_covariates,
    lags=lags,
    lags_past_covariates=lags_past_covariates,
    lags_future_covariates=lags_future_covariates,
    max_samples_per_ts=1,
    is_training=True,
)
print(f"X = {X[:,:]}")
print(f"y = {y[:,:]}")
print(f"times = {list(times[0])}")

X = [[0.4 0.  0. ]]
y = [[0.6]]
times = [3]


i.e. the 'latest' observation produced is for time `3`, which is the latest time present in `past_covariates` and `future_covariates`. This illustrates that the currently implemented `_create_lagged_data` does not return all of the training samples it *could* possibly construct.
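The claim about which times admit a training sample can be checked independently of darts. The following is a minimal sketch in plain Python, assuming integer time indices and the lags used in this example; `has_all_lags` and `feasible_training_times` are hypothetical helpers written for illustration and are not part of `tabularization.py`:

```python
# Time indices of the three example series (integer-indexed, freq=1).
target_times = set(range(6))      # times 0..5
past_cov_times = set(range(4))    # times 0..3
future_cov_times = set(range(4))  # times 0..3


def has_all_lags(t, times, lags):
    """Return True if every lagged time `t + lag` exists in `times`."""
    return all((t + lag) in times for lag in lags)


def feasible_training_times(candidates):
    """Times whose label and all lagged feature values are available."""
    return sorted(
        t
        for t in candidates
        if t in target_times  # the label y is the target value at time t
        and has_all_lags(t, target_times, [-1])
        and has_all_lags(t, past_cov_times, [-3])
        and has_all_lags(t, future_cov_times, [-3])
    )


print(feasible_training_times(range(10)))
```

This prints `[3, 4, 5]`: under these assumptions, time `5` is the latest time for which a training sample can be built, whereas time `3` is merely the earliest.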

In the case of constructing prediction features, we don't need to concern ourselves with creating a label for each generated feature, which means that we're able to create prediction features for times that extend even beyond the times contained in `target_series`. In the current example, we're actually able to construct features to predict the series at time `6`, since predicting the series at this time only requires us to know the value of `target_series` `[-1]` lags away (i.e. at time `5`), and the values of `past_covariates` and `future_covariates` `[-3]` lags away (i.e. at time `3`). Let's see this by calling `create_lagged_prediction_data` (i.e. another newly implemented function):

In [8]:
X, times = create_lagged_prediction_data(
    target_series=target_series,
    past_covariates=past_covariates,
    future_covariates=future_covariates,
    lags=lags,
    lags_past_covariates=lags_past_covariates,
    lags_future_covariates=lags_future_covariates,
    max_samples_per_ts=1,
)
print(f"X = {X[:,:,0]}")
print(f"times = {list(times)}")

X = [[1. 1. 1.]]
times = [6]


Here we see that the last values of `target_series`, `past_covariates`, and `future_covariates` can be used to construct an `X` array that can predict the series values at time `6`, as we expected. When calling the currently implemented `_create_lagged_data`, one finds:

In [9]:
X, _, times = _create_lagged_data(
    target_series=target_series,
    output_chunk_length=1,
    past_covariates=past_covariates,
    future_covariates=future_covariates,
    lags=lags,
    lags_past_covariates=lags_past_covariates,
    lags_future_covariates=lags_future_covariates,
    max_samples_per_ts=1,
    is_training=False,
)
print(f"X = {X[:,:]}")
print(f"times = {list(times[0])}")

X = [[0.8        0.66666667 0.66666667]]
times = [5]


i.e. `_create_lagged_data` is unable to create the prediction data point for time `6`; instead, it only returns the prediction data point for time `5`.
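To make the prediction case concrete, here is a small sketch that rebuilds the feature row by hand. It assumes the three series are linear ramps from 0 to 1 (as printed earlier) and stores them as plain dictionaries keyed by time; `prediction_features` is a hypothetical helper for illustration, not a darts function:

```python
# Values of the example series keyed by time index (linear ramps from 0 to 1).
target = {t: t / 5 for t in range(6)}      # times 0..5
past_cov = {t: t / 3 for t in range(4)}    # times 0..3
future_cov = {t: t / 3 for t in range(4)}  # times 0..3


def prediction_features(t):
    """Return [target[t-1], past_cov[t-3], future_cov[t-3]], or None if a value is missing."""
    lagged = [(target, t - 1), (past_cov, t - 3), (future_cov, t - 3)]
    if any(lagged_time not in series for series, lagged_time in lagged):
        return None
    return [series[lagged_time] for series, lagged_time in lagged]


# No label is needed for prediction, so t itself need not lie inside `target`.
valid = {}
for t in range(10):
    features = prediction_features(t)
    if features is not None:
        valid[t] = features

latest = max(valid)
print(latest, valid[latest])
```

Under these assumptions the latest valid time is `6`, with features `[1.0, 1.0, 1.0]`, which agrees with the output of `create_lagged_prediction_data` above.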

## Speed Benchmarks

In this section, a few different benchmarks of `_create_lagged_data` vs `create_lagged_training_data` are performed. It should be noted that, for benchmarks where `max_samples_per_ts` is set to `None`, the comparison is slightly biased *against* the newly implemented functions, since they construct more training observations than the current implementation does (see the previous section to understand why).
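The `perform_benchmarks` helper lives in `tab_experiment_utils` and is not shown here. As a rough sketch of how a 'Speed up' figure of this kind could be obtained, the snippet below times repeated calls to two callables and takes the ratio of the totals. This is an assumption about the general approach, not the helper's actual code; `time_repeats`, `speed_up`, and the stand-in workloads are all hypothetical:

```python
import time


def time_repeats(func, num_repeats):
    """Total wall-clock seconds taken by `num_repeats` calls to `func`."""
    start = time.perf_counter()
    for _ in range(num_repeats):
        func()
    return time.perf_counter() - start


def speed_up(current_impl, new_impl, num_repeats=1000):
    """Ratio of total run times; values above 1 mean `new_impl` is faster."""
    current_secs = time_repeats(current_impl, num_repeats)
    new_secs = time_repeats(new_impl, num_repeats)
    return current_secs / new_secs


def slow_workload():
    # Stand-in for the current implementation (hypothetical).
    return sum(i * i for i in range(2000))


def fast_workload():
    # Stand-in for the new implementation (hypothetical).
    return sum(i * i for i in range(200))


print(f"Speed up = {speed_up(slow_workload, fast_workload, num_repeats=200):.1f} fold")
```

Because the ratio is taken over total run time, an implementation that builds more samples per call is charged for that extra work.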

### Small Number of Lags, No `max_samples_per_ts`

In [10]:
perform_benchmarks(
    num_repeats=1000,
    use_range_idx=False,
    multi_models=True,
    lags=[-1],
    lags_past_covariates=[-2],
    lags_future_covariates=[-3],
    output_chunk_length=10,
    max_samples_per_ts=None,
    check_inputs=False,
)

With Unequal Frequency Timeseries:
Current implementation: 19.23994779586792 secs for 1000 repetitions
New implementation: 1.354933500289917 secs for 1000 repetitions
Speed up = 14.199920359007377 fold

With Equal Frequency Timeseries, Using Moving Windows:
Current implementation: 10.251362085342407 secs for 1000 repetitions
New implementation: 1.0340075492858887 secs for 1000 repetitions
Speed up = 9.914204294177788 fold

With Equal Frequency Timeseries, Using Time Intersections:
Current implementation: 10.053273677825928 secs for 1000 repetitions
New implementation: 3.6990296840667725 secs for 1000 repetitions
Speed up = 2.717813733998803 fold



### Small Number of Lags, `max_samples_per_ts` = 10

In [11]:
perform_benchmarks(
    num_repeats=1000,
    use_range_idx=False,
    multi_models=True,
    lags=[-1],
    lags_past_covariates=[-2],
    lags_future_covariates=[-3],
    output_chunk_length=10,
    max_samples_per_ts=10,
)

With Unequal Frequency Timeseries:
Current implementation: 20.565967321395874 secs for 1000 repetitions
New implementation: 0.9422202110290527 secs for 1000 repetitions
Speed up = 21.82713454950685 fold

With Equal Frequency Timeseries, Using Moving Windows:
Current implementation: 10.45595407485962 secs for 1000 repetitions
New implementation: 0.7731902599334717 secs for 1000 repetitions
Speed up = 13.523132166407903 fold

With Equal Frequency Timeseries, Using Time Intersections:
Current implementation: 10.36695122718811 secs for 1000 repetitions
New implementation: 1.0335073471069336 secs for 1000 repetitions
Speed up = 10.030844247220893 fold



### Large Number of Lags, No `max_samples_per_ts`

In [12]:
# Use fewer repeats here for sake of brevity (these benchmarks take longer):
perform_benchmarks(
    num_repeats=100,
    use_range_idx=False,
    multi_models=True,
    lags=range(-30, 0, 1),
    lags_past_covariates=range(-62, 0, 2),
    lags_future_covariates=range(-100, 0, 3),
    output_chunk_length=10,
    max_samples_per_ts=None,
)

With Unequal Frequency Timeseries:
Current implementation: 29.16094398498535 secs for 100 repetitions
New implementation: 0.6532535552978516 secs for 100 repetitions
Speed up = 44.63954883749449 fold

With Equal Frequency Timeseries, Using Moving Windows:
Current implementation: 10.20738697052002 secs for 100 repetitions
New implementation: 2.484281539916992 secs for 100 repetitions
Speed up = 4.108788318276149 fold

With Equal Frequency Timeseries, Using Time Intersections:
Current implementation: 9.86971116065979 secs for 100 repetitions
New implementation: 2.7644736766815186 secs for 100 repetitions
Speed up = 3.570195384355194 fold



### Large Number of Lags, `max_samples_per_ts` = 10

In [13]:
# Use fewer repeats here for sake of brevity (these benchmarks take longer):
perform_benchmarks(
    num_repeats=100,
    use_range_idx=False,
    multi_models=True,
    lags=range(-30, 0, 1),
    lags_past_covariates=range(-62, 0, 2),
    lags_future_covariates=range(-100, 0, 3),
    output_chunk_length=10,
    max_samples_per_ts=10,
)

With Unequal Frequency Timeseries:
Current implementation: 28.617480516433716 secs for 100 repetitions
New implementation: 0.11005711555480957 secs for 100 repetitions
Speed up = 260.02390097332614 fold

With Equal Frequency Timeseries, Using Moving Windows:
Current implementation: 9.818582773208618 secs for 100 repetitions
New implementation: 0.07866573333740234 secs for 100 repetitions
Speed up = 124.81397371707057 fold

With Equal Frequency Timeseries, Using Time Intersections:
Current implementation: 9.515854835510254 secs for 100 repetitions
New implementation: 0.10745787620544434 secs for 100 repetitions
Speed up = 88.55427979348185 fold



## Profiling

To help understand which parts of `create_lagged_training_data` contribute most to run time, we'll profile 1000 repeated calls to this function.
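The `perform_profiling` helper is likewise not shown. The reports below have the layout produced by Python's standard `cProfile` and `pstats` modules (sorted by cumulative time, filtered by a filename restriction), so a comparable report could plausibly be generated as sketched here. `profile_calls` and `workload` are hypothetical names used only for illustration:

```python
import cProfile
import io
import pstats


def profile_calls(func, num_repeats, restriction=None):
    """Profile repeated calls to `func` and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(num_repeats):
        func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    if restriction is None:
        stats.print_stats()
    else:
        # A string restriction is a regex matched against "filename:lineno(function)".
        stats.print_stats(restriction)
    return buffer.getvalue()


def workload():
    # Stand-in for the function being profiled (hypothetical).
    return sorted(range(1000), reverse=True)


report = profile_calls(workload, num_repeats=100, restriction="workload")
print(report)
```

Passing `restriction="tabularization.py"` instead would keep only the rows whose location matches that filename, which is the kind of filtering visible in the reports that follow.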

### When Using 'Moving Windows' Method

In [14]:
perform_profiling(
    num_repeats=1000,
    lags=[-1],
    lags_past_covariates=[-1, -2],
    lags_future_covariates=[-2],
    output_chunk_length=1,
    max_samples_per_ts=None,
    multi_models=False,
    use_moving_windows=True,
    equal_freq=True,
    num_timesteps=10000,
)

         727002 function calls (717002 primitive calls) in 1.178 seconds

   Ordered by: cumulative time
   List reduced from 105 to 14 due to restriction <'tabularization.py'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.006    0.000    1.178    0.001 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:231(create_lagged_data)
     1000    0.062    0.000    1.166    0.001 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:455(_create_lagged_data_by_moving_window)
     3000    0.099    0.000    0.317    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:579(_extract_lagged_vals_from_windows)
     1000    0.019    0.000    0.161    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:706(get_feature_times)
     3000    0.007    0.000    0.053    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:10

### When Using 'Time Intersection' Method

In [15]:
perform_profiling(
    num_repeats=1000,
    lags=[-1],
    lags_past_covariates=[-1, -2],
    lags_future_covariates=[-2],
    output_chunk_length=1,
    max_samples_per_ts=None,
    multi_models=False,
    use_moving_windows=False,
    equal_freq=True,
    num_timesteps=10000,
)

         1099006 function calls (1072004 primitive calls) in 2.325 seconds

   Ordered by: cumulative time
   List reduced from 139 to 11 due to restriction <'tabularization.py'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.005    0.000    2.325    0.002 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:231(create_lagged_data)
     1000    0.897    0.001    2.319    0.002 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:605(_create_lagged_data_by_intersecting_times)
     1000    0.003    0.000    0.174    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:851(get_shared_times)
     2000    0.003    0.000    0.167    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:882(intersection_func)
     1000    0.019    0.000    0.158    0.000 /home/mabilton/Documents/Programming/darts/darts/utils/data/tabularization.py:706(get_feat