Feat/window transformer #1269

Merged
merged 55 commits into from
Nov 26, 2022
Changes from 37 commits
Commits
4cf16f7
first commit
adamkells Aug 26, 2022
bf05165
Add some template code
adamkells Sep 9, 2022
866cc88
remove print debugging
adamkells Sep 9, 2022
4de43a8
Merge branch 'master' into feat/window-features
hrzn Sep 12, 2022
221e4b1
Merge branch 'master' into feat/window-features
eliane-maalouf Sep 19, 2022
3a1c259
Merge branch 'master' into feat/window-features
hrzn Sep 19, 2022
3bb7554
Merge branch 'master' into feat/window-features
eliane-maalouf Sep 21, 2022
b034f0b
- ForecastingWindowTransformer implementation and tests
eliane-maalouf Oct 9, 2022
ce35be2
- added UserWarning to logging.py
eliane-maalouf Oct 9, 2022
b0cdec8
- formatting files
eliane-maalouf Oct 9, 2022
c1bf337
- cleanup and formatting
eliane-maalouf Oct 10, 2022
c7a4d37
- formatting
eliane-maalouf Oct 10, 2022
9a13883
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 10, 2022
c954627
- corrected behavior for user provided function (rolling or not rolli…
eliane-maalouf Oct 10, 2022
95dba04
- corrected lint errors
eliane-maalouf Oct 10, 2022
187412c
- corrected sorting imports
eliane-maalouf Oct 10, 2022
44b392c
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 11, 2022
c6a8d4d
Merge branch 'master' into feat/window-features
eliane-maalouf Oct 11, 2022
087b8dc
Merge branch 'feat/window-features' into feat/window_transformer
eliane-maalouf Oct 11, 2022
8bbaecc
- removed @adamkells modifications in regression_model.py after movin…
eliane-maalouf Oct 11, 2022
36f49a9
reset regression_model.py to master version
eliane-maalouf Oct 11, 2022
31af196
Update darts/dataprocessing/transformers/window_transformer.py
eliane-maalouf Oct 17, 2022
58b3c57
Update darts/dataprocessing/transformers/window_transformer.py
eliane-maalouf Oct 17, 2022
280aab2
Update darts/dataprocessing/transformers/window_transformer.py
eliane-maalouf Oct 17, 2022
85e0a04
Update darts/dataprocessing/transformers/window_transformer.py
eliane-maalouf Oct 17, 2022
6bac75f
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 17, 2022
5d3ea2d
added window_transform() function to TimeSeries class to allow direct…
eliane-maalouf Oct 21, 2022
faf7668
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 23, 2022
6312334
updated ForecastingWindowTransformer class
eliane-maalouf Oct 23, 2022
eaae122
- update untitests for window transformation from TimeSeries and from…
eliane-maalouf Oct 24, 2022
79eabc4
- updated how a target time series gets
eliane-maalouf Oct 25, 2022
d5e653f
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 28, 2022
4025bc9
Merge branch 'master' into feat/window_transformer
eliane-maalouf Oct 28, 2022
86200c1
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 3, 2022
50e6170
Notebook example // option to suppress warnings // init update
eliane-maalouf Nov 3, 2022
821b330
formatting
eliane-maalouf Nov 3, 2022
d0b8062
improve docstring for window_transform
hrzn Nov 6, 2022
712f907
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 9, 2022
aa1ac84
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 18, 2022
bc1f551
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 18, 2022
a7d232d
updated window_transform function as per review
eliane-maalouf Nov 18, 2022
ba554fc
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 20, 2022
11a2751
updated window_transformer.py and demo notebook DRAFT_window_transfor…
eliane-maalouf Nov 20, 2022
f5bdf58
updated unittests, corrected formatting
eliane-maalouf Nov 21, 2022
b4d4742
corrected formatting and documentation
eliane-maalouf Nov 21, 2022
236f249
sort imports
eliane-maalouf Nov 21, 2022
1539945
- removed previously added user warning function
eliane-maalouf Nov 24, 2022
db128be
remove incorrect import
eliane-maalouf Nov 24, 2022
e76692a
Merge branch 'master' into feat/window_transformer
hrzn Nov 25, 2022
49da756
add docstring header in window transformer
hrzn Nov 25, 2022
8c1a84f
updated as per review, removed draft notebook
eliane-maalouf Nov 25, 2022
47df61b
- import sorting
eliane-maalouf Nov 25, 2022
a778cc1
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 25, 2022
f0bbcec
Merge branch 'master' into feat/window_transformer
hrzn Nov 26, 2022
faac844
Merge branch 'master' into feat/window_transformer
eliane-maalouf Nov 26, 2022
1 change: 1 addition & 0 deletions darts/dataprocessing/transformers/__init__.py
@@ -16,3 +16,4 @@
)
from .scaler import Scaler
from .static_covariates_transformer import StaticCovariatesTransformer
from .window_transformer import ForecastingWindowTransformer
258 changes: 258 additions & 0 deletions darts/dataprocessing/transformers/window_transformer.py
@@ -0,0 +1,258 @@
from typing import Iterator, List, Sequence, Tuple, Union

from darts.dataprocessing.transformers import BaseDataTransformer
from darts.logging import get_logger, raise_if_not, raise_log
from darts.timeseries import TimeSeries
from darts.utils.utils import series2seq

logger = get_logger(__name__)


class ForecastingWindowTransformer(BaseDataTransformer):
    def __init__(
        self,
        window_transformations: Union[dict, List[dict]],
        name: str = "ForecastingWindowTransformer",
        n_jobs: int = 1,
        verbose: bool = False,
    ):
        """
        A transformer that applies window transformation to a TimeSeries or a Sequence of TimeSeries. It expects a
        dictionary or a list of dictionaries specifying the window transformation(s) to be applied.

        Parameters
        ----------
        window_transformations
            A dictionary or a list of dictionaries. Each dictionary should contain at least the 'function' key.

            The 'function' value should be a string with the name of one of the builtin transformation functions,
            or a callable function provided by the user that can be applied to the input series
            by using a pandas.DataFrame.rolling object.

            The two following options are available for built-in transformation functions:
            1) Based on pandas.DataFrame.rolling windows, the 'function' key should have one of
               {BUILTIN_TRANSFORMS_WINDOW}.
            2) Based on pandas.DataFrame.ewm (Exponentially-weighted window), the 'function' key should have one of
               {BUILTIN_TRANSFORMS_EWM} prefixed by 'ewm_'. For example, 'function': 'ewm_mean' or
               'function': 'ewm_sum'.

            The 'window' key should be provided for built-in and for user provided callable functions.
            The 'window' value should be a positive integer representing the size of the window to be used for the
            transformation.

            Two optional keys can be provided for more flexibility: 'components' and 'series_id':
            1) The 'components' key can be a string or a list of strings specifying the names of the components
               of the series on which the transformation should be applied.
               If not provided, the transformation will be applied to all components of the series.
            2) When the input to the transformer is a sequence of TimeSeries, the 'series_id' key can be an
               integer >= 0 or a list of integers >= 0 specifying the indices of the series on which
               the transformation should be applied. Series indices in the sequence start at 0.
               If not provided, the transformation will be applied to all series in the sequence.
            The following are the possible combination scenarios for the 'components' and 'series_id' keys:
            - 'components' and 'series_id' are not provided: the transformation will be applied to all components in
              all series in the sequence.
            - 'components' is not provided and 'series_id' is provided: the transformation will be applied to all
              components in the series specified by 'series_id'.
            - 'components' is provided and 'series_id' is not provided: the transformation will be applied to the
              components specified by 'components' in all series in the sequence.
            - 'components' and 'series_id' are provided: the transformation will be applied to the components
              specified by 'components' in the series specified by 'series_id'.
              If particular components are to be transformed in particular series, the 'components' key should be a
              list of lists with the same size as the 'series_id' key. For example, if 'series_id' is [0, 1] and
              'components' is [['component_1'], ['component_2', 'component_3']], the transformation will be applied to
              'component_1' in series 0, and to 'component_2' and 'component_3' in series 1.

            All other dictionary items provided will be treated as keyword arguments for the function group
            (i.e., pandas.DataFrame.rolling or pandas.DataFrame.ewm) or for the specific function in that group
            (i.e., pandas.DataFrame.rolling.mean/std/max/min... or pandas.DataFrame.ewm.mean/std/sum).
            This allows for more flexibility in configuring how the window slides over the data, by providing for
            example:
            'center': True/False to set the observation at the current timestep at the center of the window
            (default is False);
            'closed': 'right'/'left'/'both'/'neither' to specify which ends of the window are closed, i.e. included
            in the window (Darts enforces default to 'left', to guarantee the outcome to be forecasting safe);
            'step': int to slide the window by 'step' timesteps between each window evaluation (Darts enforces
            default to 1 to guarantee the outcome has the same frequency as the input series).
            More information on the available options for builtin functions can be found in the pandas documentation:
            https://pandas.pydata.org/docs/reference/window.html

            For user provided functions, extra arguments in the transformation dictionary are passed to the function.
            By default, Darts passes numpy arrays as input to the user provided function. The user can modify
            this behavior by adding the item 'raw': False to the transformation dictionary.
            The function is expected to return a single value for each window.
            Other possible configurations can be found in the pandas.DataFrame.rolling().apply() documentation:
            https://pandas.pydata.org/docs/reference/window.html

            When calling transform(), the user can pass different keyword arguments to configure the transformed
            series output:
            1) treat_na
               String to specify how to treat missing values in the resulting transformed TimeSeries.
               Can be 'dropna' to truncate the TimeSeries and drop observations with missing values, or
               'bfill' to specify that NAs should be filled with the last valid observation.
               Can also be a value, in which case NAs will be filled with this value.

            2) forecasting_safe
               If True, Darts enforces that the resulting TimeSeries is safe to be used in forecasting models as target
               or as features. This parameter guarantees that the window transformation will not include any future
               values in the current timestep and will not fill NAs with future values. Default is True.
               Only pandas.DataFrame.rolling functions can currently be guaranteed to be forecasting-safe.

            3) target
               If forecasting_safe is True and the target TimeSeries is provided, then the target TimeSeries will be
               truncated to align it with the window transformed TimeSeries.

            4) keep_non_transformed
               If True, the resulting TimeSeries will contain the non-transformed components along with the
               transformed ones. The non-transformed components maintain their original name while the transformed
               components are named with the transformation name as a prefix. Default is False.

        name
            A specific name for the transformer.
        n_jobs
            The number of jobs to run in parallel. Parallel jobs are created only when a ``Sequence[TimeSeries]`` is
            passed as input to a method, parallelising operations regarding different ``TimeSeries``. Defaults to `1`.
        verbose
            Whether to print operations progress.
        """
        super().__init__(name, n_jobs, verbose)

        # dictionary checks are mostly implemented in TimeSeries.window_transform()
        # here we only need to verify that the input is not None
        # and that 'series_id', if provided, is a list of positive integers

        if window_transformations is None:
            raise_log(
                ValueError(
                    "window_transformations argument should be provided and "
                    "must be a non-empty dictionary or a non-empty list of dictionaries."
                ),
                logger,
            )

        if window_transformations is not None:

            raise_if_not(
                (
                    isinstance(window_transformations, dict)
                    and len(window_transformations) > 0
                )
                or (
                    isinstance(window_transformations, list)
                    and len(window_transformations) > 0
                ),
                "`window_transformations` must be a non-empty dictionary or a non-empty list of dictionaries. ",
            )

            if isinstance(window_transformations, dict):
                window_transformations = [
                    window_transformations
                ]  # if only one dictionary, make it a list

            for idx, transformation in enumerate(window_transformations):
                raise_if_not(
                    isinstance(transformation, dict),
                    f"`window_transformations` must contain dictionaries. Element at index {idx} is not a dictionary.",
                )

                if (
                    "series_id" in transformation
                    and transformation["series_id"] is None
                ):
                    window_transformations[idx].pop("series_id")

                if (
                    "series_id" in transformation
                    and transformation["series_id"] is not None
                ):
                    raise_if_not(
                        (
                            isinstance(transformation["series_id"], int)
                            and transformation["series_id"] >= 0
                        )
                        or (
                            isinstance(transformation["series_id"], list)
                            and all(
                                isinstance(x, int) and x >= 0
                                for x in transformation["series_id"]
                            )
                        ),
                        f"`window_transformation` at index {idx} must contain a positive integer >= 0 for the "
                        f"'series_id', or a non-empty list containing positive integers >= 0. ",
                    )
                    if isinstance(transformation["series_id"], int):
                        window_transformations[idx]["series_id"] = [
                            transformation["series_id"]
                        ]  # make list

                if (
                    "components" in transformation
                    and transformation["components"] is None
                ):
                    window_transformations[idx].pop("components")

        self.window_transformations = window_transformations

    def _transform_iterator(
        self, series: Union[TimeSeries, Sequence[TimeSeries]]
    ) -> Iterator[Tuple[TimeSeries, dict]]:

        series = series2seq(series)

        # run through the transformations
        series_subset = []
        for idx, transformation in enumerate(self.window_transformations):

            if "series_id" not in transformation and "components" not in transformation:
                # apply the transformation to all series and all components
                series_subset += [(s, transformation) for s in series]

            elif "series_id" in transformation and "components" not in transformation:
                # apply the transformation to a specific series and all its components
                raise_if_not(
                    len(series) - 1 >= max(transformation["series_id"]),
                    f"`window_transformation` at index {idx} has a 'series_id' that is greater than "
                    f"the number of series in the provided sequence. ",
                )
                series_subset += [
                    (series[s_idx], transformation)
                    for s_idx in transformation["series_id"]
                ]

            elif "series_id" not in transformation and "components" in transformation:
                # apply the transformation to all series on specific components only in each series
                # test that component exists in each relevant series is implemented in TimeSeries.window_transform()
                # This scenario does not make sense unless series in the sequence have the same components names !
                series_subset += [(s, transformation) for s in series]

            else:
                # apply the transformation to a specific component in a specific series
                # if a different component is provided for each selected series
                if all(
                    isinstance(x, list) for x in transformation["components"]
                ) and len(transformation["components"]) == len(
                    transformation["series_id"]
                ):
                    # testing that components exist in the corresponding series is implemented
                    # in TimeSeries.window_transform()
                    components_list = transformation["components"]
                    for s_idx, c_idvec in zip(
                        transformation["series_id"], components_list
                    ):
                        # pair each series with its components; a new dictionary is built for each pair so that
                        # the stored transformation is not mutated and repeated calls see the same configuration
                        series_subset += [
                            (series[s_idx], {**transformation, "components": c_idvec})
                        ]
                else:
                    # if the same components are provided for all the selected series (series would need to have same
                    # components names)!
                    # testing that the components exist in each series is implemented in TimeSeries.window_transform()
                    series_subset += [
                        (series[s_idx], transformation)
                        for s_idx in transformation["series_id"]
                    ]

        return iter(series_subset)  # the iterator object for ts_transform function

    @staticmethod
    def ts_transform(series: TimeSeries, transformation, **kwargs) -> TimeSeries:
        return series.window_transform(transformation, **kwargs)
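
The pairing rules applied by `_transform_iterator` can be illustrated without Darts. The following is only a sketch under stated assumptions: `pair_series_with_transformations` is a hypothetical helper that is not part of this PR, series are represented by plain strings, `series_id` is assumed to be already normalized to a list (as `__init__` does), and the bounds check on `series_id` is left out.

```python
from typing import Dict, List, Sequence, Tuple


def pair_series_with_transformations(
    series: Sequence[str], transformations: List[Dict]
) -> List[Tuple[str, Dict]]:
    """Hypothetical stand-in mirroring the 'series_id' / 'components' cases."""
    pairs = []
    for transformation in transformations:
        ids = transformation.get("series_id")
        comps = transformation.get("components")
        if ids is None:
            # no 'series_id': the transformation applies to every series
            pairs += [(s, transformation) for s in series]
        elif (
            comps is not None
            and all(isinstance(c, list) for c in comps)
            and len(comps) == len(ids)
        ):
            # one component list per selected series: each pair gets its own dict
            pairs += [
                (series[i], {**transformation, "components": c})
                for i, c in zip(ids, comps)
            ]
        else:
            # same components (or all components) for every selected series
            pairs += [(series[i], transformation) for i in ids]
    return pairs


series = ["series_0", "series_1"]
transformations = [
    {"function": "mean", "window": 3},
    {
        "function": "sum",
        "window": 2,
        "series_id": [0, 1],
        "components": [["a"], ["b", "c"]],
    },
]
pairs = pair_series_with_transformations(series, transformations)
for s, t in pairs:
    print(s, t["function"], t.get("components"))
```

Each resulting pair corresponds to one call of `ts_transform`, so the first dictionary yields one pair per series while the second yields one pair per entry of `series_id`, each with its own component list.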
16 changes: 16 additions & 0 deletions darts/logging.py
@@ -222,3 +222,19 @@ def suppress_lightning_warnings(suppress_all: bool = False):
"ignore",
".*Trying to infer the `batch_size` from an ambiguous collection.*",
)


def raise_user_warning(condition, message, logger):
    """
    Checks the provided boolean condition and logs a warning if it evaluates to True.
    Useful to notify the user of a potential problem without stopping the execution.

    condition:
        The boolean condition to be checked.
    message:
        The message of the warning.
    logger:
        The logger instance to log the warning message if 'condition' is True.
    """
    if condition:
        logger.warning("UserWarning:" + message)
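
A short usage sketch for the helper above, using only the standard library. The function body is copied from the diff; `ListHandler`, the logger name and the example messages are assumptions made for illustration and do not appear in the PR.

```python
import logging


def raise_user_warning(condition, message, logger):
    # same behaviour as the helper in the diff: logs a warning, never raises
    if condition:
        logger.warning("UserWarning:" + message)


class ListHandler(logging.Handler):
    """Hypothetical handler that collects log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("window_transformer_demo")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

window = 0
raise_user_warning(window < 1, " 'window' should be a positive integer", logger)
raise_user_warning(False, " this message is never logged", logger)
print(handler.messages)
```

Only the first call produces a log record, and execution continues after it, which is the difference from `raise_if_not` and `raise_log` used elsewhere in the PR.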