Fix/fix various issues #1106

Merged
merged 4 commits into from Aug 5, 2022
Changes from 3 commits
48 changes: 33 additions & 15 deletions darts/dataprocessing/dtw/dtw.py
@@ -2,9 +2,12 @@
 from typing import Callable, Union

 import numpy as np
+import pandas as pd
+import xarray as xr

 from darts import TimeSeries
 from darts.logging import get_logger, raise_if, raise_if_not
+from darts.timeseries import DIMS, HIERARCHY_TAG, STATIC_COV_TAG

 from .cost_matrix import CostMatrix
 from .window import CRWindow, NoWindow, Window
@@ -203,25 +206,40 @@ def warped(self) -> (TimeSeries, TimeSeries):
             Two new TimeSeries instances of the same length, indexed by pd.RangeIndex.
         """

-        series1 = self.series1
-        series2 = self.series2
-
-        xa1 = series1.data_array(copy=False)
-        xa2 = series2.data_array(copy=False)
-
+        xa1 = self.series1.data_array(copy=False)
+        xa2 = self.series2.data_array(copy=False)
         path = self.path()

-        warped_series1 = xa1[path[:, 0]]
-        warped_series2 = xa2[path[:, 1]]
-
-        time_dim1 = series1._time_dim
-        time_dim2 = series2._time_dim
+        values1, values2 = xa1.values[path[:, 0]], xa2.values[path[:, 1]]
+
+        # We set a RangeIndex for both series:
+        warped_series1 = xr.DataArray(
+            data=values1,
+            dims=xa1.dims,
+            coords={
+                self.series1._time_dim: pd.RangeIndex(values1.shape[0]),
+                DIMS[1]: xa1.coords[DIMS[1]],
+            },
+            attrs={
+                STATIC_COV_TAG: xa1.attrs[STATIC_COV_TAG],
+                HIERARCHY_TAG: xa1.attrs[HIERARCHY_TAG],
+            },
+        )

-        range_index = True
+        warped_series2 = xr.DataArray(
+            data=values2,
+            dims=xa2.dims,
+            coords={
+                self.series2._time_dim: pd.RangeIndex(values2.shape[0]),
+                DIMS[1]: xa2.coords[DIMS[1]],
+            },
+            attrs={
+                STATIC_COV_TAG: xa2.attrs[STATIC_COV_TAG],
+                HIERARCHY_TAG: xa2.attrs[HIERARCHY_TAG],
+            },
+        )

-        if range_index:
-            warped_series1 = warped_series1.reset_index(dims_or_levels=time_dim1)
-            warped_series2 = warped_series2.reset_index(dims_or_levels=time_dim2)
+        time_dim1, time_dim2 = self.series1._time_dim, self.series2._time_dim

         # todo: prevent time information being lost after warping
         # Applying time index from series1 to series2 (take_dates = True) is disabled for consistency reasons
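
For readers following the `warped()` change above, the sketch below illustrates the difference between indexing the `DataArray` along the warping path and rebuilding a new `DataArray` from the raw values with a fresh `pd.RangeIndex`. It is not the Darts implementation: the toy data, the `path` array, and the literal dimension names `time`, `component` and `sample` are assumptions made for illustration (the real code uses `self.series1._time_dim` and `DIMS[1]`), and the static-covariate and hierarchy attrs are left out.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed toy series: 4 time steps, 1 component, 1 sample, laid out as
# (time, component, sample) with a DatetimeIndex on the time dimension.
times = pd.date_range("2022-01-01", periods=4, freq="D")
xa = xr.DataArray(
    data=np.arange(4, dtype=float).reshape(4, 1, 1),
    dims=("time", "component", "sample"),
    coords={"time": times, "component": ["a"]},
)

# Assumed warping path: column 0 indexes this series, and indices may repeat.
path = np.array([[0, 0], [1, 1], [1, 2], [2, 3], [3, 3]])

# Positional indexing on the DataArray carries the original time labels along,
# so a repeated index produces a repeated timestamp.
indexed = xa[path[:, 0]]

# Rebuilding from the raw NumPy values attaches a fresh RangeIndex instead.
values = xa.values[path[:, 0]]
warped = xr.DataArray(
    data=values,
    dims=xa.dims,
    coords={
        "time": pd.RangeIndex(values.shape[0]),
        "component": xa.coords["component"],
    },
)

print(indexed.get_index("time").is_unique)
print(list(warped.get_index("time")))
```

With this path, index 1 is visited twice, so `indexed` ends up with a duplicated timestamp, while `warped` has one integer label per step of the path.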
8 changes: 4 additions & 4 deletions docs/userguide/timeseries.md
@@ -33,7 +33,7 @@ In addition, some models can work on *multiple time series*, meaning that they c

 * **Example of a multivariate series:** The blood pressure and heart rate of a single patient over time (one multivariate series with 2 components).

-* **Example of multiple series:** The blood pressure and heart rate of multiple patients; potentially measured at different times for different patients (one univariate series per patient).
+* **Example of multiple series:** The blood pressure and heart rate of multiple patients; potentially measured at different times for different patients (one multivariate series with 2 components per patient).


 ### Should I use a multivariate series or multiple series for my problem?
@@ -50,9 +50,9 @@ In Darts, probabilistic forecasts are represented by drawing Monte Carlo samples
 ## Creating `TimeSeries`
 `TimeSeries` objects can be created using factory methods, for example:

-* [TimeSeries.from_dataframe()](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_dataframe) can create `TimeSeries` from a Pandas Dataframe having one or several columns representing values (several columns would correspond to a multivariate series).
+* [TimeSeries.from_dataframe()](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_dataframe) can create `TimeSeries` from a Pandas Dataframe having one or several columns representing values (columns correspond to components, and several columns would correspond to a multivariate series).

-* [TimeSeries.from_values()](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_values) can create `TimeSeries` from a 2-D or 3-D NumPy array. It will generate an integer-based time index (of type `pandas.RangeIndex`). 2-D corresponds to deterministic (potentially multivariate) series, and 3-D to stochastic series.
+* [TimeSeries.from_values()](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_values) can create `TimeSeries` from a 1-D, 2-D or 3-D NumPy array. It will generate an integer-based time index (of type `pandas.RangeIndex`). 1-D corresponds to univariate deterministic series, 2-D to multivariate deterministic series, and 3-D to multivariate stochastic series.

 * [TimeSeries.from_times_and_values()](https://unit8co.github.io/darts/generated_api/darts.timeseries.html#darts.timeseries.TimeSeries.from_times_and_values) is similar to `TimeSeries.from_values()` but also accepts a time index.

@@ -67,7 +67,7 @@ my_multivariate_series = concatenate([series1, series2, ...], axis=1)
 produces a multivariate series from some series that share the same time axis.

 ## Implementation
-Behind the scenes, `TimeSeries` is wrapping around a 3-dimensional `xarray.DataArray` object. The dimensions are *(time, component, sample)*, where the size of the *component* dimension is larger than 1 for multivariate series and the size of the *sample* dimension is larger than 1 for stochastic series. The `DataArray` is itself backed by a a 3-dimensional NumPy array, and it has a time index (either `pandas.DatetimeIndex` or `pandas.RangeIndex`) on the *time* dimension and another `pandas.Index` on the *component* (or "columns") dimension. `TimeSeries` is intended to be immutable.
+Behind the scenes, `TimeSeries` is wrapping around a 3-dimensional `xarray.DataArray` object. The dimensions are *(time, component, sample)*, where the size of the *component* dimension is larger than 1 for multivariate series and the size of the *sample* dimension is larger than 1 for stochastic series. The `DataArray` is itself backed by a 3-dimensional NumPy array, and it has a time index (either `pandas.DatetimeIndex` or `pandas.RangeIndex`) on the *time* dimension and another `pandas.Index` on the *component* (or "columns") dimension. `TimeSeries` is intended to be immutable and most operations return new `TimeSeries` objects.

 ## Exporting data from a `TimeSeries`
 `TimeSeries` objects offer a few ways to export the data, for example:
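
The *(time, component, sample)* layout described in the guide text above can be illustrated with plain NumPy. This is only a sketch of the shape convention: `to_time_component_sample` is a hypothetical helper, not a Darts function, and it makes no claim about how `TimeSeries.from_values()` is implemented internally.

```python
import numpy as np


def to_time_component_sample(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: view a 1-D, 2-D or 3-D array as (time, component, sample)."""
    if values.ndim not in (1, 2, 3):
        raise ValueError(f"expected a 1-D, 2-D or 3-D array, got {values.ndim}-D")
    # Append trailing axes of size 1 until the array is 3-D.
    while values.ndim < 3:
        values = values[..., np.newaxis]
    return values


# 1-D: univariate deterministic series (1 component, 1 sample).
print(to_time_component_sample(np.zeros(10)).shape)
# 2-D: multivariate deterministic series (2 components, 1 sample).
print(to_time_component_sample(np.zeros((10, 2))).shape)
# 3-D: multivariate stochastic series (2 components, 50 samples).
print(to_time_component_sample(np.zeros((10, 2, 50))).shape)
```

The three calls correspond to the 1-D, 2-D and 3-D cases listed in the updated `from_values()` bullet.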
2 changes: 1 addition & 1 deletion requirements/core.txt
@@ -12,7 +12,7 @@ prophet>=1.1
 requests>=2.22.0
 scikit-learn>=1.0.1
 scipy>=1.3.2
-statsforecast>=0.5.2
+statsforecast==0.6.0
 statsmodels>=0.13.0
 tbats>=1.1.0
 tqdm>=4.60.0