Feature/diff #1380

Merged: 14 commits into unit8co:master from feature/diff, Nov 26, 2022

mabilton
Contributor

#641 - Implements Diff data transformer which can difference and 'undifference' time series.

Summary

Diff sequentially applies a series of $m$-lagged differencing operations (i.e. $y^\prime_t = y_t - y_{t-m}$) to a time series (see here). The $m$ value to use for each differencing operation is specified by the lags parameter; for example, setting lags = [1, 12] first applies 1-lagged differencing to the time series, and then applies 12-lagged differencing to the resulting 1-lagged differenced series. This interface is essentially identical to the one found in sktime.

A simple example:

from darts.datasets import AirPassengersDataset
from darts.dataprocessing.transformers import Diff

series = AirPassengersDataset().load()

# 1-lagged differencing; the leading NaN is dropped (dropna=True)
first_order_diff = Diff(lags=1, dropna=True).fit_transform(series)
print(first_order_diff.head())

# 1-lagged, then 2-lagged differencing; the leading NaNs are kept (dropna=False)
second_order_diff = Diff(lags=[1, 2], dropna=False).fit_transform(series)
print(second_order_diff.head())

which produces:

<TimeSeries (DataArray) (Month: 5, component: 1, sample: 1)>
array([[[ 6.]],
       [[14.]],
       [[-3.]],
       [[-8.]],
       [[14.]]])
Coordinates:
* Month      (Month) datetime64[ns] 1949-02-01 1949-03-01 ... 1949-06-01
* component  (component) object '#Passengers'

<TimeSeries (DataArray) (Month: 5, component: 1, sample: 1)>
array([[[ nan]],
       [[ nan]],
       [[ nan]],
       [[ -9.]],
       [[-22.]]])
Coordinates:
* Month      (Month) datetime64[ns] 1949-01-01 1949-02-01 ... 1949-05-01
* component  (component) object '#Passengers'
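
The numbers above can be checked by hand with a small pure-Python sketch of sequential lagged differencing. This is only an illustration of the arithmetic, not the code in this PR: `lagged_diff` and `sequential_diff` are hypothetical helper names, the input is assumed to be the first six AirPassengers values, and undefined leading entries are simply dropped (as with dropna=True).

```python
from typing import List, Sequence


def lagged_diff(values: Sequence[float], lag: int) -> List[float]:
    """Return y'_t = y_t - y_{t-lag}, dropping the first `lag` undefined entries."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def sequential_diff(values: Sequence[float], lags: Sequence[int]) -> List[float]:
    """Apply one lagged differencing per entry of `lags`, in order."""
    out = list(values)
    for lag in lags:
        out = lagged_diff(out, lag)
    return out


# Assumed input: the first six monthly values of the AirPassengers series.
passengers = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]

first = sequential_diff(passengers, [1])
both = sequential_diff(passengers, [1, 2])

print(first)  # [6.0, 14.0, -3.0, -8.0, 14.0]
print(both)   # [-9.0, -22.0, 17.0]
```

Each differencing step with lag $m$ shortens the series by $m$ entries, which is why the dropna=False output starts with 1 + 2 = 3 NaN values before -9 and -22.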

Diff can also be used to 'undifference' data that extends beyond dates it was trained on; such use cases arise when one wants to convert a forecast of a differenced time series back into an undifferenced one. Here's a simple example to illustrate this:

import numpy as np
from darts.datasets import AirPassengersDataset
from darts.dataprocessing.transformers import Diff

series = AirPassengersDataset().load()
# Fit on a truncated copy of the series only
train_series = series.drop_after(10)
diff = Diff(lags=[1, 2, 3])
diff.fit(train_series)
# Difference, then undifference, the full series (which extends past the training data)
diffed_series = diff.transform(series)
undiffed_series = diff.inverse_transform(diffed_series)
# Check that the round trip recovers the original values
print(np.allclose(undiffed_series.all_values(), series.all_values()))
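
To show why undifferencing needs information saved at fit time, here is a rough pure-Python sketch of the inverse operation. It is an assumption-laden illustration rather than the implementation in this PR: `diff_with_starts` and `undiff` are hypothetical names, and the sketch assumes the series being undifferenced starts at the same point as the series the start values were taken from. Inverting an $m$-lagged difference uses $y_t = y^\prime_t + y_{t-m}$, so the first $m$ values of each intermediate series must be remembered, and the lags must be undone in reverse order.

```python
from typing import List, Sequence, Tuple


def diff_with_starts(
    values: Sequence[float], lags: Sequence[int]
) -> Tuple[List[float], List[List[float]]]:
    """Difference sequentially, remembering the first `lag` values at each stage."""
    out = list(values)
    starts = []
    for lag in lags:
        starts.append(out[:lag])
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out, starts


def undiff(
    diffed: Sequence[float], lags: Sequence[int], starts: Sequence[Sequence[float]]
) -> List[float]:
    """Undo `diff_with_starts` by applying y_t = y'_t + y_{t-lag}, last lag first."""
    out = list(diffed)
    for lag, start in zip(reversed(lags), reversed(starts)):
        restored = list(start)
        for d in out:
            # restored[-lag] is the value `lag` steps before the one being rebuilt
            restored.append(d + restored[-lag])
        out = restored
    return out


values = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
lags = [1, 2, 3]
diffed, starts = diff_with_starts(values, lags)
print(undiff(diffed, lags, starts) == values)  # True
```

Because only the start values are needed, the same stored values can undifference a longer series, such as a forecast appended to the training data, as long as it begins at the same time step.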

Other Information

I also made a few other minor changes while implementing Diff:

  1. Added flatten flag to _reshape_in and _reshape_out methods of BaseDataTransformer, which specifies whether the last two axes of a series should be flattened into one. When flatten=False, these methods essentially mask and unmask the relevant components of series using component_mask.
  2. Added prepend and prepend_values methods (+ tests) to TimeSeries class - these are essentially the 'opposites' to the append and append_values methods (i.e. they add new values to the start of a series rather than to the end).
  3. Added unit test for append_values method of TimeSeries.
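
The relationship between appending and prepending described in item 2 can be sketched with a toy class. `ToySeries` is a hypothetical stand-in with an integer index, not the darts TimeSeries class, and its method bodies are assumptions made purely for illustration: appending leaves the start of the index untouched, while prepending shifts the start back by the number of new values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToySeries:
    """Hypothetical regularly spaced series with an integer index (not darts)."""

    start: int            # index of the first value
    values: List[float]

    def append_values(self, new: Sequence[float]) -> "ToySeries":
        # New values go at the end; the start index is unchanged.
        return ToySeries(self.start, self.values + list(new))

    def prepend_values(self, new: Sequence[float]) -> "ToySeries":
        # New values go at the beginning; the start index moves back.
        new = list(new)
        return ToySeries(self.start - len(new), new + self.values)


s = ToySeries(start=3, values=[1.0, 2.0])
print(s.append_values([5.0]))        # ToySeries(start=3, values=[1.0, 2.0, 5.0])
print(s.prepend_values([8.0, 9.0]))  # ToySeries(start=1, values=[8.0, 9.0, 1.0, 2.0])
```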

As an aside, I'm also planning on implementing an invertible Deseason transformer (#638). Before doing so, however, I'd first like to 'clean up' some of the shortcomings I personally see in the current DataTransformer interface; when I have time, I'll open an issue about this and link it to this PR.

Any feedback on what I've done here would be very welcome.

Cheers,
Matt.

Contributor

@hrzn hrzn left a comment


This looks really good to me, many thanks @mabilton ! I especially like the good tests and docstring :)
There's a small issue with more_itertools (could we do without it?) which prevents the tests from executing. After that, if the tests pass, we can merge it soon IMO.

@eliane-maalouf
Contributor

Hello, from my side I just spotted a few small typos.

mabilton and others added 4 commits November 25, 2022 11:13
…alues()`

Co-authored-by: eliane-maalouf <112691612+eliane-maalouf@users.noreply.github.com>
Co-authored-by: Julien Herzen <j.herzen@gmail.com>
Contributor

@hrzn hrzn left a comment


Many thanks @mabilton ! This will go in the next release :)

@mabilton
Contributor Author

Thanks for the help @hrzn , @eliane-maalouf !

@codecov-commenter

codecov-commenter commented Nov 25, 2022

Codecov Report

Base: 93.94% // Head: 94.01% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (53c321d) compared to base (9b409a8).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1380      +/-   ##
==========================================
+ Coverage   93.94%   94.01%   +0.06%     
==========================================
  Files          80       81       +1     
  Lines        8708     8786      +78     
==========================================
+ Hits         8181     8260      +79     
+ Misses        527      526       -1     
Impacted Files Coverage Δ
...taprocessing/transformers/base_data_transformer.py 96.77% <100.00%> (+0.22%) ⬆️
darts/dataprocessing/transformers/diff.py 100.00% <100.00%> (ø)
darts/timeseries.py 92.32% <100.00%> (+0.09%) ⬆️
...arts/models/forecasting/torch_forecasting_model.py 87.70% <0.00%> (-0.06%) ⬇️
darts/models/forecasting/block_rnn_model.py 98.24% <0.00%> (-0.04%) ⬇️
darts/models/forecasting/nhits.py 99.27% <0.00%> (-0.01%) ⬇️


@hrzn hrzn merged commit e18d572 into unit8co:master Nov 26, 2022
@mabilton mabilton deleted the feature/diff branch November 26, 2022 08:56