
Make trend transforms work with NaNs #456

Merged
merged 10 commits into from
Jan 20, 2022


alex-hse-repository
Collaborator

@alex-hse-repository alex-hse-repository commented Jan 17, 2022

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use the NumPy docstring format for all methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

Related Issue

Closing issues

closes #417

@alex-hse-repository alex-hse-repository added the enhancement New feature or request label Jan 17, 2022
@alex-hse-repository alex-hse-repository self-assigned this Jan 17, 2022
@alex-hse-repository alex-hse-repository marked this pull request as draft January 17, 2022 14:04
@codecov-commenter

codecov-commenter commented Jan 18, 2022

Codecov Report

Merging #456 (09d1bec) into master (8b93c44) will decrease coverage by 0.00%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #456      +/-   ##
==========================================
- Coverage   87.98%   87.97%   -0.01%     
==========================================
  Files         115      115              
  Lines        5435     5440       +5     
==========================================
+ Hits         4782     4786       +4     
- Misses        653      654       +1     
Impacted Files Coverage Δ
...na/transforms/decomposition/change_points_trend.py 99.06% <100.00%> (+0.01%) ⬆️
etna/transforms/decomposition/detrend.py 98.33% <100.00%> (ø)
etna/transforms/decomposition/stl.py 94.28% <100.00%> (+0.25%) ⬆️
etna/transforms/decomposition/trend.py 100.00% <100.00%> (ø)
etna/datasets/tsdataset.py 89.52% <0.00%> (-0.34%) ⬇️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b93c44...09d1bec.
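A quick sanity check of the Codecov figures: with no partial lines reported, coverage is hits divided by tracked lines. The short Python sketch recomputes both percentages from the diff. It assumes Codecov truncates (rather than rounds) to two decimals, which is what the reported 87.98% and 87.97% are consistent with; the exact change is only about -0.007 percentage points, so the "-0.01%" in the table and the "0.00%" in the summary line describe the same tiny drop (4 of the 5 new lines are hit).

```python
# Figures copied from the Codecov diff for this PR (master vs. #456).
REPORTS = {
    "master": {"lines": 5435, "hits": 4782, "misses": 653},
    "#456": {"lines": 5440, "hits": 4786, "misses": 654},
}


def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed display rule)."""
    return hits * 10_000 // lines


def fmt(hundredths: int) -> str:
    """Format hundredths of a percent as e.g. '87.97%'."""
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


for name, r in REPORTS.items():
    # Every tracked line is either a hit or a miss.
    assert r["hits"] + r["misses"] == r["lines"]
    exact = 100 * r["hits"] / r["lines"]
    shown = fmt(coverage_hundredths(r["hits"], r["lines"]))
    print(f"{name}: exact {exact:.5f}%, truncated {shown}")

base, head = REPORTS["master"], REPORTS["#456"]
delta_exact = 100 * (head["hits"] / head["lines"] - base["hits"] / base["lines"])
# About -0.0073 percentage points; truncated to two decimals this reads as 0.00.
print(f"exact delta: {delta_exact:.4f} points")
```

Rounding to the nearest hundredth instead would show master at 87.99%, which does not match the report, hence the truncation assumption.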

@alex-hse-repository alex-hse-repository marked this pull request as ready for review January 18, 2022 05:39
@martins0n martins0n self-requested a review January 18, 2022 07:40
Contributor

@martins0n martins0n left a comment


It seems OK.

But something like the following still doesn't work, and we should think about it:

import numpy as np
import pandas as pd

from etna.datasets import TSDataset
from etna.models import NaiveModel
from etna.pipeline import Pipeline
from etna.transforms import LinearTrendTransform, TimeSeriesImputerTransform


def make_example_df() -> pd.DataFrame:
    # Two hourly segments: a linear ramp and a square-root curve with a cosine term.
    df1 = pd.DataFrame()
    df1["timestamp"] = pd.date_range(start="2020-01-01", end="2020-02-01", freq="H")
    df1["segment"] = "segment_1"
    df1["target"] = np.arange(len(df1), dtype=float)  # optionally add noise: + 2 * np.random.normal(size=len(df1))

    df2 = pd.DataFrame()
    df2["timestamp"] = pd.date_range(start="2020-01-01", end="2020-02-01", freq="H")
    df2["segment"] = "segment_2"
    df2["target"] = np.sqrt(np.arange(len(df2)) + 2 * np.cos(np.arange(len(df2))))

    return pd.concat([df1, df2], ignore_index=True)


def make_df_with_nans_in_tails(example_df: pd.DataFrame) -> pd.DataFrame:
    # Convert to the wide format and blank out the first and last rows of segment_1.
    df = TSDataset.to_dataset(example_df)
    df.loc[df.index[:5], pd.IndexSlice["segment_1", "target"]] = np.nan
    df.loc[df.index[-3:], pd.IndexSlice["segment_1", "target"]] = np.nan
    return df


df_with_nans_in_tails = make_df_with_nans_in_tails(make_example_df())

pipeline = Pipeline(
    model=NaiveModel(),
    transforms=[LinearTrendTransform(in_column="target"), TimeSeriesImputerTransform()],
    horizon=5,
)
pipeline.fit(TSDataset(df_with_nans_in_tails, freq="1H"))
print(pipeline.forecast())

series = df.loc[df[self.in_column].first_valid_index() :, self.in_column]
series = df.loc[df[self.in_column].first_valid_index() : df[self.in_column].last_valid_index(), self.in_column]
if series.isnull().values.any():
    raise ValueError("The input column contains NaNs in the middle of the series! Try to use the imputer.")
Contributor


Before this PR, did it work with nulls in the middle of a TSDataset?
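The NaN handling discussed in this thread can be illustrated with a small pandas/NumPy sketch: trim leading and trailing NaNs with `first_valid_index()`/`last_valid_index()`, refuse NaNs that remain in the middle, and fit the trend only on the remaining span. The helper names `fit_linear_trend_on_valid_span` and `detrend` are hypothetical, and regressing on integer positions with `np.polyfit` is an assumption made for illustration; this is not etna's `LinearTrendTransform` implementation.

```python
import numpy as np
import pandas as pd


def fit_linear_trend_on_valid_span(series: pd.Series) -> pd.Series:
    """Hypothetical helper: fit a linear trend only between the first and last valid values."""
    first, last = series.first_valid_index(), series.last_valid_index()
    if first is None:
        raise ValueError("The input column contains only NaNs!")
    # Drop leading and trailing NaNs by slicing between the first and last valid labels.
    span = series.loc[first:last]
    # Anything still missing is a gap in the middle, which this sketch refuses to handle.
    if span.isnull().values.any():
        raise ValueError("The input column contains NaNs in the middle of the series! Try to use the imputer.")
    # Assumption: regress on integer positions 0..n-1 rather than on timestamps.
    t = np.arange(len(span))
    slope, intercept = np.polyfit(t, span.to_numpy(dtype=float), deg=1)
    trend = pd.Series(np.nan, index=series.index)
    trend.loc[first:last] = slope * t + intercept
    return trend


def detrend(series: pd.Series) -> pd.Series:
    # The tails stay NaN because the fitted trend is NaN there too.
    return series - fit_linear_trend_on_valid_span(series)


idx = pd.date_range("2020-01-01", periods=10, freq="D")
s = pd.Series(2.0 * np.arange(10) + 1.0, index=idx)
s.iloc[:2] = np.nan   # leading NaNs
s.iloc[-1:] = np.nan  # trailing NaN
print(detrend(s))
```

In this sketch the NaN tails pass through unchanged, so filling them (if needed) would have to happen in a separate step; a gap inside the valid span raises the same error message as the snippet in the diff.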

@martins0n martins0n merged commit 75fd188 into master Jan 20, 2022
@martins0n martins0n deleted the issue-417 branch January 20, 2022 07:38
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Make trend transforms work with NaNs
3 participants