Improve performance of TSDataset._check_endings #263

Closed
1 task done
Mr-Geekman opened this issue Nov 8, 2021 · 0 comments · Fixed by #264
Labels: enhancement (New feature or request)

Comments

@Mr-Geekman
Contributor

🚀 Feature Request

Change TSDataset._check_endings to improve performance.

Motivation

The current code for checking endings in TSDataset is inefficient. This can become a bottleneck for the transform and fit_transform methods on datasets with many segments.

Proposal

Remove the loop over segments in TSDataset._check_endings. For example, use indexing like self.df.loc[max_index, pd.IndexSlice[:, "target"]] to select the last target value of every segment at once.
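The proposal above could look roughly like the sketch below. It uses plain pandas on a wide frame with (segment, feature) columns as a stand-in for TSDataset.df. The helper names `make_wide_df` and `check_endings_vectorized` and the error message are hypothetical and chosen for illustration; this is not the actual ETNA implementation.

```python
import numpy as np
import pandas as pd


def make_wide_df(n_segments: int, periods: int) -> pd.DataFrame:
    """Build a wide frame with (segment, feature) columns (illustrative stand-in for TSDataset.df)."""
    index = pd.date_range("2020-01-01", periods=periods, freq="D")
    # Zero-padded names keep the column MultiIndex lexicographically sorted.
    segments = [f"segment_{i:03d}" for i in range(n_segments)]
    columns = pd.MultiIndex.from_product(
        [segments, ["target"]], names=["segment", "feature"]
    )
    values = np.random.default_rng(0).normal(size=(periods, n_segments))
    return pd.DataFrame(values, index=index, columns=columns)


def check_endings_vectorized(df: pd.DataFrame) -> None:
    """Raise ValueError if any segment has NaN in "target" at the last timestamp."""
    max_index = df.index.max()
    # One indexing call selects the last "target" value of every segment.
    last_targets = df.loc[max_index, pd.IndexSlice[:, "target"]]
    if last_targets.isna().any():
        raise ValueError("All segments should end at the same timestamp")
```

With this shape, the cost of the check is a single row selection plus one `isna().any()` reduction, instead of one `.loc` lookup per segment.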

You can measure the performance benefit with the following code:

from etna.datasets import TSDataset, generate_ar_df

# 30000 segments with 400 daily points each
df = generate_ar_df(periods=400, start_time='2020-01-01', n_segments=30000, freq='D')
ts = TSDataset(TSDataset.to_dataset(df), freq='D')
ts._check_endings()

In a notebook, you can profile this call with the built-in tools, for example the %timeit magic.

Test cases

Add tests for the _check_endings method.

  1. Add a test where no segment ends with NaN.
  2. Add a test where exactly one segment ends with NaN.
  3. Add a test where all segments end with NaN.
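The three cases above could be sketched as follows. This is a minimal pandas-only illustration: `check_endings` is a hypothetical stand-in for TSDataset._check_endings, and `make_df` and `raises_value_error` are helper names invented for the example. It also assumes the check raises ValueError when a segment ends with NaN. Real tests would build a TSDataset and call the method on it.

```python
import numpy as np
import pandas as pd


def check_endings(df: pd.DataFrame) -> None:
    """Hypothetical stand-in for TSDataset._check_endings on a wide frame."""
    if df.loc[df.index.max(), pd.IndexSlice[:, "target"]].isna().any():
        raise ValueError("All segments should end at the same timestamp")


def make_df(nan_tail_segments=()) -> pd.DataFrame:
    """Two-segment frame; the listed segments get NaN at the last timestamp."""
    index = pd.date_range("2020-01-01", periods=5, freq="D")
    columns = pd.MultiIndex.from_product(
        [["segment_1", "segment_2"], ["target"]], names=["segment", "feature"]
    )
    df = pd.DataFrame(1.0, index=index, columns=columns)
    for segment in nan_tail_segments:
        df.loc[index[-1], (segment, "target")] = np.nan
    return df


def raises_value_error(df: pd.DataFrame) -> bool:
    """Return True if the check rejects the frame."""
    try:
        check_endings(df)
    except ValueError:
        return True
    return False


def test_no_segment_ends_with_nan():
    assert not raises_value_error(make_df())


def test_one_segment_ends_with_nan():
    assert raises_value_error(make_df(["segment_1"]))


def test_all_segments_end_with_nan():
    assert raises_value_error(make_df(["segment_1", "segment_2"]))
```

The functions follow the `test_*` naming convention, so a runner such as pytest would collect them, but they can also be called directly.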

Alternatives

No response

Additional context

No response

Checklist

  • I discussed this issue with ETNA Team