Improve performance of `TSDataset._check_endings` #263

Mr-Geekman · 2021-11-08T13:30:02Z

🚀 Feature Request

Change TSDataset._check_endings to improve performance.

Motivation

The current code for checking segment endings in TSDataset is not optimal. This can be critical for the transform and fit_transform methods on datasets with many segments.

Proposal

Remove the loop over segments in TSDataset._check_endings. For example, use indexing like self.df.loc[max_index, pd.IndexSlice[:, "target"]].
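To illustrate the idea, here is a minimal sketch in plain pandas. It assumes a wide-format frame with timestamps as rows and (segment, feature) MultiIndex columns. The helper names `make_wide_df`, `check_endings_loop` and `check_endings_vectorized`, and the error message, are hypothetical and are not the actual ETNA code.

```python
import numpy as np
import pandas as pd


def make_wide_df(n_segments: int = 3, periods: int = 5) -> pd.DataFrame:
    """Build a toy wide frame: rows are timestamps, columns are (segment, feature)."""
    index = pd.date_range("2020-01-01", periods=periods, freq="D")
    columns = pd.MultiIndex.from_product(
        [[f"segment_{i}" for i in range(n_segments)], ["target"]],
        names=["segment", "feature"],
    )
    values = np.arange(periods * n_segments, dtype=float).reshape(periods, n_segments)
    return pd.DataFrame(values, index=index, columns=columns)


def check_endings_loop(df: pd.DataFrame) -> None:
    """Loop-based variant (assumed shape of the slow code): one lookup per segment."""
    max_index = df.index.max()
    for segment in df.columns.get_level_values("segment").unique():
        if pd.isna(df.loc[max_index, (segment, "target")]):
            raise ValueError("All segments should end at the same timestamp")


def check_endings_vectorized(df: pd.DataFrame) -> None:
    """Vectorized variant: select the last row of every target column at once."""
    max_index = df.index.max()
    last_targets = df.loc[max_index, pd.IndexSlice[:, "target"]]
    if last_targets.isna().any():
        raise ValueError("All segments should end at the same timestamp")


df = make_wide_df()
check_endings_loop(df)        # passes silently: no segment ends with NaN
check_endings_vectorized(df)  # same result, without iterating over segments
```

Both variants perform the same check; the vectorized one replaces one `.loc` lookup per segment with a single selection.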

You can measure the performance benefit with this code:

from etna.datasets import TSDataset, generate_ar_df

df = generate_ar_df(periods=400, start_time='2020-01-01', n_segments=30000, freq='D')
ts = TSDataset(TSDataset.to_dataset(df), freq='D')
ts._check_endings()

In a notebook, you can use profiling tools to measure the difference.

Test cases

Add tests for the _check_endings method.

Add a test where no segment ends with NaN.
Add a test where exactly one segment ends with NaN.
Add a test where all segments end with NaN.
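The three cases above could look roughly like this. This is only a sketch in plain pandas: `has_nan_ending` is a hypothetical stand-in that returns a boolean instead of raising, and `make_df` is a hypothetical helper that builds toy data. Real tests would construct a TSDataset and call `_check_endings` on it.

```python
import numpy as np
import pandas as pd


def has_nan_ending(df: pd.DataFrame) -> bool:
    """Hypothetical stand-in: True if any target is NaN at the last timestamp."""
    last_targets = df.loc[df.index.max(), pd.IndexSlice[:, "target"]]
    return bool(last_targets.isna().any())


def make_df(nan_tail_segments: int, n_segments: int = 3, periods: int = 5) -> pd.DataFrame:
    """Toy wide frame in which the first `nan_tail_segments` segments end with NaN."""
    index = pd.date_range("2020-01-01", periods=periods, freq="D")
    columns = pd.MultiIndex.from_product(
        [[f"segment_{i}" for i in range(n_segments)], ["target"]],
        names=["segment", "feature"],
    )
    values = np.ones((periods, n_segments))
    values[-1, :nan_tail_segments] = np.nan  # blank out the last value of some segments
    return pd.DataFrame(values, index=index, columns=columns)


def test_no_segment_ends_with_nan():
    assert not has_nan_ending(make_df(nan_tail_segments=0))


def test_one_segment_ends_with_nan():
    assert has_nan_ending(make_df(nan_tail_segments=1))


def test_all_segments_end_with_nan():
    assert has_nan_ending(make_df(nan_tail_segments=3))
```

The functions follow the pytest naming convention, so a test runner can collect them, but they are plain functions and can also be called directly.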

Alternatives

No response

Additional context

No response

Checklist

I discussed this issue with ETNA Team


Mr-Geekman added the enhancement New feature or request label Nov 8, 2021

Mr-Geekman self-assigned this Nov 8, 2021

Mr-Geekman mentioned this issue Nov 8, 2021

Replace cycle over segments with vectorized expression in TSDataset._check_endings #264

Merged

9 tasks

julia-shenshina closed this as completed in #264 Nov 8, 2021
