Skip to content

Speed up columns slices #900

Merged
merged 14 commits into from
Sep 7, 2022
Merged

Speed up columns slices #900

merged 14 commits into from
Sep 7, 2022

Conversation

Mr-Geekman
Copy link
Contributor

@Mr-Geekman Mr-Geekman commented Sep 1, 2022

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Look #775.

Closing issues

Closes #775.

@Mr-Geekman Mr-Geekman self-assigned this Sep 1, 2022
@Mr-Geekman Mr-Geekman marked this pull request as draft September 1, 2022 14:32
@github-actions
Copy link

github-actions bot commented Sep 1, 2022

🚀 Deployed on https://deploy-preview-900--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request September 1, 2022 14:35 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 1, 2022 15:42 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 1, 2022 16:04 Inactive
@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2022

Codecov Report

Merging #900 (1f662fa) into master (18a064a) will increase coverage by 0.03%.
The diff coverage is 96.87%.

@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   84.95%   84.99%   +0.03%     
==========================================
  Files         133      133              
  Lines        7593     7611      +18     
==========================================
+ Hits         6451     6469      +18     
  Misses       1142     1142              
Impacted Files Coverage Δ
etna/transforms/math/apply_lambda.py 92.00% <71.42%> (ø)
etna/datasets/__init__.py 100.00% <100.00%> (ø)
etna/datasets/tsdataset.py 90.74% <100.00%> (-0.03%) ⬇️
etna/datasets/utils.py 95.91% <100.00%> (+1.47%) ⬆️
etna/transforms/feature_selection/filter.py 100.00% <100.00%> (ø)
etna/transforms/math/add_constant.py 97.95% <100.00%> (ø)
etna/transforms/math/lags.py 100.00% <100.00%> (ø)
etna/transforms/math/log.py 96.15% <100.00%> (ø)
etna/transforms/math/sklearn.py 95.49% <100.00%> (+0.16%) ⬆️
etna/transforms/math/statistics.py 99.28% <100.00%> (+<0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Mr-Geekman Mr-Geekman marked this pull request as ready for review September 1, 2022 16:38
@martins0n martins0n self-requested a review September 4, 2022 20:27
Copy link
Contributor

@martins0n martins0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should benchmarks?
Sorts everywhere could lead to bottlenecks.

virtualenvs-create: true
virtualenvs-in-project: true

- name: Load cached venv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use cache there -- it could be corrupted.

path: .venv
key: venv-${{ runner.os }}-3.8-${{ hashFiles('**/poetry.lock') }}

- name: Install dependencies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

As a result of caching we will run the same test case without pandas installing.

- name: Install dependencies
if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
run: |
poetry add "pandas${{ matrix.pandas-version }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess installing via poetry add will lead to long dependency resolving.

We should install it via pip after poetry install.

make_bug.py Outdated
@@ -0,0 +1,57 @@
import numpy as np
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what is the purpose of this script?

@@ -73,7 +73,8 @@ def transform(self, df: pd.DataFrame) -> pd.DataFrame:
history = self.seasonality * self.window if self.window != -1 else len(df)
segments = sorted(df.columns.get_level_values("segment").unique())

x = df.loc[pd.IndexSlice[:], pd.IndexSlice[segments, self.in_column]].values[::-1]
df_slice = df.loc[pd.IndexSlice[:], pd.IndexSlice[:, self.in_column]].sort_index(axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess, we don't need pd.IndexSlice[:]

@github-actions github-actions bot temporarily deployed to pull request September 5, 2022 12:57 Inactive
@Mr-Geekman
Copy link
Contributor Author

I added report with benchmark into etna-experiments, PR: ISSUE - 775.

@github-actions github-actions bot temporarily deployed to pull request September 6, 2022 10:28 Inactive
martins0n
martins0n previously approved these changes Sep 6, 2022
Copy link
Contributor

@martins0n martins0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@martins0n martins0n dismissed their stale review September 6, 2022 18:19

We need to update changelog

@github-actions github-actions bot temporarily deployed to pull request September 7, 2022 07:22 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 7, 2022 07:34 Inactive
@Mr-Geekman Mr-Geekman merged commit 533fce8 into master Sep 7, 2022
@Mr-Geekman Mr-Geekman deleted the issue-775 branch September 7, 2022 08:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up columns slices: etna.datasets.utils.select_columns
3 participants