Improve sample_acf and sample_pacf plots #1004

Merged
merged 14 commits into master from issue-682 on Nov 25, 2022

Conversation

DBcreator
Contributor

@DBcreator DBcreator commented Nov 16, 2022

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Closing issues

closes #682

codecov-commenter commented Nov 16, 2022

Codecov Report

Merging #1004 (b1ad28b) into master (554d4ea) will increase coverage by 0.33%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1004      +/-   ##
==========================================
+ Coverage   85.74%   86.08%   +0.33%     
==========================================
  Files         162      162              
  Lines        8616     8608       -8     
==========================================
+ Hits         7388     7410      +22     
+ Misses       1228     1198      -30     
Impacted Files                Coverage Δ
etna/analysis/__init__.py     100.00% <100.00%> (ø)
etna/analysis/eda_utils.py    61.80% <100.00%> (+8.73%) ⬆️
etna/models/nn/deepar.py      98.93% <0.00%> (-0.02%) ⬇️
etna/models/nn/tft.py         99.10% <0.00%> (-0.01%) ⬇️
etna/models/utils.py          100.00% <0.00%> (ø)
etna/models/nn/mlp.py         100.00% <0.00%> (ø)
etna/models/nn/rnn.py         100.00% <0.00%> (ø)

segments = sorted(ts.segments)

plot = plot_pacf if partial else plot_acf
title = "Partial Autocorrelation" if partial else "Autocorrelation"

k = min(n_segments, len(segments))
Contributor

This doesn't seem quite correct. As I understand it, we should ignore the n_segments parameter if segments is set. Look at distribution_plot for reference.

plot_acf(x=df_slice["target"].values, ax=ax[i], lags=lags)

if df_slice["target"].isna().any():
    print("yes")
Contributor

This looks like a leftover debug statement; please remove it.

"""
acf_plot(ts=ts, n_segments=n_segments, lags=lags, segments=segments, figsize=figsize, partial=False)
warnings.warn(
    "DeprecationWarning: This function is deprecated and will be removed in etna=1.14.0; Please use acf_plot instead",
Contributor

I'm not really sure about the version stated in this deprecation message.
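
Independently of which version ends up in the message, here is a minimal sketch of how such a deprecated wrapper could emit its warning. In the snippet under review, "DeprecationWarning" appears inside the message text; if no category argument follows it (the fragment is cut off), Python would issue the warning with the default UserWarning category. The function bodies below are hypothetical stand-ins, not the code from this PR.

```python
import warnings


def acf_plot(partial=False):
    """Stand-in for the new function; the real one draws the plots."""
    return "pacf" if partial else "acf"


def sample_acf_plot():
    """Deprecated wrapper (hypothetical body) that forwards to ``acf_plot``."""
    warnings.warn(
        "sample_acf_plot is deprecated; please use acf_plot instead.",
        category=DeprecationWarning,  # pass the category explicitly
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return acf_plot(partial=False)
```

Passing the category explicitly lets users filter the warning by type, for example with `warnings.simplefilter("error", DeprecationWarning)` in a test suite.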

if df_slice["target"].isna().any():
    print("yes")
    df_slice["target"].dropna(inplace=True)
    warnings.warn("Values with NaN dropped!")
Contributor

I don't think we should raise this warning. Let's document this behavior in the function's docstring instead. Something like:

This function removes any NaNs before plotting.

@Mr-Geekman
Contributor

I don't see any images showing the results of the new functions. You should add a plotting script along with the resulting images. Check both partial=True and partial=False, and data with and without NaNs. You can use the scripts in #691, #706, and https://github.com/tinkoff-ai/etna/milestone/9?closed=1 as a reference.

@Mr-Geekman Mr-Geekman left a comment

Look at comments above.

@DBcreator
Contributor Author

from etna.analysis import acf_plot
import pandas as pd
import matplotlib.pyplot as plt
from etna.datasets import TSDataset
import warnings 
warnings.filterwarnings("ignore")


data = pd.read_csv("examples/data/nordic_merch_sales.csv")
df = TSDataset.to_dataset(data)
ts = TSDataset(df, freq="D")

acf_plot(ts, n_segments=9, columns_num=3, partial=False)
plt.savefig("Autocorrelation")
acf_plot(ts, n_segments=9, columns_num=3, partial=True)
plt.savefig("Partial Autocorrelation")

Autocorrelation:
[image]

Partial Autocorrelation:
[image]

if segments is None:
segments = sorted(ts.segments)
exist_segments = df_pd.segment.unique()
Contributor

Why can't we write this if block like this:

if segments is None:
    segments = sorted(ts.segments)
    selected_segments = np.random.choice(...)
    segments = list(selected_segments)

In this case we don't need the df_pd variable.
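
A minimal sketch of the selection logic suggested above, combined with the earlier remark that n_segments should be ignored when segments is given. The helper name `select_segments`, its signature, and its default values are assumptions made for illustration; they are not taken from the PR.

```python
import numpy as np


def select_segments(all_segments, segments=None, n_segments=10):
    """Hypothetical helper: return the list of segments to plot."""
    if segments is None:
        candidates = sorted(all_segments)
        k = min(n_segments, len(candidates))
        # Sample k distinct segments; n_segments only matters in this branch.
        segments = list(np.random.choice(candidates, size=k, replace=False))
    # An explicit ``segments`` argument is returned unchanged.
    return list(segments)
```

With this shape, `select_segments(ts.segments, segments=["a", "b"], n_segments=1)` would still return both requested segments, because n_segments is consulted only when no explicit list is passed.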

"""
acf_plot(ts=ts, n_segments=n_segments, lags=lags, segments=segments, figsize=figsize, partial=False)
warnings.warn(
    "DeprecationWarning: This function is deprecated and will be removed soon; Please use acf_plot instead",
Contributor

We are still discussing which version it will be removed in.

@DBcreator
Contributor Author

Example with NaN:

from etna.analysis import acf_plot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from etna.datasets import TSDataset, generate_const_df, generate_ar_df
import warnings 

df_1 = generate_const_df(periods=100, start_time="2020-01-01", scale=1)
df_1["target"] = [np.nan]*len(df_1["target"])
df_2 = generate_ar_df(periods=100, start_time="2020-04-10")
df = pd.concat([df_1, df_2])

df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")

acf_plot(ts)
plt.savefig("Autocorrelation with NaN.png")

acf_plot(ts, partial=True)
plt.savefig("Partial Autocorrelation with NaN.png")

Autocorrelation:
[image]

Partial Autocorrelation:
[image]

CHANGELOG.md Outdated
@@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
-
-
-
- Improve `sample_acf_plot` and `sample_pacf_plot` ([#1004](https://github.com/tinkoff-ai/etna/pull/1004))
Contributor

I think this entry should be moved to the Changed block, and you should write something like: "Add acf_plot, deprecated sample_acf_plot, sample_pacf_plot".

df_slice = ts[:, name, :][name]
plot_acf(x=df_slice["target"].values, ax=ax[i], lags=lags)

if df_slice["target"].isna().any():
Contributor

Do we need this if?

df_slice = ts.to_pandas()[name]["target"]
if partial:
    # for partial autocorrelation remove NaN from the beginning and end of the series
    indices_nan = np.argwhere(np.isnan(df_slice.values)).squeeze(1)
Contributor

Try to use first_valid_index


indices2remove_end = []

current_end = len(df_slice.values) - 1
Contributor

Try to use last_valid_index


candidates2delete = indices2remove_begin + indices2remove_end
if set(indices_nan) != set(candidates2delete):
    raise ValueError("There is a NaN in the middle of the time series!")
Contributor

Errors should be documented in the Raises block of the docstring. See the docstring of cross_corr_plot for an example.
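
The review suggestions above (first_valid_index, last_valid_index, and a Raises block) can be sketched together as follows. The helper name `trim_edge_nans` is hypothetical, the error for an all-NaN series is an added assumption, and the label-based slice assumes a sorted index without duplicates. This is an illustration, not the code that was merged.

```python
import numpy as np
import pandas as pd


def trim_edge_nans(series):
    """Drop leading and trailing NaNs from a series (hypothetical helper).

    Raises
    ------
    ValueError:
        If the series contains only NaNs (an assumption of this sketch)
        or has NaNs between valid values.
    """
    first = series.first_valid_index()
    last = series.last_valid_index()
    if first is None:
        raise ValueError("The series contains only NaNs!")
    # Label-based slicing with .loc includes both endpoints.
    trimmed = series.loc[first:last]
    if trimmed.isna().any():
        raise ValueError("There is a NaN in the middle of the time series!")
    return trimmed
```

Compared with collecting NaN positions by hand, the two index lookups leave a single check to perform: any NaN remaining after trimming must lie inside the series.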

fig.suptitle(title, fontsize=16)

for i, name in enumerate(segments):
    df_slice = ts.to_pandas()[name].reset_index()["target"]
Contributor

Let's move this to_pandas() call above the loop so that it runs only once.


Notes
-----
`Definition of autocorrelation <https://en.wikipedia.org/wiki/Autocorrelation>`_.

`Definition of partial autocorrelation <https://en.wikipedia.org/wiki/Partial_autocorrelation_function>`_.

This function removes any NaNs from the beginning and end of the series if partial=True.
Contributor

Write:

* If ``partial=False``, the function works with NaNs at any place in the time series.

* If ``partial=True``, the function works only with NaNs at the edges of the time series and fails if there are NaNs inside it.

@DBcreator DBcreator merged commit cfbfb01 into master Nov 25, 2022
@Mr-Geekman Mr-Geekman deleted the issue-682 branch December 7, 2022 10:04