Improve sample_acf and sample_pacf plots #1004
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
🚀 Deployed on https://deploy-preview-1004--etna-docs.netlify.app |
Codecov Report
@@ Coverage Diff @@
## master #1004 +/- ##
==========================================
+ Coverage 85.74% 86.08% +0.33%
==========================================
Files 162 162
Lines 8616 8608 -8
==========================================
+ Hits 7388 7410 +22
+ Misses 1228 1198 -30
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
etna/analysis/eda_utils.py
Outdated
segments = sorted(ts.segments) | ||
|
||
plot = plot_pacf if partial else plot_acf | ||
title = "Partial Autocorrelation" if partial else "Autocorrelation" | ||
|
||
k = min(n_segments, len(segments)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it is not really correct. As I understand, we should ignore n_segments
parameter if segments
is set. Look at distribution_plot
.
etna/analysis/eda_utils.py
Outdated
plot_acf(x=df_slice["target"].values, ax=ax[i], lags=lags) | ||
|
||
if df_slice["target"].isna().any(): | ||
print("yes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like debug string, remove it.
etna/analysis/eda_utils.py
Outdated
""" | ||
acf_plot(ts=ts, n_segments=n_segments, lags=lags, segments=segments, figsize=figsize, partial=False) | ||
warnings.warn( | ||
"DeprecationWarning: This function is deprecated and will be removed in etna=1.14.0; Please use acf_plot instead", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I amn't really sure about this version of deprecation.
etna/analysis/eda_utils.py
Outdated
if df_slice["target"].isna().any(): | ||
print("yes") | ||
df_slice["target"].dropna(inplace=True) | ||
warnings.warn("Values with NaN dropped!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should raise this warning. Let's clarify this moment in docstring of the function. Smth like:
This function removes any NaNs before plotting.
I don't see any images about the results of new functions. You should add script for plotting + result images. You should check both |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at comments above.
from etna.analysis import acf_plot
import pandas as pd
import matplotlib.pyplot as plt
from etna.datasets import TSDataset
import warnings
warnings.filterwarnings("ignore")
data = pd.read_csv("examples/data/nordic_merch_sales.csv")
df = TSDataset.to_dataset(data)
ts = TSDataset(df, freq="D")
acf_plot(ts, n_segments=9, columns_num=3, partial=False)
plt.savefig("Autocorrelation")
acf_plot(ts, n_segments=9, columns_num=3, partial=True)
plt.savefig("Partial Autocorrelation") |
etna/analysis/eda_utils.py
Outdated
if segments is None: | ||
segments = sorted(ts.segments) | ||
exist_segments = df_pd.segment.unique() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we write this if like:
if segments is None:
segments = sorted(ts.segments)
selected_segments = np.random.choice(...)
segments = list(selected_segments)
In this case we don't need df_pd
variable.
etna/analysis/eda_utils.py
Outdated
""" | ||
acf_plot(ts=ts, n_segments=n_segments, lags=lags, segments=segments, figsize=figsize, partial=False) | ||
warnings.warn( | ||
"DeprecationWarning: This function is deprecated and will be removed soon; Please use acf_plot instead", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negotiating about version to be removed.
Example with NaN: from etna.analysis import acf_plot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from etna.datasets import TSDataset, generate_const_df, generate_ar_df
import warnings
df_1 = generate_const_df(periods=100, start_time="2020-01-01", scale=1)
df_1["target"] = [np.nan]*len(df_1["target"])
df_2 = generate_ar_df(periods=100, start_time="2020-04-10")
df = pd.concat([df_1, df_2])
df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")
acf_plot(ts)
plt.savefig("Autocorrelation with NaN.png")
acf_plot(ts, partial=True)
plt.savefig("Partial Autocorrelation with NaN.png") |
CHANGELOG.md
Outdated
@@ -10,7 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 | |||
### Added | |||
- | |||
- | |||
- | |||
- Improve `sample_acf_plot` and `sample_pacf_plot` ([#1004](https://github.com/tinkoff-ai/etna/pull/1004)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it should be moved to Changed block and you should write smth like: "Add acf_plot
, deprecated sample_acf_plot
, sample_pacf_plot
".
etna/analysis/eda_utils.py
Outdated
df_slice = ts[:, name, :][name] | ||
plot_acf(x=df_slice["target"].values, ax=ax[i], lags=lags) | ||
|
||
if df_slice["target"].isna().any(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this if
?
etna/analysis/eda_utils.py
Outdated
df_slice = ts.to_pandas()[name]["target"] | ||
if partial: | ||
# for partial autocorrelation remove NaN from the beginning and end of the series | ||
indices_nan = np.argwhere(np.isnan(df_slice.values)).squeeze(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use first_valid_index
etna/analysis/eda_utils.py
Outdated
|
||
indices2remove_end = [] | ||
|
||
current_end = len(df_slice.values) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to use last_valid_index
|
||
candidates2delete = indices2remove_begin + indices2remove_end | ||
if set(indices_nan) != set(candidates2delete): | ||
raise ValueError("There is a NaN in the middle of the time series!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errors should be moved to Raises
block in a docstring. Look at docstring of cross_corr_plot
for example.
etna/analysis/eda_utils.py
Outdated
fig.suptitle(title, fontsize=16) | ||
|
||
for i, name in enumerate(segments): | ||
df_slice = ts.to_pandas()[name].reset_index()["target"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this to_pandas()
higher to make it only once.
etna/analysis/eda_utils.py
Outdated
|
||
Notes | ||
----- | ||
`Definition of autocorrelation <https://en.wikipedia.org/wiki/Autocorrelation>`_. | ||
|
||
`Definition of partial autocorrelation <https://en.wikipedia.org/wiki/Partial_autocorrelation_function>`_. | ||
|
||
This function removes any NaNs from the beginning and end of the series if partial=True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write:
* If ``partial=False`` function works with NaNs at any place of the time-series.
* if ``partial=True`` function works only with NaNs at the edges of the time-series and fails if there are NaNs inside of it.
Before submitting (must do checklist)
Proposed Changes
Closing issues
closes #682