-
Notifications
You must be signed in to change notification settings - Fork 80
Conversation
Script with demo: import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from etna.analysis import plot_periodogram
from etna.datasets import TSDataset
def main():
df = pd.read_csv("examples/data/example_dataset.csv", parse_dates=["timestamp"])
df_wide = TSDataset.to_dataset(df)
df_wide.iloc[:3, 0] = np.NaN
ts = TSDataset(df=df_wide, freq="D")
plot_periodogram(ts=ts, period=365.25, amplitude_aggregation_mode="per-segment")
plt.savefig("periodogram_per_segment")
plot_periodogram(
ts=ts, period=365.25, amplitude_aggregation_mode="mean", periodogram_params=dict(scaling="spectrum")
)
plt.savefig("periodogram_mean")
if __name__ == "__main__":
main() |
# Conflicts: # CHANGELOG.md
Codecov Report
@@ Coverage Diff @@
## master #606 +/- ##
===========================================
- Coverage 85.14% 52.95% -32.20%
===========================================
Files 118 118
Lines 5884 5932 +48
===========================================
- Hits 5010 3141 -1869
- Misses 874 2791 +1917
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
etna/analysis/plotters.py
Outdated
columns_num: int = 2, | ||
figsize: Tuple[int, int] = (10, 5), | ||
): | ||
"""Plot the periodogram to determine the optimal order parameter for `etna.transforms.FourierTransform`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reference on scipy.signal.periodogram?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll add it, but remain mention of FourierTransform
. Task creator told that this plot is useful exactly for determining the order for FourierTransform
.
etna/analysis/plotters.py
Outdated
ts: | ||
TSDataset with timeseries data | ||
period: | ||
the period of the seasonality to capture in frequency units of time series, it should be >= 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isnt it too difficult?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the alternative? We have done this parameter like in FourierTransform.
segment_df = df.loc[:, pd.IndexSlice[segment, "target"]] | ||
segment_df = segment_df[segment_df.first_valid_index() : segment_df.last_valid_index()] | ||
if segment_df.isna().any(): | ||
raise ValueError(f"Periodogram can't be calculated on segment with NaNs inside: {segment}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 'NaNs' exists, but we will cut all of them in future, is it right to raise error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand the problem. If we have NaNs at the edges, we are cutting them. NaNs in the middle leads to answer in all NaNs. Cut them out before applying periodogram isn't reasonable, because we are breaking frequencies and seasonalities.
frequencies_segments.append(frequencies) | ||
spectrums_segments.append(spectrum) | ||
|
||
frequencies = frequencies_segments[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we create frequencies_segments array if we need only one value from it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, that this implementation is easy. Other implementations that came to my mind require writing if cases for some iteration, or rewriting the same value inside the array, that can be considered as a bug. Current implementation looks very simple.
Do you know simple alternatives?
_, ax = plt.subplots(figsize=figsize, constrained_layout=True) | ||
ax.step(frequencies, spectrum) # type: ignore | ||
ax.set_xscale("log") # type: ignore | ||
ax.set_title("Periodogram") # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in this case it is worth naming x-axis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable, I'll add it.
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Before submitting (must do checklist)
Type of Change
Proposed Changes
Look #593.
Related Issue
#593.
Closing issues
Closes #593.