
Add plot_periodogram #606

Merged
merged 6 commits into from
Mar 21, 2022


Mr-Geekman (Contributor) commented Mar 16, 2022

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use the NumPy docstring format for all methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Proposed Changes

See #593.

Related Issue

#593.

Closing issues

Closes #593.

Mr-Geekman added the enhancement (New feature or request) label Mar 16, 2022
Mr-Geekman self-assigned this Mar 16, 2022
Mr-Geekman (Contributor, Author):

Demo script:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from etna.analysis import plot_periodogram
from etna.datasets import TSDataset


def main():
    df = pd.read_csv("examples/data/example_dataset.csv", parse_dates=["timestamp"])
    df_wide = TSDataset.to_dataset(df)
    # Put NaNs at the start of the first column to check that NaNs at the edges are trimmed.
    df_wide.iloc[:3, 0] = np.nan
    ts = TSDataset(df=df_wide, freq="D")

    # One periodogram per segment; period=365.25 targets yearly seasonality in daily data.
    plot_periodogram(ts=ts, period=365.25, amplitude_aggregation_mode="per-segment")
    plt.savefig("periodogram_per_segment")

    # A single periodogram with amplitudes averaged over segments; extra arguments go through periodogram_params.
    plot_periodogram(
        ts=ts, period=365.25, amplitude_aggregation_mode="mean", periodogram_params=dict(scaling="spectrum")
    )
    plt.savefig("periodogram_mean")


if __name__ == "__main__":
    main()
```

periodogram_per_segment: [image: periodogram_per_segment]

periodogram_mean: [image: periodogram_mean]

codecov-commenter commented Mar 16, 2022

Codecov Report

Merging #606 (f95302d) into master (7dd9448) will decrease coverage by 32.19%.
The diff coverage is 6.25%.

@@             Coverage Diff             @@
##           master     #606       +/-   ##
===========================================
- Coverage   85.14%   52.95%   -32.20%     
===========================================
  Files         118      118               
  Lines        5884     5932       +48     
===========================================
- Hits         5010     3141     -1869     
- Misses        874     2791     +1917     
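The percentages above follow from hits / (hits + misses). A quick stdlib check (the helper names are illustrative) reproduces them; the two-decimal figures are consistent with truncation in the headline ("32.19%") and rounding in the table ("-32.20%"), which is an observation about these numbers only.

```python
import math


def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent: hits / (hits + misses) * 100."""
    return 100 * hits / (hits + misses)


def truncate2(x: float) -> float:
    """Truncate (not round) a value to two decimals."""
    return math.floor(x * 100) / 100


before = coverage_pct(5010, 874)  # master: 5884 lines
after = coverage_pct(3141, 2791)  # #606: 5932 lines
print(truncate2(before), truncate2(after))  # 85.14 52.95
print(truncate2(before - after), round(before - after, 2))  # 32.19 32.2
```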
| Impacted Files | Coverage Δ |
|---|---|
| etna/analysis/plotters.py | 10.95% <4.25%> (-6.59%) ⬇️ |
| etna/analysis/__init__.py | 100.00% <100.00%> (ø) |
| etna/commands/__init__.py | 0.00% <0.00%> (-100.00%) ⬇️ |
| etna/commands/backtest_command.py | 0.00% <0.00%> (-96.43%) ⬇️ |
| etna/commands/forecast_command.py | 0.00% <0.00%> (-92.00%) ⬇️ |
| etna/commands/__main__.py | 0.00% <0.00%> (-87.50%) ⬇️ |
| etna/commands/resolvers.py | 0.00% <0.00%> (-80.00%) ⬇️ |
| etna/analysis/outliers/density_outliers.py | 22.44% <0.00%> (-75.52%) ⬇️ |
| etna/datasets/datasets_generation.py | 26.47% <0.00%> (-73.53%) ⬇️ |
| etna/transforms/timestamp/time_flags.py | 27.02% <0.00%> (-72.98%) ⬇️ |
| ... and 68 more | |


martins0n requested a review from Ama16 on March 17, 2022 07:38
```python
    columns_num: int = 2,
    figsize: Tuple[int, int] = (10, 5),
):
    """Plot the periodogram to determine the optimal order parameter for `etna.transforms.FourierTransform`.
```
Contributor:

Maybe add a reference to scipy.signal.periodogram?

Contributor Author:

OK, I'll add it, but I'll keep the mention of FourierTransform. The issue author said that this plot is useful specifically for determining the order for FourierTransform.
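The link between the periodogram and the FourierTransform order can be sketched with plain NumPy. This is a minimal sketch, not the code from this PR: `harmonic_periodogram` is a hypothetical helper, and it assumes the sampling frequency is set to `period`, so that frequency k on the x-axis is the k-th harmonic of the seasonality (Fourier order k). It is meant to follow the defaults of `scipy.signal.periodogram` (boxcar window, constant detrend, one-sided, density scaling); with `fs=period`, scipy produces the same frequency grid.

```python
import numpy as np


def harmonic_periodogram(x, period: float):
    """Hypothetical helper: one-sided periodogram with frequencies in cycles per `period` steps."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    fs = period  # `period` samples per unit, so frequency k corresponds to the k-th harmonic
    centered = x - x.mean()  # constant detrend
    spectrum = np.abs(np.fft.rfft(centered)) ** 2 / (fs * n)  # density scaling, boxcar window
    # Fold negative frequencies in: double every bin except DC and, for even n, the Nyquist bin.
    if n % 2 == 0:
        spectrum[1:-1] *= 2
    else:
        spectrum[1:] *= 2
    frequencies = np.fft.rfftfreq(n, d=1.0 / fs)
    return frequencies, spectrum


# Two years of daily data containing a pure 3rd harmonic of a 365-day seasonality.
period = 365
t = np.arange(2 * period)
x = np.sin(2 * np.pi * 3 * t / period)
frequencies, spectrum = harmonic_periodogram(x, period=period)
peak = frequencies[np.argmax(spectrum)]
print(f"peak at harmonic {peak:.2f}")  # peak at harmonic 3.00
```

One way to read such a plot is to look for peaks near integer frequencies; the highest harmonic with a noticeable peak is a candidate for the order.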

```
    ts:
        TSDataset with timeseries data
    period:
        the period of the seasonality to capture in frequency units of time series, it should be >= 2
```
Contributor:

Isn't it too complicated?

Contributor Author:

What is the alternative? We defined this parameter the same way as in FourierTransform.
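Why `period` has to be at least 2 follows from the Nyquist limit: with one observation per time step, the k-th harmonic of a seasonality with period `period` has frequency `k / period` cycles per step, which must not exceed 0.5. This is also why a periodogram computed with `fs=period` ends at `period / 2`. A minimal sketch with hypothetical helpers (`max_fourier_order`, `fourier_features`); it is not etna's FourierTransform code, and etna may state its bound on `order` differently.

```python
import math

import numpy as np


def max_fourier_order(period: float) -> int:
    """Largest order k with k / period <= 0.5, i.e. k <= period / 2 (hypothetical helper)."""
    if period < 2:
        # Even the first harmonic (k = 1) would lie above the Nyquist frequency.
        raise ValueError("period should be >= 2")
    return math.floor(period / 2)


def fourier_features(n_steps: int, period: float, order: int) -> np.ndarray:
    """sin/cos pairs for harmonics 1..order, one column per term (hypothetical helper)."""
    if not 1 <= order <= max_fourier_order(period):
        raise ValueError(f"order should be in [1, {max_fourier_order(period)}]")
    t = np.arange(n_steps)
    columns = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t / period
        columns.extend([np.sin(angle), np.cos(angle)])
    return np.column_stack(columns)


print(max_fourier_order(365.25))  # 182
print(fourier_features(n_steps=10, period=7, order=3).shape)  # (10, 6)
```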

```python
segment_df = df.loc[:, pd.IndexSlice[segment, "target"]]
segment_df = segment_df[segment_df.first_valid_index() : segment_df.last_valid_index()]
if segment_df.isna().any():
    raise ValueError(f"Periodogram can't be calculated on segment with NaNs inside: {segment}")
```
Contributor:

If NaNs exist but we are going to cut all of them anyway, is it right to raise an error?

Contributor Author:

I don't really understand the problem. If there are NaNs at the edges, we cut them off. NaNs in the middle lead to an all-NaN result. Cutting them out before computing the periodogram isn't reasonable, because that breaks the frequencies and seasonalities.
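The behaviour described above (trim NaNs at the edges, reject NaNs inside) can be sketched for a single series with pandas. `trim_edge_nans` is a hypothetical helper that mirrors the snippet under review; the last lines illustrate why a NaN inside the series is fatal: it propagates through the FFT, so the power spectrum is NaN in every frequency bin.

```python
import numpy as np
import pandas as pd


def trim_edge_nans(series: pd.Series) -> pd.Series:
    """Drop leading/trailing NaNs and raise if NaNs remain inside (hypothetical helper)."""
    start, end = series.first_valid_index(), series.last_valid_index()
    if start is None:
        raise ValueError("Series contains only NaNs")
    trimmed = series.loc[start:end]  # label-based slice, both ends inclusive
    if trimmed.isna().any():
        raise ValueError("Periodogram can't be calculated on a series with NaNs inside")
    return trimmed


edges = pd.Series([np.nan, np.nan, 1.0, 2.0, 3.0, np.nan])
print(trim_edge_nans(edges).tolist())  # [1.0, 2.0, 3.0]

inside = pd.Series([1.0, np.nan, 3.0, 4.0])
# A single NaN reaches every FFT coefficient, so every bin of the power spectrum is NaN.
power = np.abs(np.fft.rfft(inside.to_numpy())) ** 2
print(np.isnan(power).all())  # True
```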

```python
    frequencies_segments.append(frequencies)
    spectrums_segments.append(spectrum)

frequencies = frequencies_segments[0]
```
Contributor:

Why do we create the frequencies_segments list if we only need one value from it?

Contributor Author:

I think this implementation is simple. The other implementations that came to my mind require either special-casing one of the iterations with an if, or overwriting the same value on every iteration, which could be mistaken for a bug. The current implementation looks very simple.
Do you know of simpler alternatives?
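One possible alternative is to compute the frequency grid once, since it depends only on the series length and the sampling frequency. This is a minimal sketch under the assumption that all segments have the same length after trimming; `mean_power_spectrum` is a hypothetical helper, and its spectrum is unscaled (plain squared FFT magnitudes), unlike scipy's default density scaling.

```python
import numpy as np


def mean_power_spectrum(segments, fs: float = 1.0):
    """Average the unscaled power spectra of equal-length segments (hypothetical helper)."""
    lengths = {len(x) for x in segments}
    if len(lengths) != 1:
        # Different lengths give different frequency grids, so spectra can't be averaged bin by bin.
        raise ValueError(f"segments have different lengths: {sorted(lengths)}")
    (n,) = lengths
    frequencies = np.fft.rfftfreq(n, d=1.0 / fs)  # computed once for all segments
    spectra = [np.abs(np.fft.rfft(np.asarray(x, dtype=float) - np.mean(x))) ** 2 for x in segments]
    return frequencies, np.mean(spectra, axis=0)


rng = np.random.default_rng(0)
segments = [rng.normal(size=64), rng.normal(size=64)]
frequencies, mean_spectrum = mean_power_spectrum(segments, fs=7.0)
print(frequencies.shape, mean_spectrum.shape)  # (33,) (33,)
```

The explicit length check also surfaces the case where trimming edge NaNs leaves segments of different lengths, which taking the first element of a list would silently ignore.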

```python
_, ax = plt.subplots(figsize=figsize, constrained_layout=True)
ax.step(frequencies, spectrum)  # type: ignore
ax.set_xscale("log")  # type: ignore
ax.set_title("Periodogram")  # type: ignore
```
Contributor:

Maybe it is worth labeling the x-axis in this case?

Contributor Author:

Reasonable, I'll add it.
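Labeling the axes could look like the following sketch. The label texts and the random input are placeholders, not the PR's final wording. It also skips the zero-frequency bin, because a log-scaled x-axis cannot show frequency 0; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

period = 7.0
n = 140
frequencies = np.fft.rfftfreq(n, d=1.0 / period)  # frequency in cycles per `period` steps
spectrum = np.abs(np.fft.rfft(np.random.default_rng(0).normal(size=n))) ** 2  # unscaled power

fig, ax = plt.subplots(figsize=(10, 5), constrained_layout=True)
ax.step(frequencies[1:], spectrum[1:])  # skip the DC bin: a log scale can't show 0
ax.set_xscale("log")
ax.set_title("Periodogram")
ax.set_xlabel("Frequency")
ax.set_ylabel("Power")
```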

Mr-Geekman (Contributor, Author):

The script is the same.

periodogram_per_segment: [image: periodogram_per_segment]

periodogram_mean: [image: periodogram_mean]

Mr-Geekman merged commit 3097a83 into master on Mar 21, 2022
Mr-Geekman deleted the issue-593 branch on March 21, 2022 10:31
Labels: enhancement (New feature or request)
Development: successfully merging this pull request may close these issues: Create Periodogram
3 participants