Merge branch 'issue-515' of https://github.com/tinkoff-ai/etna into issue-515
tinkoff-ai · Mar 9, 2022 · commit 66bd488
2 parents 3375261 + 37d50e0
Showing 61 changed files with 1,947 additions and 1,322 deletions.
diff --git a/.github/workflows/docker-stable.yml b/.github/workflows/docker-stable.yml
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -84,3 +84,37 @@ jobs:
         env:
           NETLIFY_AUTH_TOKEN: ${{ secrets.NETLIFY_AUTH_TOKEN }}
           NETLIFY_SITE_ID: ${{ secrets.NETLIFY_SITE_ID }}
+
+  docker-build-and-push:
+    needs: publish
+    runs-on: ubuntu-latest
+
+    strategy:
+      matrix:
+        dockerfile:
+          - {"name": etna-cpu, "path": docker/etna-cpu/Dockerfile}
+          - {"name": etna-cuda-10.2, "path": docker/etna-cuda-10.2/Dockerfile}
+          - {"name": etna-cuda-11.1, "path": docker/etna-cuda-11.1/Dockerfile}
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Build image
+        run: |
+          cd $( dirname ${{ matrix.dockerfile.path }})
+          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
+          sed -i "s#etna\[all\]#etna\[all\]==$VERSION#g" requirements.txt
+          cat requirements.txt
+          docker build . --tag image
+
+      - name: Log into registry
+        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
+
+      - name: Push image
+        run: |
+          IMAGE_ID=ghcr.io/${{ github.repository }}/${{ matrix.dockerfile.name }}
+          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
+          echo IMAGE_ID=$IMAGE_ID
+          echo VERSION=$VERSION
+          docker tag image $IMAGE_ID:$VERSION
+          docker push $IMAGE_ID:$VERSION
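
Note on the `docker-build-and-push` job above: both shell steps derive `VERSION` by stripping everything up to the last `/` from `github.ref`, the build step pins `etna[all]` to that version in `requirements.txt`, and the push step tags the image as `ghcr.io/<repository>/<name>:<version>`. A minimal Python sketch of the same string handling follows. The helper names and the sample ref `refs/tags/1.7.0` are hypothetical and used only for illustration; they are not part of the workflow.

```python
import re


def version_from_ref(ref: str) -> str:
    # Same idea as the sed expression in the workflow:
    # drop everything up to and including the last "/".
    # A ref without any "/" is returned unchanged, as sed would do.
    return re.sub(r".*/(.*)", r"\1", ref)


def pin_requirement(requirements: str, version: str) -> str:
    # Same idea as the in-place sed on requirements.txt:
    # every "etna[all]" becomes "etna[all]==<version>".
    return requirements.replace("etna[all]", f"etna[all]=={version}")


def image_reference(repository: str, name: str, version: str) -> str:
    # Mirrors IMAGE_ID:VERSION as assembled in the "Push image" step.
    return f"ghcr.io/{repository}/{name}:{version}"


# Hypothetical inputs, for illustration only.
version = version_from_ref("refs/tags/1.7.0")
pinned = pin_requirement("etna[all]\n", version)
image = image_reference("tinkoff-ai/etna", "etna-cpu", version)
print(version, pinned.strip(), image)
```

Because the version is taken from the last path component of the ref, the job presumably only produces a meaningful tag when it runs on a tag or release ref; on a branch ref the branch name would be used instead.
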
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -16,10 +16,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add plot_time_series_with_change_points function ([#534](https://github.com/tinkoff-ai/etna/pull/534))
 - Add plot_trend ([#565](https://github.com/tinkoff-ai/etna/pull/565))
 - Add find_change_points function ([#521](https://github.com/tinkoff-ai/etna/pull/521))
-- 
+- Add option `day_number_in_year` to DateFlagsTransform ([#552](https://github.com/tinkoff-ai/etna/pull/552))
 - Add plot_residuals ([#539](https://github.com/tinkoff-ai/etna/pull/539))
 - 
 - Create `PerSegmentBaseModel`, `PerSegmentPredictionIntervalModel` ([#537](https://github.com/tinkoff-ai/etna/pull/537))
+- Create `MultiSegmentModel` ([#551](https://github.com/tinkoff-ai/etna/pull/551))
+- Create `EnsembleMixin` ([#574](https://github.com/tinkoff-ai/etna/pull/574))
+- 
+- Add option `season_number` to DateFlagsTransform ([#567](https://github.com/tinkoff-ai/etna/pull/567))
+- 
+- Add stl_plot ([#575](https://github.com/tinkoff-ai/etna/pull/575))
+- Add community section to README.md ([#580](https://github.com/tinkoff-ai/etna/pull/580))
+- Create `AbstractPipeline` ([#573](https://github.com/tinkoff-ai/etna/pull/573))
 - 
 ### Changed
 - Change the way `ProphetModel` works with regressors ([#383](https://github.com/tinkoff-ai/etna/pull/383))
@@ -34,16 +42,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Update CONTRIBUTING.md ([#536](https://github.com/tinkoff-ai/etna/pull/536))
 - 
 - Rename `_CatBoostModel`, `_HoltWintersModel`, `_SklearnModel` ([#543](https://github.com/tinkoff-ai/etna/pull/543))
-- 
+- Add logging to TSDataset.make_future, log repr of transform instead of class name ([#555](https://github.com/tinkoff-ai/etna/pull/555))
 - Rename `_SARIMAXModel` and `_ProphetModel`, make `SARIMAXModel` and `ProphetModel` inherit from `PerSegmentPredictionIntervalModel` ([#549](https://github.com/tinkoff-ai/etna/pull/549))
 - 
+- Update get_started section in README ([#569](https://github.com/tinkoff-ai/etna/pull/569))
+- Make detrending polynomial ([#566](https://github.com/tinkoff-ai/etna/pull/566))
+- Update documentation about transforms that generate regressors, update examples with them ([#572](https://github.com/tinkoff-ai/etna/pull/572))
+- 
+- Make `LabelEncoderTransform` and `OneHotEncoderTransform` multi-segment ([#554](https://github.com/tinkoff-ai/etna/pull/554))
 ### Fixed
 - Fix `TSDataset._update_regressors` logic removing the regressors ([#489](https://github.com/tinkoff-ai/etna/pull/489)) 
 - Fix `TSDataset.info`, `TSDataset.describe` methods ([#519](https://github.com/tinkoff-ai/etna/pull/519))
 - Fix regressors handling for `OneHotEncoderTransform` and `HolidayTransform` ([#518](https://github.com/tinkoff-ai/etna/pull/518))
+- Fix wandb summary issue with custom plots ([#535](https://github.com/tinkoff-ai/etna/pull/535))
 - 
 - 
-- 
+- Fix import Literal in plotters ([#558](https://github.com/tinkoff-ai/etna/pull/558))
 - 
 - 
 - 

diff --git a/README.md b/README.md
@@ -42,7 +42,87 @@ The library started as an internal product in our company -
 we use it in 10+ projects now, so we often release updates.
 Contributions are welcome - check our [Contribution Guide](https://github.com/tinkoff-ai/etna/blob/master/CONTRIBUTING.md).
 
+## Get started
 
+Let's load and prepare the data.
+```python
+import pandas as pd
+from etna.datasets import TSDataset
+
+# Read the data
+df = pd.read_csv("examples/data/example_dataset.csv")
+
+# Create a TSDataset
+df = TSDataset.to_dataset(df)
+ts = TSDataset(df, freq="D")
+
+# Choose a horizon
+HORIZON = 14
+
+# Make train/test split
+train_ts, test_ts = ts.train_test_split(test_size=HORIZON)
+```
+
+Define transformations and model:
+```python
+from etna.models import CatBoostModelMultiSegment
+from etna.transforms import DateFlagsTransform
+from etna.transforms import DensityOutliersTransform
+from etna.transforms import FourierTransform
+from etna.transforms import LagTransform
+from etna.transforms import LinearTrendTransform
+from etna.transforms import MeanTransform
+from etna.transforms import SegmentEncoderTransform
+from etna.transforms import TimeSeriesImputerTransform
+from etna.transforms import TrendTransform
+
+# Prepare transforms
+transforms = [
+    DensityOutliersTransform(in_column="target", distance_coef=3.0),
+    TimeSeriesImputerTransform(in_column="target", strategy="forward_fill"),
+    LinearTrendTransform(in_column="target"),
+    TrendTransform(in_column="target", out_column="trend"),
+    LagTransform(in_column="target", lags=list(range(HORIZON, 122)), out_column="target_lag"),
+    DateFlagsTransform(week_number_in_month=True, out_column="date_flag"),
+    FourierTransform(period=360.25, order=6, out_column="fourier"),
+    SegmentEncoderTransform(),
+    MeanTransform(in_column=f"target_lag_{HORIZON}", window=12, seasonality=7),
+    MeanTransform(in_column=f"target_lag_{HORIZON}", window=7),
+]
+
+# Prepare model
+model = CatBoostModelMultiSegment()
+```
+
+Fit `Pipeline` and make a prediction:
+```python
+from etna.pipeline import Pipeline
+
+# Create and fit the pipeline
+pipeline = Pipeline(model=model, transforms=transforms, horizon=HORIZON)
+pipeline.fit(train_ts)
+
+# Make a forecast
+forecast_ts = pipeline.forecast()
+```
+
+Let's plot the results:
+```python
+from etna.analysis import plot_forecast
+
+plot_forecast(forecast_ts=forecast_ts, test_ts=test_ts, train_ts=train_ts, n_train_samples=50)
+```
+
+![](examples/assets/readme/get_started.png)
+
+Print the metric value across the segments:
+```python
+from etna.metrics import SMAPE
+
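+# "per-segment" mode returns one value per segment; "macro" would average them into a single number
+metric = SMAPE(mode="per-segment")
+metric_value = metric(y_true=test_ts, y_pred=forecast_ts)
+print(metric_value)
+# {'segment_b': 3.3017151519000967, 'segment_c': 5.270557433427279, 'segment_a': 5.272811627335398, 'segment_d': 4.689085450895735}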
+```
 
 ## Installation 
 
@@ -79,35 +159,10 @@ For example, `etna.models.ProphetModel` needs `prophet` extension and can't be u
 
 ETNA supports configuration files. It means that library will check that all the specified packages are installed prior to script start and NOT during runtime. 
 
-To set up a configuration for your project you should create a `.etna` file at the project's root. To see the available options look at [`Settings`](https://github.com/tinkoff-ai/etna/blob/master/etna/settings.py#L68). There is an [example](https://github.com/tinkoff-ai/etna/tree/master/examples/configs/.etna) of configuration file. 
-
-## Get started 
-Here's some example code for a quick start.
-```python
-import pandas as pd
-from etna.datasets.tsdataset import TSDataset
-from etna.models import ProphetModel
-from etna.pipeline import Pipeline
-
-# Read the data
-df = pd.read_csv("examples/data/example_dataset.csv")
-
-# Create a TSDataset
-df = TSDataset.to_dataset(df)
-ts = TSDataset(df, freq="D")
-
-# Choose a horizon
-HORIZON = 8
-
-# Fit the pipeline
-pipeline = Pipeline(model=ProphetModel(), horizon=HORIZON)
-pipeline.fit(ts)
-
-# Make the forecast
-forecast_ts = pipeline.forecast()
-```
+To set up a configuration for your project you should create a `.etna` file at the project's root. To see the available options look at [`Settings`](https://github.com/tinkoff-ai/etna/blob/master/etna/settings.py#L68). There is an [example](https://github.com/tinkoff-ai/etna/tree/master/examples/configs/.etna) of configuration file.
 
 ## Tutorials
+
 We have also prepared a set of tutorials for an easy introduction:
 
 | Notebook     | Interactive launch  |
@@ -121,8 +176,14 @@ We have also prepared a set of tutorials for an easy introduction:
 | [Ensembles](https://github.com/tinkoff-ai/etna/tree/master/examples/ensembles.ipynb) | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/tinkoff-ai/etna/master?filepath=examples/ensembles.ipynb) |
 
 ## Documentation
+
 ETNA documentation is available [here](https://etna-docs.netlify.app/).
 
+## Community
+
+To ask questions or discuss the library, join our [Telegram chat](https://t.me/etna_support).
+The [Discussions section](https://github.com/tinkoff-ai/etna/discussions) on GitHub is also open for this purpose.
+
 ## Resources
 
 - [Forecasting with ETNA: Fast and Furious](https://medium.com/its-tinkoff/forecasting-with-etna-fast-and-furious-1b58e1453809) on Medium
@@ -134,6 +195,7 @@ ETNA documentation is available [here](https://etna-docs.netlify.app/).
 ## Acknowledgments
 
 ### ETNA.Team
+
 [Andrey Alekseev](https://github.com/iKintosh),
 [Nikita Barinov](https://github.com/diadorer),
 [Dmitriy Bunin](https://github.com/Mr-Geekman),
@@ -148,6 +210,7 @@ ETNA documentation is available [here](https://etna-docs.netlify.app/).
 [Julia Shenshina](https://github.com/julia-shenshina)
 
 ### ETNA.Contributors
+
 [Artem Levashov](https://github.com/soft1q),
 [Aleksey Podkidyshev](https://github.com/alekseyen),
 [Carlosbogo](https://github.com/Carlosbogo)
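
Note on the README's "Get started" section above: it reports SMAPE as one value per segment. As a rough sketch of what such a number means, the block below computes SMAPE in percent with the common definition `200 / n * sum(|y - y_hat| / (|y| + |y_hat|))` and contrasts per-segment values with their macro average. The formula is an assumption about the metric's definition, not a copy of the library's implementation, and the helper names and sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def smape(y_true: List[float], y_pred: List[float]) -> float:
    # Symmetric MAPE in percent (assumed definition).
    # Note: a point where both values are zero would divide by zero here.
    return 200.0 * mean(abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred))


def per_segment_smape(data: Dict[str, Tuple[List[float], List[float]]]) -> Dict[str, float]:
    # One value per segment, analogous to a per-segment aggregation mode.
    return {segment: smape(y_true, y_pred) for segment, (y_true, y_pred) in data.items()}


def macro_smape(data: Dict[str, Tuple[List[float], List[float]]]) -> float:
    # A single number: the mean of the per-segment values.
    return mean(per_segment_smape(data).values())


# Hypothetical data: (actual values, forecast values) for each segment.
data = {
    "segment_a": ([100.0, 100.0], [110.0, 90.0]),
    "segment_b": ([50.0], [50.0]),
}
per_segment = per_segment_smape(data)
macro = macro_smape(data)
print(per_segment, macro)
```

The distinction matters when reading the README output: a dictionary keyed by segment corresponds to per-segment aggregation, while macro aggregation collapses it to a single float.
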

diff --git a/etna/analysis/__init__.py b/etna/analysis/__init__.py
@@ -3,6 +3,7 @@
 from etna.analysis.eda_utils import distribution_plot
 from etna.analysis.eda_utils import sample_acf_plot
 from etna.analysis.eda_utils import sample_pacf_plot
+from etna.analysis.eda_utils import stl_plot
 from etna.analysis.feature_relevance.relevance import ModelRelevanceTable
 from etna.analysis.feature_relevance.relevance import RelevanceTable
 from etna.analysis.feature_relevance.relevance import StatisticsRelevanceTable

diff --git a/etna/analysis/eda_utils.py b/etna/analysis/eda_utils.py
@@ -2,16 +2,21 @@
 import warnings
 from itertools import combinations
 from typing import TYPE_CHECKING
+from typing import Any
+from typing import Dict
+from typing import List
 from typing import Optional
 from typing import Sequence
 from typing import Tuple
 
 import matplotlib.pyplot as plt
 import numpy as np
+import pandas as pd
 import seaborn as sns
 import statsmodels.api as sm
 from matplotlib.ticker import MaxNLocator
 from statsmodels.graphics import utils
+from statsmodels.tsa.seasonal import STL
 
 if TYPE_CHECKING:
     from etna.datasets import TSDataset
@@ -221,3 +226,73 @@ def distribution_plot(
         sns.boxplot(data=df_slice.sort_values(by="segment"), y="z", x="segment", ax=ax[i], fliersize=False)
         ax[i].set_title(f"{period}")
         i += 1
+
+
+def stl_plot(
+    ts: "TSDataset",
+    in_column: str = "target",
+    period: Optional[int] = None,
+    segments: Optional[List[str]] = None,
+    columns_num: int = 2,
+    figsize: Tuple[int, int] = (10, 10),
+    plot_kwargs: Optional[Dict[str, Any]] = None,
+    stl_kwargs: Optional[Dict[str, Any]] = None,
+):
+    """Plot STL decomposition for segments.
+
+    Parameters
+    ----------
+    ts:
+        dataset with timeseries data
+    in_column:
+        name of the column to decompose
+    period:
+        length of seasonality
+    segments:
+        segments to plot
+    columns_num:
+        number of columns in subplots
+    figsize:
+        size of the figure per subplot with one segment in inches
+    plot_kwargs:
+        dictionary with parameters for plotting, `matplotlib.axes.Axes.plot` is used
+    stl_kwargs:
+        dictionary with parameters for STL decomposition, `statsmodels.tsa.seasonal.STL` is used
+    """
+    if plot_kwargs is None:
+        plot_kwargs = {}
+    if stl_kwargs is None:
+        stl_kwargs = {}
+    if not segments:
+        segments = sorted(ts.segments)
+
+    segments_number = len(segments)
+    columns_num = min(columns_num, len(segments))
+    rows_num = math.ceil(segments_number / columns_num)
+
+    figsize = (figsize[0] * columns_num, figsize[1] * rows_num)
+    fig = plt.figure(figsize=figsize, constrained_layout=True)
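+    # squeeze=False keeps the result a 2D array, so `.flat` also works for a single segment
+    subfigs = fig.subfigures(rows_num, columns_num, squeeze=False)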
+
+    df = ts.to_pandas()
+    for i, segment in enumerate(segments):
+        segment_df = df.loc[:, pd.IndexSlice[segment, :]][segment]
+        segment_df = segment_df[segment_df.first_valid_index() : segment_df.last_valid_index()]
+        decompose_result = STL(endog=segment_df[in_column], period=period, **stl_kwargs).fit()
+
+        # start plotting
+        subfigs.flat[i].suptitle(segment)
+        axs = subfigs.flat[i].subplots(4, 1, sharex=True)
+
+        # plot observed
+        axs.flat[0].plot(segment_df.index, decompose_result.observed, **plot_kwargs)
+        axs.flat[0].set_ylabel("Observed")
+
+        # plot trend
+        axs.flat[1].plot(segment_df.index, decompose_result.trend, **plot_kwargs)
+        axs.flat[1].set_ylabel("Trend")
+
+        # plot seasonal
+        axs.flat[2].plot(segment_df.index, decompose_result.seasonal, **plot_kwargs)
+        axs.flat[2].set_ylabel("Seasonal")
+
+        # plot residuals
+        axs.flat[3].plot(segment_df.index, decompose_result.resid, **plot_kwargs)
+        axs.flat[3].set_ylabel("Residual")
+        axs.flat[3].tick_params("x", rotation=45)
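
Note on the layout logic in `stl_plot` above: the number of columns is capped by the number of segments, the number of rows is the ceiling of segments over columns, and `figsize` is interpreted per segment and scaled by the grid shape. A small standalone sketch of that arithmetic follows; the helper name `stl_grid` is hypothetical and does not exist in the library.

```python
import math
from typing import Tuple


def stl_grid(
    segments_number: int,
    columns_num: int = 2,
    figsize: Tuple[int, int] = (10, 10),
) -> Tuple[int, int, Tuple[int, int]]:
    # Never use more columns than there are segments.
    columns = min(columns_num, segments_number)
    # Enough rows to place every segment in the grid.
    rows = math.ceil(segments_number / columns)
    # figsize is given per segment, so the whole figure grows with the grid.
    total_figsize = (figsize[0] * columns, figsize[1] * rows)
    return rows, columns, total_figsize


print(stl_grid(4))                 # four segments, default two columns
print(stl_grid(1))                 # a single segment collapses to a 1x1 grid
print(stl_grid(5, columns_num=3))  # the last row is only partly filled
```

With the defaults, each segment gets a 10 by 10 inch subfigure holding four stacked axes (observed, trend, seasonal, residual), so figures for many segments become large quickly; passing a smaller `figsize` or an explicit `segments` list keeps the output manageable.
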