# sktime Preprocessing Transformers
Time series pipelines often need careful preprocessing before modeling. sktime provides a consistent transformer API
to handle detrending, deseasonalizing, variance stabilization, imputation, and feature generation.

## Core notation
Let a univariate series be $y_t$ for $t=1,\dots,T$. Common transforms include:

- First difference: $\nabla y_t = y_t - y_{t-1}$
- Seasonal difference: $\nabla_s y_t = y_t - y_{t-s}$
- Box-Cox (for $y_t > 0$): $y_t^{(\lambda)} = \frac{y_t^\lambda - 1}{\lambda}$ for $\lambda \neq 0$, and $y_t^{(\lambda)} = \log y_t$ when $\lambda = 0$
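
As a minimal sketch of the three definitions above, the following NumPy helpers implement them directly. The function names and the toy series are illustrative assumptions, not part of sktime.

```python
import numpy as np


def first_difference(y):
    """Return y_t - y_{t-1}; the result is one element shorter than y."""
    y = np.asarray(y, dtype=float)
    return y[1:] - y[:-1]


def seasonal_difference(y, s):
    """Return y_t - y_{t-s}; the result is s elements shorter than y."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]


def box_cox(y, lam):
    """Box-Cox transform; requires strictly positive values."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox is only defined for strictly positive values")
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam


def inv_box_cox(z, lam):
    """Invert box_cox for the same lambda."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


# Toy series (hypothetical): doubles at every step.
y = np.array([2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
d1 = first_difference(y)
d2 = seasonal_difference(y, s=2)
z = box_cox(y, lam=0.5)
y_back = inv_box_cox(z, lam=0.5)
```

Note that both differences shorten the series, which is why differenced plots usually start later on the time axis than the original.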

## Why preprocessing matters
- Improve stationarity so models can focus on short-term dynamics
- Separate trend/seasonal components for cleaner residuals
- Handle missing data and stabilize variance
- Provide robust lagged or window-based features

## Common transformer families in sktime
- **Trend/seasonality**: `Detrender`, `Deseasonalizer`
- **Differencing**: `Differencer`
- **Variance stabilization**: `BoxCoxTransformer`
- **Missing data**: `Imputer`
- **Feature engineering**: `Lag`, `WindowSummarizer`, `FourierFeatures`

## Quick visual: differencing a seasonal series

In [None]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

rng = np.random.default_rng(42)
t = np.arange(0, 120)
trend = 0.05 * t
season = 2.0 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(0, 0.5, size=t.size)
y = trend + season + noise

y_diff = np.diff(y, n=1)

fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)
fig.add_trace(go.Scatter(x=t, y=y, name="original"), row=1, col=1)
fig.add_trace(go.Scatter(x=t[1:], y=y_diff, name="first difference"), row=2, col=1)
fig.update_layout(height=500, title="Original vs First Difference")
fig.show()
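
The first difference removes the linear trend but leaves the 12-step cycle in place. A seasonal difference at lag 12 targets the cycle instead. The sketch below rebuilds the same synthetic series (same seed and parameters as the cell above) with NumPy only and skips plotting; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(0, 120)
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, size=t.size)

season_length = 12

# Seasonal difference: the period-12 sine cancels exactly, and the linear
# trend contributes a constant offset of 12 * 0.05 = 0.6.
y_sdiff = y[season_length:] - y[:-season_length]

# A further first difference removes that constant offset as well.
y_both = np.diff(y_sdiff, n=1)

print("std original:           ", y.std())
print("std seasonal difference:", y_sdiff.std())
print("mean seasonal difference:", y_sdiff.mean())
```

What remains after the seasonal difference is the offset plus differenced noise, so its spread is much smaller than that of the original series.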

## sktime pipeline example
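The `TransformedTargetForecaster` chains a sequence of transformers with a final forecaster. During `fit`, it applies the transformations to the target in order and fits the forecaster on the transformed series. During `predict`, it applies the inverse transformations in reverse order, so the forecasts come back on the original scale.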

In [None]:
import plotly.graph_objects as go

from sktime.datasets import load_airline
from sktime.forecasting.compose import TransformedTargetForecaster
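from sktime.forecasting.base import ForecastingHorizon
# On newer sktime releases, import temporal_train_test_split from sktime.split instead.
from sktime.forecasting.model_selection import temporal_train_test_split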
from sktime.forecasting.naive import NaiveForecaster
from sktime.transformations.series.boxcox import BoxCoxTransformer
from sktime.transformations.series.detrend import Detrender, Deseasonalizer

y = load_airline()
y_train, y_test = temporal_train_test_split(y, test_size=36)
fh = ForecastingHorizon(y_test.index, is_relative=False)

pipe = TransformedTargetForecaster(steps=[
    ("boxcox", BoxCoxTransformer()),
    ("deseasonalize", Deseasonalizer(model="additive", sp=12)),
    ("detrend", Detrender()),
    ("forecaster", NaiveForecaster(strategy="drift")),
])

pipe.fit(y_train)
pred = pipe.predict(fh)

fig = go.Figure()
fig.add_trace(go.Scatter(x=y_train.index, y=y_train, name="train"))
fig.add_trace(go.Scatter(x=y_test.index, y=y_test, name="test"))
fig.add_trace(go.Scatter(x=pred.index, y=pred, name="forecast"))
fig.update_layout(title="TransformedTargetForecaster pipeline")
fig.show()
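
Conceptually, a transformed-target pipeline does three things: transform the target, forecast on the transformed scale, and invert the transform. The sketch below mimics that flow by hand with NumPy, using a log transform (Box-Cox with $\lambda = 0$) and a hand-written drift forecast. The toy series and the `drift_forecast` helper are hypothetical illustrations, not sktime internals.

```python
import numpy as np


def drift_forecast(z, horizon):
    """Extend the straight line through the first and last observations."""
    slope = (z[-1] - z[0]) / (len(z) - 1)
    return z[-1] + slope * np.arange(1, horizon + 1)


# Hypothetical positive series with exponential growth (not the airline data).
t = np.arange(60)
y_train = 100.0 * np.exp(0.02 * t)

# 1) Transform the target.
z_train = np.log(y_train)

# 2) Forecast on the transformed scale, where the series is linear.
z_pred = drift_forecast(z_train, horizon=12)

# 3) Invert the transform so predictions are on the original scale.
y_pred = np.exp(z_pred)
```

Because the log of this toy series is exactly linear, the drift forecast on the log scale continues the growth curve, whereas a drift forecast on the raw scale would extrapolate a straight line and undershoot it.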

## Sliding window feature engineering (concept)
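sktime's `Lag` and `WindowSummarizer` automate this step, but the underlying idea is easy to see with plain pandas: shift the series to create lags, and aggregate over a rolling window to create summary features.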

In [None]:
import pandas as pd

df = pd.DataFrame({"y": y})
df["lag_1"] = df["y"].shift(1)
df["lag_12"] = df["y"].shift(12)
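# Note: this window includes the current value; shift by one step first if the feature must be past-only.
df["roll_mean_12"] = df["y"].rolling(12).mean()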
df.head(15)
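
When such features feed a model that predicts the current value, every feature should depend only on earlier observations. The sketch below, on a hypothetical toy series, shifts by one step before taking the rolling mean and then drops the warm-up rows to obtain an aligned feature matrix and target.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series 1, 2, ..., 36 so the features are easy to check by eye.
y = pd.Series(np.arange(1.0, 37.0), name="y")

feats = pd.DataFrame(
    {
        "lag_1": y.shift(1),
        "lag_12": y.shift(12),
        # shift(1) first, so the window covers y_{t-12}, ..., y_{t-1} only
        "roll_mean_12_past": y.shift(1).rolling(12).mean(),
    }
)

# Drop the warm-up rows where any feature is still undefined.
data = pd.concat([y, feats], axis=1).dropna()
X, target = data[feats.columns], data["y"]
```

The first 12 rows are lost to the warm-up period, which is the price of using a 12-step lag and a 12-step window.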

## Practical tips
- Use **deseasonalization** before differencing when seasonality dominates.
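- Keep transformations in a pipeline so that their parameters (for example the Box-Cox $\lambda$ or the seasonal components) are estimated on the training data only, which avoids leakage across train/test splits.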
- For long seasonal periods, combine seasonal differencing with Fourier features.
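
As a rough sketch of what Fourier features look like, the helper below builds sine and cosine columns for a given period and number of harmonics. `fourier_terms` is a hypothetical helper written for illustration, not the sktime `FourierFeatures` API, and the daily index with a yearly period is an assumed example.

```python
import numpy as np


def fourier_terms(t, period, n_harmonics):
    """Return an array with sin/cos column pairs for harmonics 1..n_harmonics."""
    t = np.asarray(t, dtype=float)
    columns = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# Two years of daily time steps with a yearly cycle (assumed example).
t = np.arange(730)
X_fourier = fourier_terms(t, period=365.25, n_harmonics=3)
```

A handful of harmonics describes a long cycle with a few columns, whereas one indicator per position in the cycle would need hundreds, and non-integer periods such as 365.25 pose no problem.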