# Stationarity: The One Concept That Makes or Breaks Your Time Series Model

*Part 3 of 8: Stationarity & Preprocessing*

Welcome back! In Parts 1-2, we learned decomposition. Today: **stationarity** - the make-or-break concept for forecasting.

Jennifer's ARIMA model had 99.2% training accuracy. By Friday, it was 30% off. By Monday, completely broken.

Her mistake? Building on **non-stationary data**.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from scipy import signal
from scipy.stats import boxcox
import warnings
warnings.filterwarnings('ignore')

sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (16, 10)
np.random.seed(42)

## What Is Stationarity?

**Stationary**: Statistical properties (mean, variance, autocorrelation) don't change over time  
**Non-stationary**: One or more of these properties drift or evolve over time

Think: Measuring your height as an adult (stationary) vs. as a child (non-stationary)

![Concept](stationary_vs_nonstationary.png)

In [None]:
# Generate example data
n = 1000
returns = np.random.normal(0.0005, 0.02, n)
prices = 100 * np.exp(np.cumsum(returns))  # Random walk (non-stationary)

dates = pd.date_range('2020-01-01', periods=n, freq='D')
df = pd.DataFrame({'price': prices}, index=dates)
df['returns'] = df['price'].pct_change()

print(f"Data generated: {len(df)} observations")
print(df.head())

In [None]:
# Visualize
fig, axes = plt.subplots(2, 1, figsize=(16, 8))

axes[0].plot(df.index, df['price'], linewidth=1.5, color='#C73E1D')
axes[0].set_title('Stock Prices (NON-STATIONARY)', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index[1:], df['returns'].dropna(), linewidth=1, color='#06A77D')
axes[1].axhline(y=0, color='red', linestyle='--', alpha=0.3)
axes[1].set_title('Returns (STATIONARY)', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Returns')
axes[1].set_xlabel('Date')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
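
Before running formal tests, it can help to put a number on what the plots suggest. The sketch below is a rough visual-check companion, not a statistical test: it measures how far the rolling mean wanders, in units of the series' own standard deviation. The helper name `rolling_summary`, the 100-day window, and the seed are illustrative choices, and the data is regenerated so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Regenerate a random-walk price series and its returns (illustrative seed)
rng = np.random.default_rng(0)
log_returns = rng.normal(0.0005, 0.02, 1000)
prices = pd.Series(100 * np.exp(np.cumsum(log_returns)))
returns = prices.pct_change().dropna()

def rolling_summary(series, window=100):
    """Summarize how much the rolling mean and rolling std move over time."""
    roll_mean = series.rolling(window).mean().dropna()
    roll_std = series.rolling(window).std().dropna()
    return {
        "mean_range": roll_mean.max() - roll_mean.min(),
        "std_range": roll_std.max() - roll_std.min(),
        "overall_std": series.std(),
    }

price_stats = rolling_summary(prices)
return_stats = rolling_summary(returns)

# Drift of the rolling mean, scaled by each series' own standard deviation
price_drift = price_stats["mean_range"] / price_stats["overall_std"]
return_drift = return_stats["mean_range"] / return_stats["overall_std"]

print(f"Rolling-mean drift (prices):  {price_drift:.2f} std units")
print(f"Rolling-mean drift (returns): {return_drift:.2f} std units")
```

A rolling mean that travels a long way relative to the series' spread is a hint of non-stationarity; the tests in the next section are what actually quantify it.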

## Testing for Stationarity

Visual inspection helps, but we need statistical tests.

### ADF Test (Augmented Dickey-Fuller)
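- **H₀**: The series has a unit root (non-stationary)
- **Rule**: p < 0.05 → reject H₀ → evidence of stationarity

### KPSS Test  
- **H₀**: The series is stationary (opposite!)
- **Rule**: p ≥ 0.05 → cannot reject H₀ → consistent with stationarity

**Use both** to confirm: the two tests start from opposite hypotheses, so agreement between them is stronger evidence than either result alone.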

In [None]:
def test_stationarity(series, name='Series'):
    """Run ADF and KPSS tests"""
    print(f"\n{'='*60}")
    print(f"Testing: {name}")
    print('='*60)
    
    # ADF test
    adf_result = adfuller(series.dropna())
    print(f"\nADF Test:")
    print(f"  Statistic: {adf_result[0]:.6f}")
    print(f"  p-value: {adf_result[1]:.6f}")
    if adf_result[1] < 0.05:
        print(f"  ✓ STATIONARY (p < 0.05)")
    else:
        print(f"  ✗ NON-STATIONARY (p ≥ 0.05)")
    
    # KPSS test (statsmodels interpolates the p-value from a table, so it is capped to [0.01, 0.10])
    kpss_result = kpss(series.dropna(), regression='c')
    print(f"\nKPSS Test:")
    print(f"  Statistic: {kpss_result[0]:.6f}")
    print(f"  p-value: {kpss_result[1]:.6f}")
    if kpss_result[1] >= 0.05:
        print(f"  ✓ STATIONARY (p ≥ 0.05)")
    else:
        print(f"  ✗ NON-STATIONARY (p < 0.05)")
    
    return adf_result[1] < 0.05, kpss_result[1] >= 0.05

# Test prices (should be non-stationary)
adf_stat, kpss_stat = test_stationarity(df['price'], 'Stock Prices')

# Test returns (should be stationary)
adf_stat_ret, kpss_stat_ret = test_stationarity(df['returns'], 'Returns')
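
The two tests do not always agree, and the disagreement itself is informative. The sketch below shows one common way to read the four possible outcomes, along the lines of the interpretation given in the statsmodels stationarity guide. The function name `interpret_tests` and the wording of the labels are illustrative, not part of any library.

```python
def interpret_tests(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a single reading (a heuristic, not a proof)."""
    adf_stationary = adf_p < alpha      # ADF rejects the unit-root null
    kpss_stationary = kpss_p >= alpha   # KPSS fails to reject the stationarity null

    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if adf_stationary and not kpss_stationary:
        # ADF sees no unit root, but KPSS still rejects stationarity
        return "difference-stationary: try differencing"
    # KPSS sees stationarity, but ADF cannot rule out a unit root
    return "trend-stationary: try detrending"

# Example p-values chosen by hand to hit each branch
for adf_p, kpss_p in [(0.01, 0.10), (0.60, 0.01), (0.01, 0.01), (0.60, 0.10)]:
    print(f"ADF p={adf_p:.2f}, KPSS p={kpss_p:.2f} -> {interpret_tests(adf_p, kpss_p)}")
```

In practice you would pass in `adf_result[1]` and `kpss_result[1]` from the tests above rather than hand-picked numbers.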

## Transformation Toolkit

![Differencing Effect](differencing_effect.png)

When tests show non-stationarity, transform the data:

1. **Differencing** - Removes trend
2. **Log Transform** - Stabilizes variance  
3. **Box-Cox** - Data-driven power transform (log is the λ = 0 case)
4. **Seasonal Differencing** - Removes seasonality

### 1. Differencing

In [None]:
# Apply first difference
df['price_diff'] = df['price'].diff()

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(16, 8))

axes[0].plot(df.index, df['price'], linewidth=1.5, color='#C73E1D')
axes[0].set_title('Original Prices', fontsize=13, fontweight='bold')
axes[0].set_ylabel('Price ($)')

axes[1].plot(df.index, df['price_diff'], linewidth=1.5, color='#06A77D')
axes[1].axhline(y=0, color='red', linestyle='--', alpha=0.3)
axes[1].set_title('After Differencing', fontsize=13, fontweight='bold')
axes[1].set_ylabel('Price Change')
axes[1].set_xlabel('Date')

plt.tight_layout()
plt.show()

# Test after differencing
test_stationarity(df['price_diff'], 'Differenced Prices')
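
The same idea extends to seasonality: instead of subtracting the previous observation, subtract the observation one full season earlier. The sketch below assumes made-up daily sales with a weekly pattern (the series, the pattern values, and the seed are all invented for illustration) and uses `Series.diff(7)` to remove it.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales with a repeating weekly pattern (illustrative data)
rng = np.random.default_rng(1)
n_days = 7 * 52
day_of_week = np.arange(n_days) % 7
weekly_pattern = np.array([0, 1, 2, 3, 4, 10, 12])[day_of_week]
sales = pd.Series(50 + weekly_pattern + rng.normal(0, 1, n_days))

# Seasonal differencing: compare each day with the same weekday one week earlier
seasonal_diff = sales.diff(7).dropna()

print(f"Lag-7 autocorrelation before: {sales.autocorr(lag=7):.2f}")
print(f"Lag-7 autocorrelation after:  {seasonal_diff.autocorr(lag=7):.2f}")
print(f"Std before: {sales.std():.2f}, after: {seasonal_diff.std():.2f}")
```

The strong positive lag-7 autocorrelation in the raw series is the weekly pattern repeating itself. After seasonal differencing that pattern is gone, at the cost of losing the first seven observations.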

### 2. Log Transformation

![Log Transform](log_transformation.png)

Use when variance increases with level.

In [None]:
# Generate data with increasing variance
time = np.arange(200)
exp_growth = 100 * np.exp(0.01 * time)
noise = exp_growth * np.random.normal(0, 0.1, len(time))
hetero_data = exp_growth + noise

# Apply log
log_data = np.log(hetero_data)

# Compare
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

axes[0].plot(hetero_data, linewidth=1.5, color='#C73E1D')
axes[0].set_title('Original: Increasing Variance')

axes[1].plot(log_data, linewidth=1.5, color='#06A77D')
axes[1].set_title('Log: Stabilized Variance')

plt.tight_layout()
plt.show()

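# Compare step-to-step spread early vs. late:
# it grows with the level in the original but stays roughly flat after the log
half = len(time) // 2
for label, data in [('Original', hetero_data), ('Log', log_data)]:
    steps = np.diff(data)
    print(f"{label} step std: first half {steps[:half].std():.4f}, "
          f"second half {steps[half:].std():.4f}")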
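
The log is one member of the Box-Cox family, which lets the data choose the power instead of fixing it in advance. A minimal sketch, assuming strictly positive data like the series above (regenerated here with an illustrative seed so the block runs on its own): `scipy.stats.boxcox` estimates λ, and `scipy.special.inv_boxcox` undoes the transform.

```python
import numpy as np
from scipy.stats import boxcox
from scipy.special import inv_boxcox

# Positive series whose noise scales with its level (illustrative data)
rng = np.random.default_rng(2)
time = np.arange(200)
level = 100 * np.exp(0.01 * time)
series = level * (1 + rng.normal(0, 0.1, len(time)))

# With no lambda supplied, boxcox returns the transformed data and the fitted lambda
transformed, lam = boxcox(series)
print(f"Estimated lambda: {lam:.3f}")

# Keep lambda: it is needed to map forecasts back to the original scale
restored = inv_boxcox(transformed, lam)
print(f"Round trip recovers the original: {np.allclose(restored, series)}")
```

Box-Cox only accepts positive values, so data containing zeros or negatives needs a shift first, and that shift has to be recorded alongside λ.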

## Complete Workflow

![Decision Tree](stationarity_decision_tree.png)

In [None]:
def make_stationary(series, name='Series'):
    """Complete workflow to achieve stationarity"""
    print(f"\n{'#'*60}")
    print(f"# Making '{name}' Stationary")
    print('#'*60)
    
    current = series.copy()
    transforms = []
    
    # Check initial state
    print("\n1. Initial Check:")
    adf_stat, kpss_stat = test_stationarity(current, name)
    
    if adf_stat and kpss_stat:
        print("\n✓ Already stationary!")
        return current, transforms
    
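    # Check variance: ratio of the largest to smallest 20-period rolling std.
    # This is a rough heuristic; a strongly trending series can trip it too.
    rolling_std = current.rolling(window=20).std().dropna()
    std_ratio = rolling_std.max() / max(rolling_std.min(), 1e-12)  # guard against division by zero
    
    if std_ratio > 2.0:
        print(f"\n2. Variance issue detected (ratio: {std_ratio:.2f})")
        print("   → Applying log transform...")
        if (current <= 0).any():
            # Shift so every value is positive, and record the shift for back-transformation
            shift = 1 - current.min()
            current = current + shift
            transforms.append(f'shift_{shift:.6g}')
        current = np.log(current)
        transforms.append('log')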
    
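    # Apply differencing (up to 2 rounds), re-testing after each one
    print("\n3. Differencing:")
    max_diff = 2
    for d in range(max_diff + 1):
        if d > 0:
            current = current.diff().dropna()
            transforms.append(f'diff_{d}')
            print(f"   Applied differencing step {d}")
        adf_stat, kpss_stat = test_stationarity(current, f'After {d} diff')
        if adf_stat and kpss_stat:
            break
    else:
        # Loop finished without a break: the last differenced series still failed a test
        print(f"\n⚠ Still not stationary after {max_diff} differences")
    
    print(f"\nTransformations applied: {' → '.join(transforms)}")
    return current, transforms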

# Apply workflow
stationary, transforms = make_stationary(df['price'], 'Stock Prices')
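
The list of transforms matters because a model fitted on the transformed series produces forecasts on the transformed scale. A minimal sketch of reversing the most common pipeline, a log followed by one first difference, is below. The helper `invert_log_diff` is hypothetical, and it assumes you kept the first value of the original series, since differencing discards it.

```python
import numpy as np

def invert_log_diff(diffed_logs, first_value):
    """Undo 'log then first difference', given the first original observation."""
    log_first = np.log(first_value)
    # Cumulative sum reverses the differencing; exp reverses the log
    logs = log_first + np.cumsum(diffed_logs)
    return np.exp(np.concatenate([[log_first], logs]))

original = np.array([100.0, 102.0, 101.0, 105.0])
transformed = np.diff(np.log(original))          # forward: log, then difference
restored = invert_log_diff(transformed, original[0])

print(f"Round trip recovers the original: {np.allclose(restored, original)}")
```

Inverse steps are applied in the reverse order of the forward steps: undo the differencing first, then the log, then any shift that was added before the log.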

## Key Takeaways

1. **Test, don't guess**: Use ADF + KPSS together
2. **Transformation toolkit**: Differencing, log, Box-Cox  
3. **Visual first**: Plot data and rolling stats
4. **Verify after**: Always re-test after transformation
5. **Document steps**: You'll need them for back-transformation

## What's Next

Part 4: **Classical Forecasting Methods**
- Moving Averages
- Exponential Smoothing
- Holt-Winters
- When simple beats complex

Now that your data is stationary, you're ready to build models that work.

---

*Questions? Drop them in the comments!*