# Dynamic Regime-Based Asset Allocation
**Author:** Tanishk Yadav
**Date:** January 2026

## 1. Introduction
This notebook implements a dynamic risk management strategy using **Gaussian Mixture Models (GMM)** to detect latent market regimes. It differentiates between *Low Volatility (Bull)*, *Transition*, and *Crisis* states, adjusting the portfolio's sector exposure accordingly.

### Workflow:
1.  **Ingestion:** Fetch S&P 500 sector data and Macro factors.
2.  **Engineering:** Calculate Realized Volatility and Returns.
3.  **Modeling:** Train unsupervised GMM to cluster market states.
4.  **Backtest:** Simulate strategy performance vs. Benchmarks.



In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.mixture import GaussianMixture
import datetime as dt
import warnings
import os

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8-darkgrid')
pd.set_option('display.float_format', lambda x: '%.4f' % x)

# Global Constants
START_DATE = '2018-01-01'
END_DATE = dt.datetime.today().strftime('%Y-%m-%d')
TICKERS = [
    'XLF', 'XLI', 'XLB', 'XLE', 'XLY',  # Cyclical
    'XLU', 'XLP', 'XLV',                # Defensive
    'XLK', 'XLC', 'XLRE',               # Growth
    'SPY', '^VIX', '^TNX'               # Macro
]


## 2. Data Ingestion
We fetch daily adjusted closing prices for the Sector Universe and Macro factors.
*Optimization:* Data is cached to `market_data.csv` to speed up subsequent runs.
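
The freshness rule behind the cache can be sketched in isolation. The helper name `is_cache_fresh` and its `max_age_days` parameter are hypothetical and only illustrate the idea of tolerating a three-day gap so that a Friday close still counts as current on Monday.

```python
import pandas as pd

def is_cache_fresh(last_cached_date, today, max_age_days=3):
    """Return True if the newest cached row is at most `max_age_days` calendar days old."""
    last = pd.Timestamp(last_cached_date).normalize()
    cutoff = pd.Timestamp(today).normalize() - pd.Timedelta(days=max_age_days)
    return bool(last >= cutoff)

# Friday 2026-01-09 checked on Monday 2026-01-12: still fresh
print(is_cache_fresh("2026-01-09", "2026-01-12"))  # True
# A cache that ends ten days earlier is stale and should trigger a download
print(is_cache_fresh("2026-01-02", "2026-01-12"))  # False
```

A date-only check like this does not verify that the cached file covers the same tickers or start date, so deleting the file after changing either is the safe option.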


In [None]:
def fetch_data(tickers, start, end):
    file_path = 'market_data.csv'
    
    if os.path.exists(file_path):
        print(f"Loading data from {file_path}...")
        df = pd.read_csv(file_path, index_col=0, parse_dates=True)
        # Reuse the cache if its newest row is at most 3 calendar days old (tolerates weekends)
        if df.index.max() >= pd.Timestamp.today().normalize() - pd.Timedelta(days=3):
            return df
            
    print("Downloading new data from Yahoo Finance...")
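    # auto_adjust=False keeps the 'Adj Close' column, which recent yfinance releases drop by default
    data = yf.download(tickers, start=start, end=end, progress=True, auto_adjust=False)['Adj Close']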
    
    # Handle missing data
    data = data.ffill().dropna()
    
    # Save cache
    data.to_csv(file_path)
    return data

prices = fetch_data(TICKERS, START_DATE, END_DATE)
print(f"Data Shape: {prices.shape}")
prices.tail()


## 3. Feature Engineering
We construct the risk signals required for the Regime Detection model.
*   **Log Returns:** $r_t = \ln(P_t / P_{t-1})$
*   **Realized Volatility:** 21-day rolling standard deviation annualized.
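
As a quick sanity check of the two definitions above, here is a minimal sketch on a synthetic price path. The series `toy_prices` is made up for illustration and is not market data; the point is only to show how many observations survive each step.

```python
import numpy as np
import pandas as pd

# Hypothetical price path: 30 business days of synthetic data
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=30)
toy_prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=30))), index=dates)

# r_t = ln(P_t / P_{t-1}); the first observation has no predecessor and is dropped
toy_returns = np.log(toy_prices / toy_prices.shift(1)).dropna()

# 21-day rolling standard deviation, scaled by sqrt(252) to annualize
toy_vol = toy_returns.rolling(window=21).std() * np.sqrt(252)

print(len(toy_returns))            # 29 returns from 30 prices
print(int(toy_vol.notna().sum()))  # 9 complete windows (29 - 21 + 1)
```

Log returns add up over time, so their sum equals the log of the ratio between the last and first price. That property is convenient for feature engineering, but it does not carry over to averaging across assets.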


In [None]:
# Calculate Log Returns
returns = np.log(prices / prices.shift(1)).dropna()

# Calculate Realized Volatility (Annualized)
realized_vol = returns.rolling(window=21).std() * np.sqrt(252)
realized_vol = realized_vol.dropna()

# Prepare Feature Matrix for GMM
# We use SPY Volatility and VIX as the primary features for regime detection
features = pd.DataFrame()
features['Rel_Vol'] = realized_vol['SPY']
features['VIX'] = prices['^VIX'].loc[realized_vol.index]

# Normalize Features (Z-Score)
# Puts realized vol (a decimal) and VIX (index points) on a comparable scale before clustering
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

print("Features Prepared.")
features.head()


## 4. Regime Detection (GMM)
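We fit a Gaussian Mixture Model with **K=3** components to identify market states. The component indices returned by the fit are arbitrary, so the labels are re-ordered by mean VIX to match the numbering below.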
*   **Regime 0:** Bull / Low Vol
*   **Regime 1:** Transition / Medium Vol
*   **Regime 2:** Crisis / High Vol
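
The relabeling step matters because a fitted mixture numbers its components in no particular order. The sketch below shows the idea on synthetic one-dimensional data; the cluster centers (10, 20, 40) and all `toy_` names are assumptions chosen for illustration, not values taken from the market data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D "volatility" feature drawn from three well-separated clusters
rng = np.random.default_rng(42)
X = np.concatenate([
    rng.normal(10, 1.0, 300),   # calm
    rng.normal(20, 1.5, 150),   # intermediate
    rng.normal(40, 3.0, 50),    # stressed
]).reshape(-1, 1)

toy_gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42)
raw_labels = toy_gmm.fit_predict(X)

# Rank the raw component ids by the average feature value of their members
raw_means = np.array([X[raw_labels == k].mean() for k in range(3)])
order = np.argsort(raw_means)                         # raw ids, lowest mean first
remap = {int(old): new for new, old in enumerate(order)}
ordered_labels = np.array([remap[int(label)] for label in raw_labels])

# After remapping, label 0 has the lowest average feature value and label 2 the highest
cluster_means = [float(X[ordered_labels == k].mean()) for k in range(3)]
print(cluster_means)
```

Sorting by a single observable (here the feature itself, in the notebook the mean VIX) keeps the regime numbering stable across refits and random seeds.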


In [None]:
# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=42)
gmm.fit(features_scaled)

# Predict Regimes
regime_labels = gmm.predict(features_scaled)
features['Regime'] = regime_labels

# Ordering Regimes: Ensure 0 is Low Vol, 1 is Med, 2 is High
# We sort cluster labels by their mean VIX
vol_means = features.groupby('Regime')['VIX'].mean().sort_values()
mapping = {old_label: new_label for new_label, old_label in enumerate(vol_means.index)}
features['Regime'] = features['Regime'].map(mapping)

print("Regime Characteristics:")
print(features.groupby('Regime').mean())


### Visualizing Market Regimes


In [None]:
# Plot SPY colored by Regime
dataset = features.join(prices['SPY'], how='inner')

colors = {0: 'green', 1: 'orange', 2: 'red'}
labels = {0: 'Bull', 1: 'Transition', 2: 'Crisis'}

plt.figure(figsize=(15, 7))
plt.plot(dataset.index, dataset['SPY'], color='black', alpha=0.2, label='SPY Price')

for r in [0, 1, 2]:
    subset = dataset[dataset['Regime'] == r]
    plt.scatter(subset.index, subset['SPY'], s=10, color=colors[r], label=labels[r])

plt.title('S&P 500 Market Regimes (2018-Present)', fontsize=14)
plt.legend()
plt.show()


## 5. Strategy Backtesting
We simulate the performance of a **Dynamic Sector Rotation** strategy against the **S&P 500 (Buy & Hold)**.

**Strategy Logic:**
*   **Regime 0 (Bull):** Aggressive Growth (Technology, Communication Services, Financials, Consumer Discretionary, Health Care)
*   **Regime 1 (Transition):** Defensive Shield (Utilities, Consumer Staples, Health Care)
*   **Regime 2 (Crisis):** **CASH** (Risk Off)
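
One way to express this logic is as a regime-to-weights table applied with a one-day lag. The following is a minimal sketch on a toy universe; the asset names (`GROWTH_A`, `GROWTH_B`, `DEFENSIVE`), the regime path, and the return numbers are all hypothetical.

```python
import pandas as pd

# Hypothetical three-asset universe and a short regime path
dates = pd.bdate_range("2024-01-01", periods=5)
toy_regime = pd.Series([0, 0, 1, 2, 0], index=dates)
toy_baskets = {0: ["GROWTH_A", "GROWTH_B"], 1: ["DEFENSIVE"], 2: []}
assets = ["GROWTH_A", "GROWTH_B", "DEFENSIVE"]

# Equal weights within the active basket; an empty basket means 100% cash
weights_by_regime = pd.DataFrame(0.0, index=[0, 1, 2], columns=assets)
for regime, basket in toy_baskets.items():
    if basket:
        weights_by_regime.loc[regime, basket] = 1.0 / len(basket)

# Today's regime sets tomorrow's weights, hence the one-day shift;
# the first day has no prior signal and stays in cash
toy_weights = weights_by_regime.loc[toy_regime.values].set_axis(dates).shift(1).fillna(0.0)

# Made-up daily simple returns for the three assets
toy_simple_returns = pd.DataFrame(
    {"GROWTH_A": [0.01, 0.02, -0.01, -0.05, 0.03],
     "GROWTH_B": [0.00, 0.04, -0.03, -0.07, 0.01],
     "DEFENSIVE": [0.00, 0.01, 0.00, -0.01, 0.00]},
    index=dates,
)

# Portfolio return = weighted sum of simple asset returns
toy_strategy = (toy_weights * toy_simple_returns).sum(axis=1)
print(toy_strategy.round(4).tolist())
```

On the fourth day the toy portfolio still holds the defensive asset, because the Crisis signal observed that day only takes effect on the fifth. The lag avoids trading on information that was not yet available.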


In [None]:
# 1. Define Baskets
regime_basket = {
    0: ['XLK', 'XLC', 'XLF', 'XLY', 'XLV'],  # Aggressive
    1: ['XLU', 'XLP', 'XLV'],                # Defensive
    2: []                                    # Cash (Empty Basket)
}

# 2. Backtest loop (one pass over the daily signals)
# `returns` holds log returns, which cannot be averaged across assets or compounded
# with (1 + r).cumprod(); convert to simple returns first.
simple_returns = np.expm1(returns)

strategy_returns = []
spy_returns = simple_returns['SPY'].loc[dataset.index]

# Shift signal by 1 day (we trade *tomorrow* based on *today's* regime)
signals = dataset['Regime'].shift(1)

for date, regime in signals.items():
    if pd.isna(regime):
        strategy_returns.append(0.0)
        continue
    
    regime = int(regime)
    basket = regime_basket[regime]
    
    if not basket:
        # Cash returns 0 (ignoring the risk-free rate for simplicity; ^TNX is quoted
        # in percent, so a rough daily proxy would be ^TNX / 100 / 252)
        daily_ret = 0.0
    else:
        # Equal-weight basket: average the simple returns of the basket members
        # Filter for tickers present in columns (in case of data issues)
        valid_tickers = [t for t in basket if t in simple_returns.columns]
        daily_ret = simple_returns.loc[date, valid_tickers].mean()
        
    strategy_returns.append(daily_ret)

# Create Backtest DF
backtest = pd.DataFrame(index=spy_returns.index)
backtest['Strategy'] = strategy_returns
backtest['Benchmark'] = spy_returns

# Calculate Equity Curves
backtest['Strategy_Eq'] = (1 + backtest['Strategy']).cumprod()
backtest['Benchmark_Eq'] = (1 + backtest['Benchmark']).cumprod()

backtest.tail()


### Performance Metrics & Visualization
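
Maximum drawdown is the least intuitive of the metrics reported here, so a small worked example may help. The three returns below are hypothetical and chosen so the arithmetic can be checked by hand.

```python
import pandas as pd

# Hypothetical daily simple returns: a gain, a sharp loss, a partial recovery
toy = pd.Series([0.10, -0.20, 0.05])

equity = (1 + toy).cumprod()            # 1.10, 0.88, 0.924
running_peak = equity.cummax()          # 1.10, 1.10, 1.10
drawdown = (equity - running_peak) / running_peak

max_dd = drawdown.min()                 # worst peak-to-trough loss
total_ret = equity.iloc[-1] - 1         # cumulative return over the three days
print(f"{max_dd:.2%}, {total_ret:.2%}")  # -20.00%, -7.60%
```

The drawdown is measured from the running peak rather than from the starting value, which is why it reads -20% even though the series ends only 7.6% below where it began.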


In [None]:
# Plot Equity Curves
plt.figure(figsize=(15, 7))
plt.plot(backtest['Strategy_Eq'], label='Dynamic Regime Strategy', color='blue', linewidth=2)
plt.plot(backtest['Benchmark_Eq'], label='S&P 500', color='gray', linestyle='--')
plt.title('Equity Curve Comparison', fontsize=14)
plt.legend()
plt.show()

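# Calculate Metrics
def calculate_metrics(series):
    """Return [total return, annualized return, annualized volatility, Sharpe, max drawdown].

    Expects daily simple returns; the Sharpe ratio assumes a zero risk-free rate.
    """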
    total_ret = (1 + series).prod() - 1
    ann_ret = (1 + series).prod() ** (252 / len(series)) - 1
    ann_vol = series.std() * np.sqrt(252)
    sharpe = series.mean() / series.std() * np.sqrt(252)
    
    # Max Drawdown
    cum = (1 + series).cumprod()
    dd = (cum - cum.cummax()) / cum.cummax()
    max_dd = dd.min()
    
    return [total_ret, ann_ret, ann_vol, sharpe, max_dd]

metrics = pd.DataFrame(index=['Total Return', 'Ann. Return', 'Ann. Volatility', 'Sharpe Ratio', 'Max Drawdown'])
metrics['Strategy'] = calculate_metrics(backtest['Strategy'])
metrics['Benchmark'] = calculate_metrics(backtest['Benchmark'])

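# Format: Sharpe Ratio as a plain number, every other row as a percentage
formatted = metrics.astype(object)
for row in metrics.index:
    fmt = "{:.2f}" if row == 'Sharpe Ratio' else "{:.2%}"
    formatted.loc[row] = [fmt.format(v) for v in metrics.loc[row]]
formatted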


## 6. Conclusion
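This backtest suggests that regime-based allocation can reduce tail risk. Because the scaler and the GMM are fit on the full sample, the regime labels are in-sample and the results are likely optimistic relative to live trading.
*   **Drawdown Reduction:** Moving to Cash in Regime 2 limits exposure once a crash has been detected; losses taken before the one-day-lagged signal flips still count. The Max Drawdown row of the metrics table quantifies the effect.
*   **Volatility:** Expected to be lower than the index, since part of the sample is spent in defensive sectors or cash; see the Ann. Volatility row of the metrics table.
*   **Trade-off:** The strategy may lag during V-shaped recoveries because of late re-entry, in exchange for a smoother compounding path.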
