# Streamlined Training Pipeline

**Updated to use Integer-Based Regime System**

**Training Flow:**
1. Configure regime settings (integer states)
2. Load all data
3. Train global Markov model on all data
4. Skip per-stock Markov models (removed; the global model is used for every stock)
5. Train close price KDE globally
6. Train open price model with trend/volatility resolved KDEs
7. Train high/low copulas based on close/open prices
8. Train per-stock ARIMA-GARCH models on the 20-day MA and Bollinger Band width
9. Make prediction

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import pickle
import warnings
from datetime import datetime
warnings.filterwarnings('ignore')

sys.path.append('../src')

print(f"🚀 Starting streamlined training pipeline - {datetime.now().strftime('%H:%M:%S')}")

🚀 Starting streamlined training pipeline - 20:51:37


## 🔧 Regime Configuration

**Configure the integer-based regime system at the top of the pipeline**

## 🌍 New Feature: US Universe Data Loading

**The pipeline now supports loading data from the US universe file (5013 stocks)**
- **File**: `cache/US universe_2025-08-05_a782c.csv` 
- **Usage**: Uncomment the optional data loading section in Step 1
- **Benefits**: Train models on complete US stock universe instead of subset

In [2]:
# =====================================================
# REGIME CONFIGURATION - MODIFY THESE SETTINGS
# =====================================================

# Set the number of regime states to use
N_TREND_STATES = 7    # Number of trend states (3, 5, 7, etc.)
N_VOL_STATES = 5      # Number of volatility states (2, 3, 5, etc.)

print(f"🎛️ REGIME CONFIGURATION")
print(f"=" * 50)
print(f"Trend States: {N_TREND_STATES}")
print(f"Volatility States: {N_VOL_STATES}")
print(f"Total Regimes: {N_TREND_STATES * N_VOL_STATES}")

# Apply the configuration to the global regime system
sys.path.append('../src')
from config.regime_config import create_regime_config, set_custom_regime_config, REGIME_CONFIG

# Create custom regime configuration with specified states
if N_TREND_STATES != 5 or N_VOL_STATES != 3:
    print(f"\n🔄 Creating custom regime configuration...")
    custom_config = create_regime_config(n_trend_states=N_TREND_STATES, n_vol_states=N_VOL_STATES)
    set_custom_regime_config(custom_config)
    print(f"✅ Custom configuration applied")
else:
    print(f"\n✅ Using default configuration (5×3)")

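# Show regime details
# Re-read the active config through the module: the name imported above can
# still point at the default object if set_custom_regime_config() rebinds it.
import config.regime_config as regime_config_module
REGIME_CONFIG = regime_config_module.REGIME_CONFIG
config = REGIME_CONFIG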
print(f"\n📊 Regime Details:")
print(f"   Trend states: {config.trend.get_all_states()}")
print(f"   Trend labels: {config.trend.get_all_labels()}")
print(f"   Vol states: {config.volatility.get_all_states()}")  
print(f"   Vol labels: {config.volatility.get_all_labels()}")

# Show some example regimes
combined_regimes = config.get_all_combined_regimes()
print(f"\n🔗 Example Combined Regimes:")
print(f"   First 5: {combined_regimes[:5]}")
print(f"   Last 5: {combined_regimes[-5:]}")

# Show state-to-label conversion examples
print(f"\n🔄 State-Label Examples:")
print(f"   trend_0 = {config.trend.get_state_label(0)}")
print(f"   trend_{config.trend.get_all_states()[-1]} = {config.trend.get_state_label(config.trend.get_all_states()[-1])}")
print(f"   vol_0 = {config.volatility.get_state_label(0)}")
print(f"   vol_{config.volatility.get_all_states()[-1]} = {config.volatility.get_state_label(config.volatility.get_all_states()[-1])}")

print(f"\n✅ Regime configuration complete - ready for training")
print(f"=" * 50)

🎛️ REGIME CONFIGURATION
Trend States: 7
Volatility States: 5
Total Regimes: 35

🔄 Creating custom regime configuration...
✅ Custom configuration applied

📊 Regime Details:
   Trend states: [0, 1, 2, 3, 4]
   Trend labels: ['strong_bear', 'bear', 'sideways', 'bull', 'strong_bull']
   Vol states: [0, 1, 2]
   Vol labels: ['low', 'medium', 'high']

🔗 Example Combined Regimes:
   First 5: ['strong_bear_low', 'strong_bear_medium', 'strong_bear_high', 'bear_low', 'bear_medium']
   Last 5: ['bull_medium', 'bull_high', 'strong_bull_low', 'strong_bull_medium', 'strong_bull_high']

🔄 State-Label Examples:
   trend_0 = strong_bear
   trend_4 = strong_bull
   vol_0 = low
   vol_2 = high

✅ Regime configuration complete - ready for training


## 📋 Regime System Information

**Key Features of the Integer-Based Regime System:**

- **Integer States**: Regimes use integer states (0, 1, 2, ...)
- **Descriptive Labels**: Each state has a descriptive name
- **Flexible Configuration**: Easily change number of states
- **Backwards Compatible**: Existing code continues to work
- **Mathematical Operations**: Efficient for calculations

**Example Configurations:**
- 3×3 = 9 regimes (simple)
- 5×3 = 15 regimes (default, balanced)  
- 7×5 = 35 regimes (detailed)

**State Mapping:**
- `trend_0` → strongest bearish trend
- `trend_N-1` → strongest bullish trend  
- `vol_0` → lowest volatility
- `vol_N-1` → highest volatility
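
The mapping above can be sketched in a few lines. This is a minimal illustration, not the code in `config/regime_config.py`: the helper names `combined_regimes`, `combined_index`, and `split_index` are hypothetical, and the labels are the default 5×3 ones printed by the configuration cell.

```python
from itertools import product

# Default 5×3 labels, as printed by the configuration cell
TREND_LABELS = ['strong_bear', 'bear', 'sideways', 'bull', 'strong_bull']
VOL_LABELS = ['low', 'medium', 'high']


def combined_regimes(trend_labels, vol_labels):
    """List every trend/volatility pair as a '<trend>_<vol>' label."""
    return [f"{t}_{v}" for t, v in product(trend_labels, vol_labels)]


def combined_index(trend_state, vol_state, n_vol_states):
    """Flatten a (trend, vol) pair of integer states into one index."""
    return trend_state * n_vol_states + vol_state


def split_index(index, n_vol_states):
    """Invert combined_index: recover (trend_state, vol_state)."""
    return divmod(index, n_vol_states)


regimes = combined_regimes(TREND_LABELS, VOL_LABELS)
idx = combined_index(4, 2, len(VOL_LABELS))  # strongest bull, highest vol
print(len(regimes), regimes[idx], split_index(idx, len(VOL_LABELS)))
```

Keeping the integer pair as the primary key and deriving the label from it means the number of states can change without touching code that only does arithmetic on the states.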

## 1. Optional: Load Universe Data

**OPTIONAL**: Load fresh data from the US universe file instead of the cached `stock_data_universe.pkl`.

To use the US universe data, uncomment the optional lines in the cell below and comment out the default loader. The universe file lists 5013 stocks; `max_symbols` caps how many are loaded (the commented example uses 100).
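
Whichever loader is used, the rest of the notebook indexes the result as `stock_data[field][symbol]`. The sketch below builds a toy object with that layout (a dict of DataFrames keyed by field, one column per symbol) and applies the same 95% completeness filter as the cell. The symbols `AAA`, `BBB`, `CCC` and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: 50 business days, three hypothetical symbols
dates = pd.date_range('2024-01-01', periods=50, freq='B')
close = pd.DataFrame({
    'AAA': np.linspace(10, 20, 50),
    'BBB': np.linspace(30, 25, 50),
    'CCC': np.linspace(5, 6, 50),
}, index=dates)
close.iloc[0, close.columns.get_loc('BBB')] = np.nan    # 98% complete
close.iloc[:10, close.columns.get_loc('CCC')] = np.nan  # 80% complete

# Same layout the notebook expects: one DataFrame per field
stock_data = {field: close.copy() for field in ['Open', 'High', 'Low', 'Close', 'Volume']}

# Percentage of non-missing closes per symbol, then the 95% cut
completeness = (1 - stock_data['Close'].isnull().sum() / len(stock_data['Close'])) * 100
high_quality = completeness[completeness >= 95].index.tolist()
print(high_quality)
```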

In [3]:
# OPTIONAL: Load fresh data from US universe file
# Uncomment the lines below to load data from the US universe_2025-08-05* file
# This will load up to 5013 stocks from the universe file

# from data.loader import load_universe_data
# print("🌍 Loading data from US universe file...")
# stock_data = load_universe_data(max_symbols=100, update=False, rate_limit=2.0)  # Limit to 100 for demo
# print(f"✅ Loaded universe data with {len(stock_data['Close'].columns)} stocks")

# DEFAULT: Load existing cached data
with open('../cache/stock_data_universe.pkl', 'rb') as f:
    stock_data = pickle.load(f)

n_stocks = len(stock_data['Close'].columns)
print(f"✅ Loaded {n_stocks} stocks")

# Prepare data for training
def prepare_stock_data(stock_data, symbols, min_obs=50):
    prepared = {}
    for symbol in symbols:
        if symbol in stock_data['Close'].columns:
            data = pd.DataFrame({
                'Open': stock_data['Open'][symbol],
                'High': stock_data['High'][symbol],
                'Low': stock_data['Low'][symbol],
                'Close': stock_data['Close'][symbol],
                'Volume': stock_data['Volume'][symbol]
            }).dropna()
            
            if len(data) >= min_obs:
                # Add technical indicators for Markov models
                close = data['Close']
                data['MA'] = close.rolling(20).mean()
                bb_std = close.rolling(20).std()
                data['BB_Upper'] = data['MA'] + 2 * bb_std
                data['BB_Lower'] = data['MA'] - 2 * bb_std
                
                # Calculate BB_Position (-1 to 1, where 0 is at MA)
                data['BB_Position'] = (close - data['MA']) / (data['BB_Upper'] - data['MA'])
                data['BB_Position'] = data['BB_Position'].clip(-1, 1)
                
                # BB_Width for other models
                data['BB_Width'] = bb_std / data['MA']
                
                prepared[symbol] = data.dropna()
    
    return prepared

# Prepare all stocks
all_symbols = stock_data['Close'].columns.tolist()
all_prepared_data = prepare_stock_data(stock_data, all_symbols)
print(f"✅ Prepared {len(all_prepared_data)} stocks for training")

# Select high-quality subset for individual models
completeness = (1 - stock_data['Close'].isnull().sum() / len(stock_data['Close'])) * 100
high_quality = completeness[completeness >= 95].index.tolist()
individual_stocks = [s for s in high_quality[:20] if s in all_prepared_data]
print(f"✅ Selected {len(individual_stocks)} stocks for individual models")

# Target stock
target_stock = 'TSLA' if 'TSLA' in individual_stocks else individual_stocks[0]
print(f"🎯 Target stock: {target_stock}")

print(f"\n📊 Data Summary:")
print(f"   Total symbols loaded: {len(all_symbols)}")
print(f"   Symbols with sufficient data: {len(all_prepared_data)}")
print(f"   High-quality stocks: {len(high_quality)}")
print(f"   Individual models: {len(individual_stocks)}")
print(f"   Target stock: {target_stock}")

# Show usage instructions
print(f"\n💡 To load US universe data instead:")
print(f"   1. Uncomment the 'OPTIONAL' section above")
print(f"   2. Comment out the 'DEFAULT' section") 
print(f"   3. Adjust max_symbols parameter as needed")
print(f"   4. Set update=True to fetch fresh data")

✅ Loaded 2315 stocks
✅ Prepared 2298 stocks for training
✅ Selected 20 stocks for individual models
🎯 Target stock: A

📊 Data Summary:
   Total symbols loaded: 2315
   Symbols with sufficient data: 2298
   High-quality stocks: 1924
   Individual models: 20
   Target stock: A

💡 To load US universe data instead:
   1. Uncomment the 'OPTIONAL' section above
   2. Comment out the 'DEFAULT' section
   3. Adjust max_symbols parameter as needed
   4. Set update=True to fetch fresh data
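
The Bollinger Band features computed in `prepare_stock_data` can be checked on synthetic prices. The sketch below repeats the same formulas in a standalone helper; `add_bollinger_features` is a hypothetical name, the random-walk prices are made up, and the explicit zero-width guard is an addition that the cell above does not have.

```python
import numpy as np
import pandas as pd


def add_bollinger_features(close, window=20, n_std=2.0):
    """Return MA, BB_Position in [-1, 1] and BB_Width for a close series."""
    ma = close.rolling(window).mean()
    std = close.rolling(window).std()
    half_width = (n_std * std).replace(0, np.nan)  # avoid dividing by a zero-width band
    position = ((close - ma) / half_width).clip(-1, 1)
    features = pd.DataFrame({
        'Close': close,
        'MA': ma,
        'BB_Position': position,  # 0 at the MA, +1 at the upper band, -1 at the lower band
        'BB_Width': std / ma,     # rolling std relative to the MA
    })
    return features.dropna()


rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
features = add_bollinger_features(close)
print(features.tail(3))
```

The first `window - 1` rows have no rolling statistics and are dropped, which is why the notebook requires a minimum number of observations per stock.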


## 2. Train Global Markov Model

In [4]:
from models.unified_markov_model import create_combined_markov_model

print(f"🔄 Training global Markov model on {len(all_prepared_data)} stocks...")
print(f"   Using integer regime system with {N_TREND_STATES}×{N_VOL_STATES} = {N_TREND_STATES * N_VOL_STATES} regimes")

# Create unified Markov model that uses centralized regime configuration
global_markov = create_combined_markov_model()

# Fit the model on all prepared data
global_markov.fit(all_prepared_data)

# Show model summary
if global_markov.fitted:
    summary = global_markov.get_model_summary()
    print(f"✅ Global Markov model trained successfully")
    print(f"   Model type: {summary['model_type']}")
    print(f"   States: {summary['n_states']}")
    print(f"   Using centralized regime config: ✅")
    
    # Show some state statistics
    state_stats = summary['state_statistics']
    top_states = sorted(state_stats.items(), key=lambda x: x[1]['frequency'], reverse=True)[:5]
    print(f"   Top 5 regimes by frequency:")
    for state, stats in top_states:
        print(f"     {state}: {stats['frequency']:.3f} ({stats['count']} obs)")
else:
    print(f"❌ Global Markov model training failed")

🔄 Training global Markov model on 2298 stocks...
   Using integer regime system with 7×5 = 35 regimes
Initialized combined Markov model with 35 states:
States: ['very_strong_bear_very_low', 'very_strong_bear_low', 'very_strong_bear_medium', 'very_strong_bear_high', 'very_strong_bear_very_high', 'strong_bear_very_low', 'strong_bear_low', 'strong_bear_medium', 'strong_bear_high', 'strong_bear_very_high', 'bear_very_low', 'bear_low', 'bear_medium', 'bear_high', 'bear_very_high', 'sideways_very_low', 'sideways_low', 'sideways_medium', 'sideways_high', 'sideways_very_high', 'bull_very_low', 'bull_low', 'bull_medium', 'bull_high', 'bull_very_high', 'strong_bull_very_low', 'strong_bull_low', 'strong_bull_medium', 'strong_bull_high', 'strong_bull_very_high', 'very_strong_bull_very_low', 'very_strong_bull_low', 'very_strong_bull_medium', 'very_strong_bull_high', 'very_strong_bull_very_high']
🔧 Fitting combined Markov model...
  📊 Collected 2659467 regime observations from 2298 stocks
  📈 Estima
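
For intuition, a pooled transition matrix can be estimated by counting state-to-state moves across all stocks and normalizing each row. This is a generic sketch under stated assumptions, not the internals of `create_combined_markov_model`: the function name `estimate_transition_matrix` and the additive smoothing are hypothetical choices.

```python
import numpy as np


def estimate_transition_matrix(sequences, n_states, smoothing=1.0):
    """Row-normalized transition counts pooled over several state sequences.

    `smoothing` is added to every cell so that rarely visited states
    still get a valid probability row.
    """
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for seq in sequences:
        # Count transitions within each sequence only, never across stocks
        for current, nxt in zip(seq[:-1], seq[1:]):
            counts[current, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)


# Two toy "stocks", each a sequence of integer regime states
P = estimate_transition_matrix([[0, 0, 1], [1, 0, 0]], n_states=2)
print(P)
```

With 35 combined regimes the matrix has 1,225 cells, so some form of smoothing or a minimum-count check is worth considering for sparsely populated rows.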

## 3. Global Model Training Only

**Per-stock Markov training has been removed; the regime models are now global only. ARIMA-GARCH models (step 7) are still trained per stock.**

In [5]:
# REMOVED: Individual stock Markov training - now using global models only!
print("🔄 Individual Markov model training has been removed from the pipeline")
print("   All predictions now use the global Markov model trained on all stocks")
print("   This provides more robust and deterministic results")

individual_markov = {}  # Keep for compatibility
successful_models = 0

print(f"✅ Global-only training approach adopted")
print(f"   Using unified Markov model with {global_markov.n_states} states")
print(f"   Trained on {len(all_prepared_data)} stocks globally")

🔄 Individual Markov model training has been removed from the pipeline
   All predictions now use the global Markov model trained on all stocks
   This provides more robust and deterministic results
✅ Global-only training approach adopted
   Using unified Markov model with 35 states
   Trained on 2298 stocks globally


## 4. Train Close Price KDE Models

In [6]:
from models.global_kde_models import train_global_models

print(f"🔄 Training global KDE models on ALL data using integer regime system...")
print(f"   Using {N_TREND_STATES} trend states × {N_VOL_STATES} vol states = {N_TREND_STATES * N_VOL_STATES} total regimes")

# Train all global models on complete dataset using new integer regime system
global_models = train_global_models(all_prepared_data, min_samples=50)

# Extract individual models for compatibility
global_close_kde = global_models['close_kde']
global_open_kde = global_models['open_kde'] 
global_hl_copula = global_models['hl_copula']

print(f"✅ Global KDE models trained on all {len(all_prepared_data)} stocks")
print(f"   Close Price KDE: {'✅' if global_close_kde else '❌'}")
print(f"   Open Price KDE: {'✅' if global_open_kde else '❌'}")
print(f"   High-Low Copula: {'✅' if global_hl_copula else '❌'}")

# Show regime statistics for first successful model
if global_close_kde and global_close_kde.fitted:
    regime_count = len(global_close_kde.kde_models)
    total_regimes = len(global_close_kde.regime_stats)
    print(f"\n📊 Close Price KDE Statistics (Integer Regime System):")
    print(f"   KDE Models: {regime_count} regimes")
    print(f"   Total Regimes: {total_regimes} identified")
    
    if regime_count > 0:
        top_regimes = list(global_close_kde.kde_models.keys())[:3]
        print(f"   Top Regimes: {', '.join(top_regimes)}")
        
        # Show state-label conversion for regimes
        print(f"\n🔄 Regime State Analysis:")
        for regime in top_regimes[:2]:  # Show first 2 regimes
            try:
                trend_state, vol_state = REGIME_CONFIG.label_to_state(regime)
                trend_label = REGIME_CONFIG.trend.get_state_label(trend_state)
                vol_label = REGIME_CONFIG.volatility.get_state_label(vol_state)
                print(f"   '{regime}' = trend_{trend_state} ({trend_label}) + vol_{vol_state} ({vol_label})")
            except Exception:
                # The label could not be mapped back to integer states
                print(f"   '{regime}' = descriptive label")

if global_open_kde and global_open_kde.fitted:
    regime_count = len(global_open_kde.kde_models)
    print(f"\n📊 Open Price KDE Statistics:")
    print(f"   KDE Models: {regime_count} regimes")

if global_hl_copula and global_hl_copula.fitted:
    regime_count = len(global_hl_copula.copula_models)
    print(f"\n📊 High-Low Copula Statistics:")
    print(f"   Copula Models: {regime_count} regimes")

# Show regime configuration being used
print(f"\n🎛️ Using Integer Regime Configuration:")
print(f"   Trend states: {REGIME_CONFIG.trend.get_all_states()} → {REGIME_CONFIG.trend.get_all_labels()}")
print(f"   Vol states: {REGIME_CONFIG.volatility.get_all_states()} → {REGIME_CONFIG.volatility.get_all_labels()}")

🔄 Training global KDE models on ALL data using integer regime system...
   Using 7 trend states × 5 vol states = 35 total regimes
🚀 Training All Global Models on All Data
🌍 Training Global Close Price KDE Models
  📊 Processed 50 stocks...
  📊 Processed 100 stocks...
  📊 Processed 150 stocks...
  📊 Processed 200 stocks...
  📊 Processed 250 stocks...
  📊 Processed 300 stocks...
  📊 Processed 350 stocks...
  📊 Processed 400 stocks...
  📊 Processed 450 stocks...
  📊 Processed 500 stocks...
  📊 Processed 550 stocks...
  📊 Processed 600 stocks...
  📊 Processed 650 stocks...
  📊 Processed 700 stocks...
  📊 Processed 750 stocks...
  📊 Processed 800 stocks...
  📊 Processed 850 stocks...
  📊 Processed 900 stocks...
  📊 Processed 950 stocks...
  📊 Processed 1000 stocks...
  📊 Processed 1050 stocks...
  📊 Processed 1100 stocks...
  📊 Processed 1150 stocks...
  📊 Processed 1200 stocks...
  📊 Processed 1250 stocks...
  📊 Processed 1300 stocks...
  📊 Processed 1350 stocks...
  📊 Processed 1400 stocks
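
A regime-resolved KDE can be sketched with NumPy alone. The version below assumes the modelled quantity is the close-to-MA ratio and uses Silverman's rule for the bandwidth; both are assumptions for illustration, as are the names `fit_regime_kdes` and `sample_close`. The real `train_global_models` may model a different quantity.

```python
import numpy as np


def fit_regime_kdes(ratios_by_regime, min_samples=50):
    """Keep (data, bandwidth) per regime; skip regimes with too few samples."""
    models = {}
    for regime, ratios in ratios_by_regime.items():
        ratios = np.asarray(ratios, dtype=float)
        if len(ratios) < min_samples:
            continue
        # Silverman's rule of thumb for a 1-D Gaussian kernel
        bandwidth = 1.06 * ratios.std(ddof=1) * len(ratios) ** (-1 / 5)
        models[regime] = (ratios, bandwidth)
    return models


def sample_close(models, regime, ma_value, n_samples, rng):
    """Draw from the Gaussian KDE: pick a data point, add kernel noise."""
    ratios, bandwidth = models[regime]
    centers = rng.choice(ratios, size=n_samples, replace=True)
    return ma_value * (centers + rng.normal(0.0, bandwidth, size=n_samples))


rng = np.random.default_rng(1)
ratios_by_regime = {
    'bull_low': 1.0 + rng.normal(0.01, 0.01, 200),  # made-up close/MA ratios
    'bear_high': 1.0 + rng.normal(-0.02, 0.03, 10),  # too few samples, skipped
}
models = fit_regime_kdes(ratios_by_regime)
samples = sample_close(models, 'bull_low', ma_value=117.0, n_samples=5, rng=rng)
print(sorted(models), samples.round(2))
```

Regimes that fall below `min_samples` get no model, which is why the prediction cell needs a fallback path when the current regime has no KDE.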

## 5. Train Open Price Models

In [7]:
# Open price models are now trained globally in previous step
print(f"✅ Open price models already trained globally")
print(f"   Global Open KDE covers all {len(all_prepared_data)} stocks")
print(f"   Regime-resolved by trend and volatility")

# For compatibility, create reference
open_forecaster = global_open_kde

✅ Open price models already trained globally
   Global Open KDE covers all 2298 stocks
   Regime-resolved by trend and volatility


## 6. Train High/Low Copula Models

In [8]:
# High/Low copula models are now trained globally in previous step
print(f"✅ High/Low copula models already trained globally")
print(f"   Global High-Low Copula covers all {len(all_prepared_data)} stocks")
print(f"   Regime-resolved by trend and volatility")

# For compatibility, create reference
hl_forecaster = global_hl_copula

✅ High/Low copula models already trained globally
   Global High-Low Copula covers all 2298 stocks
   Regime-resolved by trend and volatility
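
One common way to model the joint behaviour of the high and low is a Gaussian copula over the upward and downward excursions from a reference price. The sketch below is a generic version of that technique with empirical marginals. It is not taken from `global_kde_models`; `to_normal_scores`, `sample_gaussian_copula`, and the synthetic excursions are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_normal_scores(values):
    """Map values to standard-normal scores through their empirical ranks."""
    ranks = np.argsort(np.argsort(values)) + 1  # ranks 1..n
    u = ranks / (len(values) + 1)               # strictly inside (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(p) for p in u])


def sample_gaussian_copula(up_moves, down_moves, n_samples, rng):
    """Sample (up, down) pairs that keep the observed rank dependence."""
    z = np.column_stack([to_normal_scores(up_moves), to_normal_scores(down_moves)])
    corr = np.corrcoef(z, rowvar=False)
    draws = rng.multivariate_normal(np.zeros(2), corr, size=n_samples)
    u = np.array([[_STD_NORMAL.cdf(v) for v in row] for row in draws])
    # Map uniforms back through the empirical quantiles of each marginal
    return np.quantile(up_moves, u[:, 0]), np.quantile(down_moves, u[:, 1])


rng = np.random.default_rng(0)
up = np.abs(rng.normal(0, 0.01, 200))                # (high - ref) / ref, made up
down = np.abs(0.5 * up + rng.normal(0, 0.005, 200))  # (ref - low) / ref, made up
up_s, down_s = sample_gaussian_copula(up, down, n_samples=5, rng=rng)

ref_price = 115.0
high = ref_price * (1 + up_s)
low = ref_price * (1 - down_s)
print(high.round(2), low.round(2))
```

Modelling non-negative excursions rather than raw prices guarantees `high >= low` for every sample.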


## 7. Train Individual ARIMA-GARCH Models

**Uses `auto_arima` for order selection and GARCH(1,1) for volatility. These time series models are fitted per stock.**
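
For reference, the multi-step variance forecast of a GARCH(1,1) model follows a simple recursion that decays toward the long-run variance `omega / (1 - alpha - beta)`. The sketch below shows that recursion with made-up parameters rather than fitted ones; `garch11_variance_forecast` is a hypothetical helper, and `CombinedARIMAGARCHModel` may compute its forecasts differently.

```python
import numpy as np


def garch11_variance_forecast(omega, alpha, beta, last_resid, last_var, horizon):
    """h-step-ahead conditional variance forecasts for GARCH(1,1)."""
    forecasts = np.empty(horizon)
    # One step ahead uses the last observed shock and variance
    forecasts[0] = omega + alpha * last_resid ** 2 + beta * last_var
    # Further ahead, the expected squared shock equals the forecast variance
    for h in range(1, horizon):
        forecasts[h] = omega + (alpha + beta) * forecasts[h - 1]
    return forecasts


variance_path = garch11_variance_forecast(
    omega=0.1, alpha=0.1, beta=0.8, last_resid=2.0, last_var=1.0, horizon=10
)
long_run = 0.1 / (1 - 0.1 - 0.8)
print(variance_path.round(4), round(long_run, 4))
```

The recursion only converges when `alpha + beta < 1`, so that condition is worth checking on any fitted model before trusting long horizons.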

In [9]:
from models.arima_garch_models import CombinedARIMAGARCHModel

print(f"🔄 Training individual ARIMA-GARCH models...")
print(f"   Note: ARIMA-GARCH models MUST be trained per stock (time series specific)")

arima_garch_models = {}

for symbol in individual_stocks:
    try:
        close_prices = all_prepared_data[symbol]['Close']
        
        # Fit combined ARIMA (for MA) + GARCH (for BB) model
        model = CombinedARIMAGARCHModel(ma_window=20, bb_std=2.0)
        model.fit(close_prices)
        arima_garch_models[symbol] = model
        
        # Print model summary
        summary = model.get_model_summary()
        arima_type = summary['arima_summary'].get('model_type', 'Unknown')
        garch_type = summary['garch_summary'].get('model_type', 'Unknown')
        print(f"✅ {symbol}: ARIMA-{arima_type} + GARCH-{garch_type}")
        
    except Exception as e:
        print(f"⚠️ {symbol} ARIMA-GARCH failed: {str(e)[:50]}")
        arima_garch_models[symbol] = None

successful_models = sum(1 for m in arima_garch_models.values() if m is not None and m.fitted)
print(f"✅ ARIMA-GARCH models trained: {successful_models}/{len(individual_stocks)}")

# Show detailed summary for first model
if individual_stocks and individual_stocks[0] in arima_garch_models:
    first_model = arima_garch_models[individual_stocks[0]]
    if first_model and first_model.fitted:
        print(f"\n📊 Model Summary for {individual_stocks[0]}:")
        summary = first_model.get_model_summary()
        print(f"   ARIMA Status: {summary['arima_summary']['status']}")
        print(f"   GARCH Status: {summary['garch_summary']['status']}")
        if 'arima_order' in summary['arima_summary']:
            print(f"   ARIMA Order: {summary['arima_summary']['arima_order']}")
        print(f"   Current MA: ${summary['arima_summary'].get('current_ma', 0):.2f}")
        print(f"   Current BB Width: {summary['garch_summary'].get('current_bb_width', 0):.4f}")

print(f"\n🎯 ARIMA-GARCH Training Summary:")
print(f"   Individual training: ✅ REQUIRED (time series models)")
print(f"   Uses auto_arima for optimal order selection")
print(f"   Uses GARCH(1,1) for volatility modeling")
print(f"   Trained on: 20-day MA and Bollinger Band width")

🔄 Training individual ARIMA-GARCH models...
   Note: ARIMA-GARCH models MUST be trained per stock (time series specific)
🚀 Training Combined ARIMA-GARCH Model
🔄 Fitting ARIMA model for 20-day moving average...
✅ ARIMA model fitted: (1, 1, 0)
🔄 Fitting GARCH model for Bollinger Band volatility...
✅ GARCH model fitted for BB volatility
✅ Combined ARIMA-GARCH model fitted
✅ A: ARIMA-ARIMA + GARCH-GARCH
🚀 Training Combined ARIMA-GARCH Model
🔄 Fitting ARIMA model for 20-day moving average...
✅ ARIMA model fitted: (0, 2, 0)
🔄 Fitting GARCH model for Bollinger Band volatility...
✅ GARCH model fitted for BB volatility
✅ Combined ARIMA-GARCH model fitted
✅ AA: ARIMA-ARIMA + GARCH-GARCH
🚀 Training Combined ARIMA-GARCH Model
🔄 Fitting ARIMA model for 20-day moving average...
✅ ARIMA model fitted: (2, 1, 1)
🔄 Fitting GARCH model for Bollinger Band volatility...
✅ GARCH model fitted for BB volatility
✅ Combined ARIMA-GARCH model fitted
✅ AAL: ARIMA-ARIMA + GARCH-GARCH
🚀 Training Combined ARIMA-GARC

## 8. Integrate Models and Make Prediction

In [10]:
print(f"🔮 Making prediction for {target_stock} using integer regime system...")

# Get target stock data
target_data = all_prepared_data[target_stock]
current_close = target_data['Close'].iloc[-1]
current_ma = target_data['MA'].iloc[-1]

# Generate forecasts using new ARIMA-GARCH model
forecast_days = 10

# Use ARIMA-GARCH model if available
if target_stock in arima_garch_models and arima_garch_models[target_stock] and arima_garch_models[target_stock].fitted:
    arima_garch_forecast = arima_garch_models[target_stock].forecast(horizon=forecast_days)
    
    # Extract MA and volatility forecasts
    ma_forecast = arima_garch_forecast['ma_forecast']
    bb_width_forecast = arima_garch_forecast['bb_width_forecast']
    
    # Use the BB width forecast directly as the volatility proxy (no rescaling is applied)
    vol_forecast = bb_width_forecast
    
    print(f"✅ Using ARIMA-GARCH forecasts:")
    print(f"   ARIMA Model: {arima_garch_forecast['arima_model_type']}")
    print(f"   GARCH Model: {arima_garch_forecast['garch_model_type']}")
    print(f"   MA Range: ${ma_forecast[0]:.2f} → ${ma_forecast[-1]:.2f}")
    print(f"   BB Width Range: {bb_width_forecast[0]:.4f} → {bb_width_forecast[-1]:.4f}")
    
else:
    # Fallback to simple forecasts
    ma_forecast = np.linspace(current_ma, current_ma * 1.01, forecast_days)
    vol_forecast = np.full(forecast_days, 0.025)
    print("⚠️ Using fallback MA and volatility forecasts")

# Generate predictions using global models with integer regime system
print(f"\n🎯 Using Global Models with Integer Regime System:")

# Determine current regime using integer states
from models.regime_classifier import REGIME_CLASSIFIER

current_returns = target_data['Close'].pct_change().tail(20)
ma_series = target_data['MA'].tail(20)

# Classify using integer regime system (both states and labels)
trend_states = REGIME_CLASSIFIER.classify_trend(ma_series, return_states=True)
vol_states = REGIME_CLASSIFIER.classify_volatility(current_returns, return_states=True)
trend_labels = REGIME_CLASSIFIER.classify_trend(ma_series, return_states=False)  
vol_labels = REGIME_CLASSIFIER.classify_volatility(current_returns, return_states=False)

# Get current regime
current_trend_state = trend_states.iloc[-1] if len(trend_states) > 0 else REGIME_CONFIG.fallback_trend_state
current_vol_state = vol_states.iloc[-1] if len(vol_states) > 0 else REGIME_CONFIG.fallback_volatility_state
current_trend_label = REGIME_CONFIG.trend.get_state_label(current_trend_state)
current_vol_label = REGIME_CONFIG.volatility.get_state_label(current_vol_state)
current_regime = f"{current_trend_label}_{current_vol_label}"

print(f"   Current Regime (Integer): trend_{current_trend_state} + vol_{current_vol_state}")
print(f"   Current Regime (Labels): {current_trend_label} + {current_vol_label} = {current_regime}")
print(f"   Current MA: ${current_ma:.2f}")
print(f"   Current Close: ${current_close:.2f}")

# Show regime mapping
print(f"\n🔄 Integer-to-Label Regime Mapping:")
print(f"   trend_{current_trend_state} = {current_trend_label}")
print(f"   vol_{current_vol_state} = {current_vol_label}")
print(f"   Combined: {current_regime}")

# Generate day-by-day predictions using global models
daily_predictions = []
day_open = current_close * 1.001  # Day 1 opens at a small fixed gap above the last close

for day in range(forecast_days):
    day_ma = ma_forecast[day]
    day_vol = vol_forecast[day]

    try:
        # Sample close price from global close KDE
        if global_close_kde and global_close_kde.fitted:
            close_samples = global_close_kde.sample_close_price(current_regime, day_ma, n_samples=5)
            pred_close = np.mean(close_samples)
        else:
            pred_close = day_ma * (1 + np.random.normal(0, day_vol))

        # Sample the overnight gap that sets the NEXT day's open
        if global_open_kde and global_open_kde.fitted and day < forecast_days - 1:
            gap_samples = global_open_kde.sample_gap(current_regime, n_samples=5)
            next_open = pred_close * (1 + np.mean(gap_samples))
        else:
            next_open = pred_close * (1 + np.random.normal(0, 0.005))

        # Sample high/low from copula around this day's open and close
        if global_hl_copula and global_hl_copula.fitted:
            ref_price = (day_open + pred_close) / 2
            hl_samples = global_hl_copula.sample_high_low(current_regime, ref_price, n_samples=5)
            pred_high = np.mean(hl_samples['high'])
            pred_low = np.mean(hl_samples['low'])
        else:
            # Fallback high/low
            pred_high = max(day_open, pred_close) * (1 + day_vol)
            pred_low = min(day_open, pred_close) * (1 - day_vol)

        daily_predictions.append({
            'day': day + 1,
            'open': day_open,
            'high': pred_high,
            'low': pred_low,
            'close': pred_close,
            'ma': day_ma
        })
        day_open = next_open  # Carry the sampled open into the next day

    except Exception as e:
        # Fallback simple prediction
        print(f"⚠️ Day {day + 1}: using fallback prediction ({str(e)[:50]})")
        pred_close = day_ma * (1 + np.random.normal(0, 0.01))
        daily_predictions.append({
            'day': day + 1,
            'open': pred_close * 1.001,
            'high': pred_close * 1.01,
            'low': pred_close * 0.99,
            'close': pred_close,
            'ma': day_ma
        })
        day_open = pred_close  # Next day opens at the fallback close

# Calculate summary metrics
final_price = daily_predictions[-1]['close']
total_return = (final_price - current_close) / current_close * 100
avg_daily_range = np.mean([pred['high'] - pred['low'] for pred in daily_predictions])

print(f"\n💰 PREDICTION RESULTS for {target_stock} (Integer Regime System):")
print(f"   Current Price: ${current_close:.2f}")
print(f"   {forecast_days}-Day Prediction: ${final_price:.2f}")
print(f"   Expected Return: {total_return:.2f}%")
print(f"   Average Daily Range: ${avg_daily_range:.2f}")
print(f"   Regime Used: {current_regime} (trend_{current_trend_state}_vol_{current_vol_state})")

# Model utilization summary (bool() keeps None or object values out of the count)
target_arima_garch = arima_garch_models.get(target_stock)
models_used = {
    'global_markov': bool(global_markov.fitted) if hasattr(global_markov, 'fitted') else True,
    'individual_markov': target_stock in individual_markov,
    'global_close_kde': bool(global_close_kde is not None and global_close_kde.fitted),
    'global_open_kde': bool(global_open_kde is not None and global_open_kde.fitted),
    'global_hl_copula': bool(global_hl_copula is not None and global_hl_copula.fitted),
    'arima_garch_model': bool(target_arima_garch is not None and target_arima_garch.fitted)
}

print(f"\n🔧 Models Used: {sum(models_used.values())}/{len(models_used)}")
for model, used in models_used.items():
    status = '✅' if used else '❌'
    print(f"   {model}: {status}")

print(f"\n✅ Training pipeline completed - {datetime.now().strftime('%H:%M:%S')}")

# Show detailed forecast table
if len(daily_predictions) > 0:
    print(f"\n📊 {forecast_days}-Day Detailed Forecast:")
    print(f"{'Day':<4} {'Open':<8} {'High':<8} {'Low':<8} {'Close':<8} {'MA':<8} {'Range':<8}")
    print("-" * 60)
    
    for pred in daily_predictions:
        day = pred['day']
        open_p = pred['open']
        high_p = pred['high']
        low_p = pred['low']
        close_p = pred['close']
        ma_p = pred['ma']
        range_p = high_p - low_p
        
        print(f"{day:<4} ${open_p:<7.2f} ${high_p:<7.2f} ${low_p:<7.2f} ${close_p:<7.2f} ${ma_p:<7.2f} ${range_p:<7.2f}")

print(f"\n🎯 Integer Regime System Summary:")
print(f"   Configuration: {N_TREND_STATES} trend × {N_VOL_STATES} vol = {N_TREND_STATES * N_VOL_STATES} regimes")
print(f"   Global models trained on {len(all_prepared_data)} stocks")
print(f"   Predicted using regime: trend_{current_trend_state} ({current_trend_label}) + vol_{current_vol_state} ({current_vol_label})")
print(f"   Target stock: {target_stock}")

🔮 Making prediction for A using integer regime system...
✅ Using ARIMA-GARCH forecasts:
   ARIMA Model: ARIMA
   GARCH Model: GARCH
   MA Range: $117.17 → $115.15
   BB Width Range: 0.0322 → 0.0354

🎯 Using Global Models with Integer Regime System:
   Current Regime (Integer): trend_0 + vol_4
   Current Regime (Labels): strong_bear + vol_4 = strong_bear_vol_4
   Current MA: $117.46
   Current Close: $114.89

🔄 Integer-to-Label Regime Mapping:
   trend_0 = strong_bear
   vol_4 = vol_4
   Combined: strong_bear_vol_4

💰 PREDICTION RESULTS for A (Integer Regime System):
   Current Price: $114.89
   10-Day Prediction: $113.85
   Expected Return: -0.90%
   Average Daily Range: $2.24
   Regime Used: strong_bear_vol_4 (trend_0_vol_4)

🔧 Models Used: 5/6
   global_markov: ✅
   individual_markov: ❌
   global_close_kde: ✅
   global_open_kde: ✅
   global_hl_copula: ✅
   arima_garch_model: ✅

✅ Training pipeline completed - 21:02:11

📊 10-Day Detailed Forecast:
Day  Open     High     Low      Close
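
Because open, close, high and low come from separate models and are averaged over a handful of samples, nothing in the prediction cell guarantees that the high sits above both open and close, or that the low sits below them. One possible post-processing step is sketched here; `enforce_ohlc_bounds` is a hypothetical helper and the example bar is made up.

```python
def enforce_ohlc_bounds(bar):
    """Return a copy of the bar with high/low widened to contain open and close."""
    fixed = dict(bar)
    fixed['high'] = max(bar['high'], bar['open'], bar['close'])
    fixed['low'] = min(bar['low'], bar['open'], bar['close'])
    return fixed


# A made-up bar whose sampled high falls below the close
raw_bar = {'day': 1, 'open': 115.00, 'high': 114.80, 'low': 113.90, 'close': 115.20, 'ma': 117.17}
clean_bar = enforce_ohlc_bounds(raw_bar)
print(clean_bar)
```

Applying this to each entry of `daily_predictions` before plotting keeps candlestick bodies inside their wicks.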

## Summary

**Training Completed with Global Models + Individual ARIMA-GARCH:**
1. ✅ **Regime Configuration**: Integer states configured at pipeline start
2. ✅ **Global Markov Model**: Trained on all stocks with sparse bucket diagnostics 
3. ✅ **Global-Only Training**: Removed per-stock Markov models for robustness
4. ✅ **Close Price KDE Models**: Global training with integer regime resolution
5. ✅ **Open Price Models**: Global KDE with trend/volatility resolution
6. ✅ **High/Low Copula Models**: Global copula with regime resolution
7. ✅ **Individual ARIMA-GARCH Models**: Per-stock time series models (required)
8. ✅ **Integrated Prediction**: Using integer regime system for forecasting
9. ✅ **Markov Heatmap Visualization**: Transition matrix visualization added
10. ✅ **OHLC Trajectory Visualization**: Candlestick chart visualization added

**Key Features of Updated Pipeline:**
- **Mixed training approach**: Global for regime models, individual for time series
- **Sparse bucket diagnostics**: Identifies least populated (trend, BB-position) buckets
- **Integer-based regimes**: trend_0, trend_1, ..., vol_0, vol_1, ...
- **Flexible configuration**: Easy to change number of states at top
- **State-label conversion**: Both integer states and descriptive labels
- **Visualization capabilities**: Built-in Markov heatmaps and OHLC trajectories
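
The sparse bucket diagnostic mentioned above can be approximated by counting observations per (trend state, BB-position bin) and listing the emptiest cells. This is a sketch of the idea only: `sparsest_buckets`, the equal-width binning of `BB_Position`, and the toy inputs are assumptions, not the pipeline's implementation.

```python
import numpy as np
import pandas as pd


def sparsest_buckets(trend_states, bb_position, n_trend_states, n_bb_bins=5, top_n=3):
    """Count observations per (trend, BB bin) and return the least populated."""
    edges = np.linspace(-1, 1, n_bb_bins + 1)
    bb_bins = pd.cut(bb_position, bins=edges, include_lowest=True, labels=False)
    counts = (
        pd.DataFrame({'trend': trend_states, 'bb_bin': bb_bins})
        .groupby(['trend', 'bb_bin'])
        .size()
    )
    # Include buckets that never occur so they show up with a count of 0
    full_index = pd.MultiIndex.from_product(
        [range(n_trend_states), range(n_bb_bins)], names=['trend', 'bb_bin']
    )
    return counts.reindex(full_index, fill_value=0).nsmallest(top_n)


sparse = sparsest_buckets(
    trend_states=[0, 0, 0, 1],
    bb_position=[-0.9, -0.9, 0.9, 0.1],
    n_trend_states=2,
    n_bb_bins=2,
    top_n=1,
)
print(sparse)
```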

In [None]:
# Pipeline complete - integer regime system successfully integrated!
print("🎉 Streamlined Training Pipeline with Integer Regime System Complete!")
print("=" * 70)
print("✅ All models trained using the new integer-based regime configuration")
print("✅ Regimes can be easily modified by changing N_TREND_STATES and N_VOL_STATES")
print("✅ Both integer states and descriptive labels are supported")
print("✅ Backwards compatibility maintained with existing code")
print("=" * 70)

## Visualizations

**New visualization capabilities added to the pipeline!**

In [None]:
# =============================================================================
# VISUALIZATION 1: Markov Transition Matrix Heatmap
# =============================================================================

print("🎨 Creating Markov Transition Matrix Heatmap...")

import matplotlib.pyplot as plt
import seaborn as sns

# Create a visualization of the global Markov transition matrix
if global_markov and global_markov.fitted:
    # Set up the plot
    fig, ax = plt.subplots(1, 1, figsize=(12, 10))
    
    # Create heatmap of transition matrix
    transition_matrix = global_markov.transition_matrix
    state_labels = global_markov.states
    
    # Truncate labels for readability if too many states
    if len(state_labels) > 10:
        display_labels = [label[:15] + '...' if len(label) > 15 else label for label in state_labels]
    else:
        display_labels = state_labels
    
    sns.heatmap(
        transition_matrix,
        xticklabels=display_labels,
        yticklabels=display_labels, 
        annot=len(state_labels) <= 10,  # Annotate cells only when the matrix is small enough to read
        fmt='.3f',
        cmap='Blues',
        ax=ax,
        cbar_kws={'label': 'Transition Probability'},
        square=True
    )
    
    ax.set_title(f'Global Markov Transition Matrix\n({len(state_labels)} Combined Regimes)', 
                fontsize=14, fontweight='bold')
    ax.set_xlabel('Next State', fontsize=12)
    ax.set_ylabel('Current State', fontsize=12)
    
    # Rotate labels for better readability
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    plt.tight_layout()
    
    plt.show()
    
    print(f"✅ Markov transition matrix heatmap displayed")
    print(f"   Matrix size: {transition_matrix.shape[0]}×{transition_matrix.shape[1]}")
    print(f"   Combined regimes: {len(state_labels)}")
    
else:
    print("❌ No global Markov model available for visualization")
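
Before reading a heatmap like this one, it can help to confirm that the matrix is a valid transition matrix. The sketch below assumes rows index the current state and columns the next state (consistent with the axis labels used in the plot), so every row should sum to 1. The function `check_row_stochastic` is a hypothetical helper, not part of the pipeline, and the toy matrix stands in for `global_markov.transition_matrix`.

```python
import numpy as np


def check_row_stochastic(matrix, atol=1e-8):
    """Return (is_valid, bad_rows) for a candidate transition matrix.

    is_valid is True when the matrix is square, non-negative, and every
    row sums to 1 within `atol`. bad_rows lists rows whose sum is off.
    """
    m = np.asarray(matrix, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        raise ValueError(f"expected a square matrix, got shape {m.shape}")
    nonnegative = bool(np.all(m >= 0))
    row_sums = m.sum(axis=1)
    bad_rows = np.flatnonzero(~np.isclose(row_sums, 1.0, atol=atol))
    return nonnegative and bad_rows.size == 0, bad_rows


# Toy example: the last state was never visited, so its row is all zeros.
toy_matrix = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.0, 0.0],
])
is_valid, bad_rows = check_row_stochastic(toy_matrix)
print(is_valid, bad_rows.tolist())
```

All-zero rows typically correspond to regimes with no observed transitions, which is worth knowing when the heatmap shows blank rows.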

In [None]:
# =============================================================================
# VISUALIZATION 2: OHLC Trajectory Candlestick Chart
# =============================================================================

print("🎨 Creating OHLC Trajectory Candlestick Chart...")

from matplotlib.patches import Rectangle
from matplotlib.dates import DateFormatter, DayLocator
from datetime import datetime, timedelta

# Generate a complete OHLC trajectory using all trained models
forecast_days_viz = 20  # Number of days for visualization
viz_target = target_stock

print(f"🎯 Generating {forecast_days_viz}-day OHLC trajectory for {viz_target}")

# Use the daily predictions we already calculated, or generate new ones
if len(daily_predictions) < forecast_days_viz:
    print("   Generating additional predictions for visualization...")
    
    # Extend predictions if needed
    viz_predictions = daily_predictions.copy()
    
    for day in range(len(daily_predictions), forecast_days_viz):
        # Generate simple prediction for visualization
        if day == 0:
            prev_close = current_close
        else:
            prev_close = viz_predictions[-1]['close']
        
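        # Simple random walk with a small regime-dependent drift.
        # The drift keys off the substrings 'bull'/'bear'; labels containing
        # neither (e.g. purely integer-based names) get zero drift.
        regime_text = str(current_regime)
        trend_factor = 0.001 if 'bull' in regime_text else (-0.001 if 'bear' in regime_text else 0)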
        close_price = prev_close * (1 + trend_factor + np.random.normal(0, 0.015))
        open_price = prev_close * (1 + np.random.normal(0, 0.005))  # Small gap
        high_price = max(open_price, close_price) * (1 + np.random.uniform(0.005, 0.015))
        low_price = min(open_price, close_price) * (1 - np.random.uniform(0.005, 0.015))
        
        viz_predictions.append({
            'day': day + 1,
            'open': open_price,
            'high': high_price,
            'low': low_price,
            'close': close_price,
            'ma': close_price * 0.98  # Approximate MA
        })
else:
    viz_predictions = daily_predictions[:forecast_days_viz]

print(f"   Generated {len(viz_predictions)} days of OHLC data")

# Create the candlestick chart
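# gridspec_kw is used for the height ratios so this also works on Matplotlib versions before 3.6
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10), gridspec_kw={'height_ratios': [3, 1]}, sharex=True)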

# Generate dates for the forecast
start_date = datetime.now().date() + timedelta(days=1)
dates = [start_date + timedelta(days=i) for i in range(len(viz_predictions))]

# Plot candlesticks
for i, (pred, date) in enumerate(zip(viz_predictions, dates)):
    open_price = pred['open']
    high_price = pred['high'] 
    low_price = pred['low']
    close_price = pred['close']
    
    # Determine candle color
    color = 'green' if close_price > open_price else 'red'
    edge_color = 'darkgreen' if close_price > open_price else 'darkred'
    
    # Draw high-low line
    ax1.plot([i, i], [low_price, high_price], color=edge_color, linewidth=1.5)
    
    # Draw candle body
    body_height = abs(close_price - open_price)
    body_bottom = min(open_price, close_price)
    
    candle = Rectangle((i - 0.3, body_bottom), 0.6, body_height,
                      facecolor=color, edgecolor=edge_color, alpha=0.8)
    ax1.add_patch(candle)

# Plot moving average line
ma_values = [pred['ma'] for pred in viz_predictions]
ax1.plot(range(len(viz_predictions)), ma_values, color='blue', linewidth=2, alpha=0.7, label='20-day MA')

# Format main chart
ax1.set_title(f'Simulated OHLC Trajectory for {viz_target}\nUsing Global Models with Regime: {current_regime}', 
             fontsize=14, fontweight='bold')
ax1.set_ylabel('Price ($)', fontsize=12)
ax1.grid(True, alpha=0.3)
ax1.legend()

# Format y-axis as currency
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

# Plot daily ranges in subplot
daily_ranges = [pred['high'] - pred['low'] for pred in viz_predictions]
ax2.bar(range(len(viz_predictions)), daily_ranges, color='purple', alpha=0.6)
ax2.set_ylabel('Daily Range ($)', fontsize=10)
ax2.set_xlabel('Forecast Day', fontsize=12)
ax2.set_title('Daily Price Ranges', fontsize=12)
ax2.grid(True, alpha=0.3)

# Format x-axis
ax2.set_xticks(range(0, len(viz_predictions), max(1, len(viz_predictions)//10)))
ax2.set_xticklabels([f'Day {i+1}' for i in range(0, len(viz_predictions), max(1, len(viz_predictions)//10))])

plt.tight_layout()
plt.show()

print(f"✅ OHLC trajectory candlestick chart displayed")
print(f"   Forecast period: {forecast_days_viz} days")
print(f"   Target stock: {viz_target}")
print(f"   Using regime: {current_regime}")
print(f"   Price range: ${min([p['low'] for p in viz_predictions]):.2f} - ${max([p['high'] for p in viz_predictions]):.2f}")

# Summary statistics
total_return_viz = (viz_predictions[-1]['close'] - viz_predictions[0]['open']) / viz_predictions[0]['open'] * 100
avg_daily_range_viz = np.mean(daily_ranges)
max_daily_range_viz = np.max(daily_ranges)

print(f"\n📊 Trajectory Statistics:")
print(f"   Total simulated return: {total_return_viz:.2f}%")
print(f"   Average daily range: ${avg_daily_range_viz:.2f}")
print(f"   Maximum daily range: ${max_daily_range_viz:.2f}")
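# Close-to-close return volatility: each close is compared with the immediately preceding close
close_returns_viz = [viz_predictions[i]['close'] / viz_predictions[i - 1]['close'] - 1 for i in range(1, len(viz_predictions))]
print(f"   Volatility estimate: {np.std(close_returns_viz) * 100:.2f}%")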
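
The volatility estimate is the standard deviation of close-to-close simple returns. As a minimal vectorized sketch of that calculation, the hypothetical helper `close_to_close_volatility` below takes a plain sequence of closing prices (in the notebook these would come from `[p['close'] for p in viz_predictions]`). It uses `np.std` with its default `ddof=0`, i.e. the population standard deviation.

```python
import numpy as np


def close_to_close_volatility(closes):
    """Standard deviation of simple returns between consecutive closes."""
    closes = np.asarray(closes, dtype=float)
    if closes.size < 2:
        return float("nan")  # At least two closes are needed for one return
    returns = closes[1:] / closes[:-1] - 1.0
    return float(np.std(returns))


# Toy prices: a constant 10% daily gain has zero return volatility,
# while alternating gains and losses do not.
steady = close_to_close_volatility([100.0, 110.0, 121.0])
choppy = close_to_close_volatility([100.0, 110.0, 99.0, 108.9])
print(f"steady: {steady * 100:.2f}%  choppy: {choppy * 100:.2f}%")
```

Pairing each close with its immediate predecessor matters: comparing against an earlier close mixes one-day and multi-day returns and distorts the estimate.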