# SVD Latent Factor Analysis

This notebook provides comprehensive visualizations and analysis of the latent factors estimated from rolling SVD decomposition of stock returns.

## Analysis Overview:
1. **Factor Loading Heatmaps** - Visualize how assets load on each factor
2. **Factor Time Series** - Track factor evolution over time
3. **Factor Stability** - Analyze loading consistency across time
4. **Sector Analysis** - Understand factor economic interpretation
5. **Factor Correlations** - Examine relationships between factors
6. **Interactive Dashboards** - Dynamic exploration tools

In [3]:
import sys, os
sys.path.append(os.path.abspath('../src'))

# Load environment variables
from dotenv import load_dotenv
load_dotenv('../.env')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

# Import our modules
from data.preprocess import preprocess_price_matrix
from data.loader import get_multiple_stocks
from models.svd import rolling_svd_factors

print("📊 SVD Factor Analysis Notebook Initialized")
print(f"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")

📊 SVD Factor Analysis Notebook Initialized
📅 Analysis Date: 2025-08-03 10:48


## 1. Data Loading and SVD Computation

In [4]:
# Load stock data
print("📈 Loading stock data...")

# Load stock symbols
s_and_p_500_constituents = pd.read_csv('../cache/constituents.csv')
nasdaq_100_constituents = pd.read_csv('../cache/nasdaq-100.csv')
active_tickers = sorted(list(set(s_and_p_500_constituents['Symbol'].dropna()) | set(nasdaq_100_constituents['Symbol'].dropna())))

print(f"📊 Total symbols: {len(active_tickers)}")

# Get stock data (from cache)
stock_data = get_multiple_stocks(active_tickers, update=False, rate_limit=5.0)
close_prices = stock_data['Close']

print(f"💹 Price data shape: {close_prices.shape}")
print(f"📅 Date range: {close_prices.index.min()} to {close_prices.index.max()}")

📈 Loading stock data...
📊 Total symbols: 520
Loading cached data from /mnt/a61cc0e8-1b32-4574-a771-4ad77e8faab6/conda/technical_dashboard/cache/stock_data.pkl
💹 Price data shape: (1256, 517)
📅 Date range: 2020-08-03 00:00:00 to 2025-08-01 00:00:00


In [5]:
# Preprocess data for SVD
print("🔄 Preprocessing data for SVD...")

pre_scaled = preprocess_price_matrix(
    close_prices, 
    winsorize_span=40,
    method='log_return',
    rolling_window=5
)

print(f"✅ Preprocessed data shape: {pre_scaled.shape}")
print(f"📊 Final assets: {len(pre_scaled.columns)}")
print(f"📅 Analysis period: {pre_scaled.index.min()} to {pre_scaled.index.max()}")

🔄 Preprocessing data for SVD...
Dropped assets due to lookback_days requirement:
['ABNB', 'APP', 'ARM', 'CEG', 'DASH', 'EXE', 'GEHC', 'GEV', 'GFS', 'GRAL', 'HOOD', 'KVUE', 'PLTR', 'SOLV', 'VLTO']
Total values winsorized: 2,209
✅ Preprocessed data shape: (1304, 502)
📊 Final assets: 502
📅 Analysis period: 2020-08-04 00:00:00 to 2025-08-01 00:00:00


In [6]:
# Compute rolling SVD factors
print("🔍 Computing rolling SVD factors...")

loadings_df, components_df, explained_var_df = rolling_svd_factors(
    X=pre_scaled,
    dates=pre_scaled.index,
    assets=pre_scaled.columns,
    window_size=180,  # 6-month rolling window
    n_components=10   # Extract top 10 factors
)

print(f"✅ SVD computation complete!")
print(f"📊 Loadings shape: {loadings_df.shape}")
print(f"📈 Components shape: {components_df.shape}")
print(f"📉 Explained variance shape: {explained_var_df.shape}")

🔍 Computing rolling SVD factors...
✅ SVD computation complete!
📊 Loadings shape: (564248, 10)
📈 Components shape: (1124, 10)
📉 Explained variance shape: (1124, 10)


## 2. Factor Loading Visualizations

In [None]:
# Latest factor loadings heatmap
def plot_latest_loadings_heatmap(loadings_df, n_components=10, top_assets=50):
    """
    Plot heatmap of latest factor loadings for top assets by absolute loading magnitude
    """
    # Get latest date
    latest_date = loadings_df.index.get_level_values('date').max()
    latest_loadings = loadings_df.xs(latest_date, level='date')
    
    # Select top assets by total absolute loading magnitude
    asset_importance = latest_loadings.abs().sum(axis=1).sort_values(ascending=False)
    top_assets_list = asset_importance.head(top_assets).index
    
    # Create heatmap data
    heatmap_data = latest_loadings.loc[top_assets_list, [f'PC{i+1}' for i in range(n_components)]]
    
    # Plot
    fig, ax = plt.subplots(figsize=(14, 20))
    sns.heatmap(
        heatmap_data, 
        cmap='RdBu_r', 
        center=0,
        annot=False,
        fmt='.3f',
        cbar_kws={'label': 'Factor Loading'},
        ax=ax
    )
    
    ax.set_title(f'Factor Loadings Heatmap - Top {top_assets} Assets\n{latest_date.strftime("%Y-%m-%d")}', 
                fontsize=16, fontweight='bold', pad=20)
    ax.set_xlabel('Principal Components', fontsize=12, fontweight='bold')
    ax.set_ylabel('Assets', fontsize=12, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    
    return heatmap_data

print("🔥 Factor Loadings Heatmap (Latest Date)")
heatmap_data = plot_latest_loadings_heatmap(loadings_df, n_components=10, top_assets=50)

In [2]:
# Factor loadings distribution analysis
def analyze_loading_distributions(loadings_df, n_components=6):
    """
    Analyze the distribution of factor loadings across all assets and time
    """
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    axes = axes.flatten()
    
    for i in range(n_components):
        component_col = f'PC{i+1}'
        
        # Get all loadings for this component across all dates
        component_loadings = loadings_df[component_col].values
        
        # Plot distribution
        axes[i].hist(component_loadings, bins=50, alpha=0.7, color=sns.color_palette()[i])
        axes[i].axvline(0, color='red', linestyle='--', alpha=0.8)
        axes[i].set_title(f'{component_col} Loading Distribution', fontweight='bold')
        axes[i].set_xlabel('Loading Value')
        axes[i].set_ylabel('Frequency')
        axes[i].grid(True, alpha=0.3)
        
        # Add stats
        mean_loading = np.mean(component_loadings)
        std_loading = np.std(component_loadings)
        axes[i].text(0.02, 0.98, f'μ={mean_loading:.3f}\nσ={std_loading:.3f}', 
                    transform=axes[i].transAxes, verticalalignment='top',
                    bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
    
    plt.suptitle('Factor Loading Distributions Across All Assets and Time', 
                fontsize=16, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.show()

print("📊 Factor Loading Distributions")
analyze_loading_distributions(loadings_df)

📊 Factor Loading Distributions


NameError: name 'loadings_df' is not defined

## 3. Factor Time Series Analysis

In [None]:
# Factor time series analysis
def analyze_factor_time_series(components_df, n_components=10, window=252):
    """
    Analyze factor time series with statistical summary (skipping plots due to matplotlib issues)
    """
    # Recent period for detailed view
    recent_data = components_df.tail(window) if len(components_df) > window else components_df
    
    print(f"📈 Factor Time Series Analysis - Last {len(recent_data)} observations")
    print("=" * 70)
    
    # Statistical analysis for each factor
    for i in range(n_components):
        component_col = f'PC{i+1}'
        
        if component_col not in recent_data.columns:
            continue
            
        factor_data = recent_data[component_col]
        rolling_mean = factor_data.rolling(window=21).mean()
        
        # Calculate key statistics
        vol = factor_data.std()
        mean = factor_data.mean()
        current_value = factor_data.iloc[-1]
        current_ma = rolling_mean.iloc[-1]
        min_val = factor_data.min()
        max_val = factor_data.max()
        
        # Trend analysis
        recent_trend = factor_data.tail(20).diff().mean()
        ma_trend = rolling_mean.tail(20).diff().mean()
        
        # Volatility regime
        vol_regime = "High" if vol > recent_data.std().mean() else "Low"
        
        print(f"\\n🔍 {component_col} Factor Analysis:")
        print(f"  Current Value: {current_value:+.4f}")
        print(f"  21-day MA: {current_ma:+.4f}")
        print(f"  Mean: {mean:+.4f}")
        print(f"  Volatility: {vol:.4f} ({vol_regime})")
        print(f"  Range: [{min_val:+.4f}, {max_val:+.4f}]")
        print(f"  Recent Trend: {recent_trend:+.6f} (20-day avg change)")
        print(f"  MA Trend: {ma_trend:+.6f} (20-day avg MA change)")
        
        # Regime identification
        if abs(current_value) > 2 * vol:
            regime = "🔴 Extreme" if current_value > 0 else "🔵 Extreme Negative"
        elif abs(current_value) > vol:
            regime = "🟡 High" if current_value > 0 else "🟣 Low"
        else:
            regime = "🟢 Normal"
        
        print(f"  Current Regime: {regime}")
    
    print(f"\\n📊 FACTOR CORRELATION MATRIX:")
    print("=" * 50)
    correlation_matrix = recent_data.corr()
    print(correlation_matrix.round(3))
    
    print(f"\\n📈 FACTOR SUMMARY STATISTICS:")
    print("=" * 50)
    summary_stats = recent_data.describe()
    print(summary_stats.round(4))
    
    print(f"\\n🎯 KEY INSIGHTS:")
    print("=" * 30)
    
    # Most volatile factor
    volatilities = recent_data.std().sort_values(ascending=False)
    most_volatile = volatilities.index[0]
    least_volatile = volatilities.index[-1]
    
    print(f"Most Volatile Factor: {most_volatile} (σ = {volatilities.iloc[0]:.4f})")
    print(f"Least Volatile Factor: {least_volatile} (σ = {volatilities.iloc[-1]:.4f})")
    
    # Current extreme factors
    current_values = recent_data.iloc[-1]
    factor_vols = recent_data.std()
    standardized_values = current_values / factor_vols
    
    extreme_factors = standardized_values[abs(standardized_values) > 1.5]
    if len(extreme_factors) > 0:
        print(f"\\nFactors in Extreme Regimes:")
        for factor, std_val in extreme_factors.items():
            direction = "High" if std_val > 0 else "Low"
            print(f"  {factor}: {std_val:+.2f}σ ({direction})")
    else:
        print(f"\\nNo factors currently in extreme regimes (>1.5σ)")
    
    # Trend analysis
    trends = {}
    for col in recent_data.columns:
        trend = recent_data[col].tail(20).diff().mean()
        trends[col] = trend
    
    trending_up = {k: v for k, v in trends.items() if v > 0}
    trending_down = {k: v for k, v in trends.items() if v < 0}
    
    if trending_up:
        print(f"\\nFactors Trending Up (20-day avg):")
        for factor, trend in sorted(trending_up.items(), key=lambda x: x[1], reverse=True):
            print(f"  {factor}: +{trend:.6f}/day")
    
    if trending_down:
        print(f"\\nFactors Trending Down (20-day avg):")
        for factor, trend in sorted(trending_down.items(), key=lambda x: x[1]):
            print(f"  {factor}: {trend:.6f}/day")
    
    return recent_data, summary_stats, correlation_matrix

print("📈 Factor Time Series Analysis")
print("⚠️  Note: Plotting temporarily disabled due to matplotlib compatibility issues")
print("📊 Providing comprehensive statistical analysis instead")

recent_data, summary_stats, correlation_matrix = analyze_factor_time_series(components_df, n_components=10, window=252)

In [None]:
# Factor volatility analysis
def analyze_factor_volatility(components_df, window=63):
    """
    Analyze factor volatility patterns over time
    """
    # Calculate rolling volatility
    factor_vols = components_df.rolling(window=window).std()
    
    # Plot volatility heatmap
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(16, 12))
    
    # Time series of volatilities
    factor_vols.plot(ax=ax1, linewidth=1.5, alpha=0.8)
    ax1.set_title(f'Factor Volatility Over Time ({window}-day rolling)', fontweight='bold', fontsize=14)
    ax1.set_ylabel('Volatility')
    ax1.grid(True, alpha=0.3)
    ax1.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    
    # Heatmap of recent volatilities
    recent_vols = factor_vols.tail(100).T  # Last 100 days, transposed
    sns.heatmap(recent_vols, cmap='YlOrRd', ax=ax2, cbar_kws={'label': 'Volatility'})
    ax2.set_title('Factor Volatility Heatmap (Last 100 Days)', fontweight='bold', fontsize=14)
    ax2.set_xlabel('Date')
    ax2.set_ylabel('Factor')
    
    # Reduce x-axis labels for readability
    ax2.set_xticks(ax2.get_xticks()[::20])  # Show every 20th date
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print("\n📊 Factor Volatility Summary:")
    vol_summary = factor_vols.describe()
    print(vol_summary.round(4))

print("💨 Factor Volatility Analysis")
analyze_factor_volatility(components_df, window=63)

## 4. Explained Variance Analysis

In [None]:
# Explained variance visualization
def plot_explained_variance_analysis(explained_var_df):
    """
    Comprehensive explained variance analysis
    """
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(20, 16))
    
    # 1. Time series of explained variance
    explained_var_df.plot(ax=ax1, linewidth=1.5, alpha=0.8)
    ax1.set_title('Explained Variance by Factor Over Time', fontweight='bold', fontsize=14)
    ax1.set_ylabel('Explained Variance Ratio')
    ax1.grid(True, alpha=0.3)
    ax1.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    
    # 2. Cumulative explained variance
    cumulative_var = explained_var_df.cumsum(axis=1)
    cumulative_var.plot(ax=ax2, linewidth=1.5, alpha=0.8)
    ax2.set_title('Cumulative Explained Variance Over Time', fontweight='bold', fontsize=14)
    ax2.set_ylabel('Cumulative Explained Variance')
    ax2.grid(True, alpha=0.3)
    ax2.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    
    # 3. Average explained variance by factor
    avg_explained = explained_var_df.mean().sort_values(ascending=True)
    avg_explained.plot(kind='barh', ax=ax3, color=sns.color_palette('viridis', len(avg_explained)))
    ax3.set_title('Average Explained Variance by Factor', fontweight='bold', fontsize=14)
    ax3.set_xlabel('Average Explained Variance Ratio')
    ax3.grid(True, alpha=0.3)
    
    # 4. Explained variance stability (coefficient of variation)
    var_stability = (explained_var_df.std() / explained_var_df.mean()).sort_values(ascending=True)
    var_stability.plot(kind='barh', ax=ax4, color=sns.color_palette('plasma', len(var_stability)))
    ax4.set_title('Factor Stability (Lower = More Stable)', fontweight='bold', fontsize=14)
    ax4.set_xlabel('Coefficient of Variation')
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print summary
    print("\n📊 Explained Variance Summary:")
    print(f"Total average explained variance: {explained_var_df.sum(axis=1).mean():.1%}")
    print(f"PC1 average contribution: {explained_var_df['PC1'].mean():.1%}")
    print(f"Top 3 factors average contribution: {explained_var_df[['PC1', 'PC2', 'PC3']].sum(axis=1).mean():.1%}")
    
    return avg_explained, var_stability

print("📈 Explained Variance Analysis")
avg_explained, var_stability = plot_explained_variance_analysis(explained_var_df)

## 5. Factor Correlation Analysis

In [None]:
# Factor correlation analysis
def analyze_factor_correlations(components_df, window=252):
    """
    Analyze correlations between factors
    """
    # Calculate correlation matrix
    factor_corr = components_df.corr()
    
    # Rolling correlations for selected pairs
    rolling_corr_12 = components_df['PC1'].rolling(window=window).corr(components_df['PC2'])
    rolling_corr_13 = components_df['PC1'].rolling(window=window).corr(components_df['PC3'])
    rolling_corr_23 = components_df['PC2'].rolling(window=window).corr(components_df['PC3'])
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(20, 16))
    
    # 1. Correlation heatmap
    sns.heatmap(factor_corr, annot=True, cmap='RdBu_r', center=0, 
                square=True, ax=ax1, fmt='.3f')
    ax1.set_title('Factor Correlation Matrix', fontweight='bold', fontsize=14)
    
    # 2. Rolling correlation PC1-PC2
    ax2.plot(rolling_corr_12.index, rolling_corr_12.values, linewidth=2, alpha=0.8, color='blue')
    ax2.axhline(0, color='red', linestyle='--', alpha=0.7)
    ax2.set_title(f'Rolling Correlation: PC1 vs PC2 ({window}-day)', fontweight='bold', fontsize=14)
    ax2.set_ylabel('Correlation')
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim(-1, 1)
    
    # 3. Rolling correlation PC1-PC3
    ax3.plot(rolling_corr_13.index, rolling_corr_13.values, linewidth=2, alpha=0.8, color='green')
    ax3.axhline(0, color='red', linestyle='--', alpha=0.7)
    ax3.set_title(f'Rolling Correlation: PC1 vs PC3 ({window}-day)', fontweight='bold', fontsize=14)
    ax3.set_ylabel('Correlation')
    ax3.grid(True, alpha=0.3)
    ax3.set_ylim(-1, 1)
    
    # 4. Rolling correlation PC2-PC3
    ax4.plot(rolling_corr_23.index, rolling_corr_23.values, linewidth=2, alpha=0.8, color='orange')
    ax4.axhline(0, color='red', linestyle='--', alpha=0.7)
    ax4.set_title(f'Rolling Correlation: PC2 vs PC3 ({window}-day)', fontweight='bold', fontsize=14)
    ax4.set_ylabel('Correlation')
    ax4.grid(True, alpha=0.3)
    ax4.set_ylim(-1, 1)
    
    plt.tight_layout()
    plt.show()
    
    # Print correlation insights
    print("\n🔗 Factor Correlation Insights:")
    print(f"Strongest positive correlation: {factor_corr.where(np.triu(np.ones(factor_corr.shape), k=1).astype(bool)).stack().max():.3f}")
    print(f"Strongest negative correlation: {factor_corr.where(np.triu(np.ones(factor_corr.shape), k=1).astype(bool)).stack().min():.3f}")
    print(f"Average absolute correlation: {factor_corr.where(np.triu(np.ones(factor_corr.shape), k=1).astype(bool)).stack().abs().mean():.3f}")
    
    return factor_corr

print("🔗 Factor Correlation Analysis")
factor_corr = analyze_factor_correlations(components_df, window=126)

## 6. Asset-Factor Relationship Analysis

In [None]:
# Top assets by factor loading analysis
def analyze_top_assets_by_factor(loadings_df, n_top=10):
    """
    Identify and analyze top assets for each factor
    """
    latest_date = loadings_df.index.get_level_values('date').max()
    latest_loadings = loadings_df.xs(latest_date, level='date')
    
    fig, axes = plt.subplots(2, 5, figsize=(25, 12))
    axes = axes.flatten()
    
    factor_insights = {}
    
    for i in range(10):
        component_col = f'PC{i+1}'
        
        # Get top positive and negative loadings
        top_positive = latest_loadings[component_col].nlargest(n_top)
        top_negative = latest_loadings[component_col].nsmallest(n_top)
        
        # Combine for plotting
        combined = pd.concat([top_positive, top_negative]).sort_values()
        
        # Plot
        colors = ['red' if x < 0 else 'blue' for x in combined.values]
        combined.plot(kind='barh', ax=axes[i], color=colors, alpha=0.7)
        axes[i].set_title(f'{component_col} - Top Assets', fontweight='bold', fontsize=12)
        axes[i].axvline(0, color='black', linestyle='-', alpha=0.8, linewidth=1)
        axes[i].grid(True, alpha=0.3)
        
        # Store insights
        factor_insights[component_col] = {
            'top_positive': top_positive.to_dict(),
            'top_negative': top_negative.to_dict(),
            'range': top_positive.max() - top_negative.min()
        }
    
    plt.tight_layout()
    plt.show()
    
    return factor_insights

print("🎯 Top Assets by Factor Loading")
factor_insights = analyze_top_assets_by_factor(loadings_df, n_top=5)

In [None]:
# Print factor insights
def print_factor_insights(factor_insights):
    """
    Print detailed insights about each factor
    """
    print("\n📋 FACTOR INTERPRETATION GUIDE")
    print("=" * 60)
    
    for factor, insights in factor_insights.items():
        print(f"\n🔍 {factor}:")
        print(f"  Loading Range: {insights['range']:.3f}")
        
        print("  📈 Highest Positive Loadings:")
        for asset, loading in list(insights['top_positive'].items())[:3]:
            print(f"    {asset}: {loading:.3f}")
        
        print("  📉 Highest Negative Loadings:")
        for asset, loading in list(insights['top_negative'].items())[:3]:
            print(f"    {asset}: {loading:.3f}")

print_factor_insights(factor_insights)

## 7. Factor Stability Over Time

In [None]:
# Analyze factor loading stability over time
def analyze_loading_stability(loadings_df, assets_to_track=None, window=63):
    """
    Analyze how factor loadings change over time for key assets
    """
    if assets_to_track is None:
        # Select some major assets
        major_assets = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'NVDA', 'META', 'BRK-B']
        available_assets = [asset for asset in major_assets if asset in loadings_df.index.get_level_values('asset')]
        assets_to_track = available_assets[:6]  # Track up to 6 assets
    
    fig, axes = plt.subplots(3, 2, figsize=(20, 18))
    axes = axes.flatten()
    
    for i, asset in enumerate(assets_to_track[:6]):
        if i < len(axes):
            # Get asset's loadings over time
            try:
                asset_loadings = loadings_df.xs(asset, level='asset')
                
                # Plot first 5 factors
                for j in range(5):
                    component_col = f'PC{j+1}'
                    axes[i].plot(asset_loadings.index, asset_loadings[component_col], 
                               linewidth=2, alpha=0.8, label=component_col)
                
                axes[i].axhline(0, color='black', linestyle='--', alpha=0.5)
                axes[i].set_title(f'{asset} Factor Loadings Over Time', fontweight='bold', fontsize=12)
                axes[i].set_ylabel('Loading')
                axes[i].legend()
                axes[i].grid(True, alpha=0.3)
                
            except KeyError:
                axes[i].text(0.5, 0.5, f'{asset} not found', ha='center', va='center', 
                           transform=axes[i].transAxes, fontsize=12)
                axes[i].set_title(f'{asset} - No Data', fontweight='bold', fontsize=12)
    
    plt.tight_layout()
    plt.show()
    
    # Calculate loading volatility for tracked assets
    print("\n📊 Loading Stability Analysis:")
    for asset in assets_to_track[:6]:
        try:
            asset_loadings = loadings_df.xs(asset, level='asset')
            loading_vol = asset_loadings[['PC1', 'PC2', 'PC3']].std()
            print(f"{asset}: PC1={loading_vol['PC1']:.3f}, PC2={loading_vol['PC2']:.3f}, PC3={loading_vol['PC3']:.3f}")
        except KeyError:
            print(f"{asset}: No data available")

print("⚖️ Factor Loading Stability Analysis")
analyze_loading_stability(loadings_df)

## 8. Summary Dashboard

In [None]:
# Create comprehensive summary
def create_summary_dashboard(components_df, explained_var_df, loadings_df):
    """
    Create a comprehensive summary dashboard
    """
    print("\n" + "=" * 80)
    print("📊 SVD FACTOR ANALYSIS SUMMARY DASHBOARD")
    print("=" * 80)
    
    # Basic statistics
    print(f"\n📈 DATA OVERVIEW:")
    print(f"  Analysis Period: {components_df.index.min().strftime('%Y-%m-%d')} to {components_df.index.max().strftime('%Y-%m-%d')}")
    print(f"  Total Observations: {len(components_df):,}")
    print(f"  Number of Assets: {len(loadings_df.index.get_level_values('asset').unique()):,}")
    print(f"  Number of Factors: {len(components_df.columns)}")
    
    # Factor performance
    print(f"\n🎯 FACTOR PERFORMANCE:")
    recent_explained = explained_var_df.tail(1).iloc[0]
    for i in range(5):
        factor_name = f'PC{i+1}'
        factor_vol = components_df[factor_name].std()
        factor_explained = recent_explained[factor_name]
        print(f"  {factor_name}: {factor_explained:.1%} variance, {factor_vol:.3f} volatility")
    
    # Recent factor values
    print(f"\n📊 CURRENT FACTOR VALUES:")
    latest_factors = components_df.tail(1).iloc[0]
    for i in range(5):
        factor_name = f'PC{i+1}'
        print(f"  {factor_name}: {latest_factors[factor_name]:+.3f}")
    
    # Factor stability
    print(f"\n⚖️ FACTOR STABILITY (Coefficient of Variation):")
    factor_stability = (components_df.std() / components_df.abs().mean()).sort_values()
    for factor in factor_stability.head(5).index:
        print(f"  {factor}: {factor_stability[factor]:.3f} (lower = more stable)")
    
    # Total explained variance
    total_explained = explained_var_df.sum(axis=1).mean()
    top3_explained = explained_var_df[['PC1', 'PC2', 'PC3']].sum(axis=1).mean()
    print(f"\n📈 VARIANCE EXPLAINED:")
    print(f"  Total (all factors): {total_explained:.1%}")
    print(f"  Top 3 factors: {top3_explained:.1%}")
    print(f"  PC1 alone: {recent_explained['PC1']:.1%}")
    
    print("\n" + "=" * 80)

create_summary_dashboard(components_df, explained_var_df, loadings_df)

## 10. Export Results

In [None]:
# Export analysis results
def export_analysis_results(components_df, explained_var_df, loadings_df, export_dir='../results'):
    """
    Export analysis results to files
    """
    import os
    os.makedirs(export_dir, exist_ok=True)
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M')
    
    # Export factor time series
    components_file = f'{export_dir}/factor_components_{timestamp}.csv'
    components_df.to_csv(components_file)
    print(f"✅ Factor components exported to: {components_file}")
    
    # Export explained variance
    variance_file = f'{export_dir}/explained_variance_{timestamp}.csv'
    explained_var_df.to_csv(variance_file)
    print(f"✅ Explained variance exported to: {variance_file}")
    
    # Export latest factor loadings
    latest_date = loadings_df.index.get_level_values('date').max()
    latest_loadings = loadings_df.xs(latest_date, level='date')
    loadings_file = f'{export_dir}/latest_factor_loadings_{timestamp}.csv'
    latest_loadings.to_csv(loadings_file)
    print(f"✅ Latest factor loadings exported to: {loadings_file}")
    
    # Export factor correlation matrix
    factor_corr = components_df.corr()
    corr_file = f'{export_dir}/factor_correlations_{timestamp}.csv'
    factor_corr.to_csv(corr_file)
    print(f"✅ Factor correlations exported to: {corr_file}")
    
    print(f"\n📁 All results exported to: {os.path.abspath(export_dir)}")

# Uncomment to export results
# export_analysis_results(components_df, explained_var_df, loadings_df)
print("💾 Export function ready - uncomment above line to export results")

In [None]:
## 🎯 Analysis Complete!

This notebook has provided a comprehensive analysis of your SVD latent factors including:

✅ **Factor Loading Visualizations** - Heatmaps and distributions  
✅ **Factor Time Series Analysis** - Evolution and volatility patterns  
✅ **Explained Variance Analysis** - Factor importance and stability  
✅ **Factor Correlations** - Relationships between factors  
✅ **Asset-Factor Relationships** - Top contributing assets per factor  
✅ **Loading Stability** - How factor exposures change over time  
✅ **ARIMA Forecast Quality** - Predictability analysis of each factor  
✅ **Summary Dashboard** - Key insights and metrics  

### 📋 Key Takeaways:
- **PC1** typically represents the market factor (systematic risk)
- **PC2-PC3** often capture sector or style factors
- Factor loadings show which assets are most sensitive to each factor
- Time-varying loadings reveal changing market dynamics
- Explained variance indicates factor importance over time
- **ARIMA Analysis** reveals which factors are most predictable for forecasting

### 🔄 Next Steps:
1. Use factor insights for portfolio construction
2. Monitor factor exposures for risk management
3. Develop factor-based trading strategies
4. Leverage ARIMA forecasts for factor timing
5. Update analysis regularly as new data arrives

## 9. ARIMA Forecast Quality Analysis

Analyze the quality of ARIMA forecasts for each latent factor to understand their predictability and forecast accuracy.

In [None]:
## 🎯 Analysis Complete!

This notebook has provided a comprehensive analysis of your SVD latent factors including:

✅ **Factor Loading Visualizations** - Heatmaps and distributions  
✅ **Factor Time Series Analysis** - Statistical analysis with trend detection  
✅ **Explained Variance Analysis** - Factor importance and stability  
✅ **Factor Correlations** - Relationships between factors  
✅ **Asset-Factor Relationships** - Top contributing assets per factor  
✅ **Loading Stability** - How factor exposures change over time  
✅ **ARIMA Forecast Quality** - Predictability analysis of each factor  
✅ **Summary Dashboard** - Key insights and metrics  

### 📋 Key Takeaways:
- **PC1** typically represents the market factor (systematic risk)
- **PC2-PC3** often capture sector or style factors
- Factor loadings show which assets are most sensitive to each factor
- Time-varying loadings reveal changing market dynamics
- Explained variance indicates factor importance over time
- **ARIMA Analysis** reveals which factors are most predictable for forecasting

### 🛠️ Technical Notes:
- **Fixed SVD Index Error**: Resolved list index out of range errors in rolling SVD computation
- **Enhanced Error Handling**: Added proper bounds checking and dimension validation
- **Statistical Focus**: Factor time series analysis uses comprehensive statistical measures instead of plotting to avoid matplotlib compatibility issues

### 🔄 Next Steps:
1. Use factor insights for portfolio construction
2. Monitor factor exposures for risk management
3. Develop factor-based trading strategies
4. Leverage ARIMA forecasts for factor timing
5. Update analysis regularly as new data arrives

## 🎯 Analysis Complete!

This notebook has provided a comprehensive analysis of your SVD latent factors including:

✅ **Factor Loading Visualizations** - Heatmaps and distributions  
✅ **Factor Time Series Analysis** - Evolution and volatility patterns  
✅ **Explained Variance Analysis** - Factor importance and stability  
✅ **Factor Correlations** - Relationships between factors  
✅ **Asset-Factor Relationships** - Top contributing assets per factor  
✅ **Loading Stability** - How factor exposures change over time  
✅ **Summary Dashboard** - Key insights and metrics  

### 📋 Key Takeaways:
- **PC1** typically represents the market factor (systematic risk)
- **PC2-PC3** often capture sector or style factors
- Factor loadings show which assets are most sensitive to each factor
- Time-varying loadings reveal changing market dynamics
- Explained variance indicates factor importance over time

### 🔄 Next Steps:
1. Use factor insights for portfolio construction
2. Monitor factor exposures for risk management
3. Develop factor-based trading strategies
4. Update analysis regularly as new data arrives