# 📈 03 – Advanced Visualizations & Rolling Analysis

## Overview

This notebook extends the factor modeling analysis with **time-varying** perspectives and **advanced visualizations**:

1. **Rolling Window Analysis**: Examine how model fit (R²) changes over time
2. **Dynamic Factor Sensitivity**: Track evolving relationships between EM and macro factors  
3. **Batch Visualizations**: Generate static Matplotlib charts for all EM indices
4. **Reusable Functions**: Modular code for reproducible analysis

### Key Features:
- **Rolling R² Analysis**: PCA-based regression models fitted over a 60-trading-day rolling window
- **Time-Series Visualization**: Dynamic model performance tracking
- **Batch Processing**: Automated chart generation for all EM indices
- **Export Functionality**: Save all visualizations to output folder

### Use Cases:
- **Risk Management**: Identify periods of high/low factor sensitivity
- **Portfolio Analysis**: Understand when diversification benefits change
- **Market Timing**: Spot regime changes in EM-macro relationships

## 📦 Import Required Libraries

Loading libraries for advanced analysis and visualization:

In [None]:
# Core data manipulation
import pandas as pd
import numpy as np
import os

# Machine learning components  
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Set plotting style for professional appearance
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)

## 📁 Data Loading & Preparation

Load the dataset and prepare variables for rolling analysis.
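
As a minimal illustration of the log-return transformation applied in the next cell, here is a sketch on a tiny made-up price table. The column names and values are hypothetical and are not taken from `combined_em_macro_data.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration only (not from the real dataset)
prices = pd.DataFrame(
    {"Brazil_demo": [100.0, 102.0, 101.0], "USD_demo": [90.0, 90.9, 91.8]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Log return r_t = ln(P_t / P_{t-1}); the first row has no predecessor and is dropped
demo_log_returns = np.log(prices / prices.shift(1)).dropna()

# Log returns are additive over time: their sum equals ln(P_last / P_first)
total = demo_log_returns["Brazil_demo"].sum()
print(demo_log_returns.round(4))
print(round(total, 6), round(np.log(101.0 / 100.0), 6))
```

Note that the transformation only makes sense for strictly positive series; a factor that can be zero or negative (for example a yield spread) would produce NaN or infinite values and would need a different treatment.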

In [None]:
# Load the combined dataset
df = pd.read_csv('../data/combined_em_macro_data.csv', parse_dates=['date'], index_col='date')

# Convert to log returns
log_returns = np.log(df / df.shift(1)).dropna()

# Separate EM and macro variables
em_cols = [c for c in df.columns if c.startswith(('Brazil', 'India', 'China', 'SouthAfrica', 'Mexico', 'Indonesia'))]
macro_cols = [c for c in df.columns if c not in em_cols]

Y_all = log_returns[em_cols]    # EM equity returns
X_all = log_returns[macro_cols] # Macro factor returns

print(f"📊 Dataset loaded for rolling analysis:")
print(f"   • Time period: {df.index.min()} to {df.index.max()}")
print(f"   • Total observations: {len(log_returns)}")
print(f"   • EM indices: {len(em_cols)}")
print(f"   • Macro factors: {len(macro_cols)}")

print(f"\n🌏 EM Indices: {em_cols}")
print(f"📈 Macro Factors: {macro_cols}")

## 🔄 Rolling Window Analysis Function

Create a reusable function to perform rolling window PCA-regression analysis.
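
To show what happens inside a single window, the sketch below fits one standardize → PCA → OLS model on synthetic data. The data, the variable names, and the use of `make_pipeline` are assumptions made for illustration; the function in the next cell performs the same three steps explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: 60 days, 5 hypothetical macro factors
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="B")
X_demo = pd.DataFrame(
    rng.normal(size=(60, 5)),
    index=dates,
    columns=[f"factor_{k}" for k in range(5)],
)
# Hypothetical EM return driven by the first two factors plus noise
y_demo = 0.5 * X_demo["factor_0"] - 0.3 * X_demo["factor_1"] + rng.normal(scale=0.5, size=60)

# One window: standardize -> keep 3 principal components -> OLS on the components
window_model = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
window_model.fit(X_demo, y_demo)

# In-sample R² for this window; the rolling function repeats this for every window
r2_single = window_model.score(X_demo, y_demo)
print(f"Single-window R²: {r2_single:.3f}")
```

Because the score is computed on the same 60 observations used for fitting, it is an in-sample R² and will tend to overstate out-of-sample fit.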

In [None]:
def rolling_r2_scores(X, Y, window=60, n_components=3):
    """
    Calculate rolling R² scores for EM indices using PCA-based factor models.
    
    Parameters:
    -----------
    X : pd.DataFrame
        Macro factor returns (independent variables)
    Y : pd.DataFrame  
        EM equity returns (dependent variables)
    window : int
        Rolling window size in days (default: 60)
    n_components : int
        Number of principal components to use (default: 3)
    
    Returns:
    --------
    pd.DataFrame
        Rolling R² scores for each EM index
    """
    
    # Initialize results DataFrame. Each window is labelled by its last date,
    # so the first result belongs to Y.index[window - 1]
    results = pd.DataFrame(index=Y.index[window - 1:], columns=Y.columns)
    
    print(f"🔄 Computing rolling R² with {window}-day windows...")
    print(f"   • Total windows: {len(Y) - window + 1}")
    print(f"   • PCA components: {n_components}")
    
    # Loop through each EM index
    for col_idx, col in enumerate(Y.columns):
        print(f"   • Processing {col} ({col_idx + 1}/{len(Y.columns)})")
        
        # Loop through time windows; the slice end i is exclusive, so it must
        # reach len(Y) for the final window to include the last observation
        for i in range(window, len(Y) + 1):
            # Extract window data
            X_window = X.iloc[i - window:i]
            Y_window = Y[col].iloc[i - window:i]
            
            try:
                # Standardize macro factors
                scaler = StandardScaler()
                X_scaled = scaler.fit_transform(X_window)
                
                # Apply PCA
                pca = PCA(n_components=n_components)
                X_pca = pca.fit_transform(X_scaled)
                
                # Fit regression model
                model = LinearRegression().fit(X_pca, Y_window)
                
                # Store R² score
                results.at[Y_window.index[-1], col] = model.score(X_pca, Y_window)
                
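            except (ValueError, np.linalg.LinAlgError):
                # Handle potential numerical issues (e.g. NaNs in the window,
                # SVD failing to converge); other errors are left to surface
                results.at[Y_window.index[-1], col] = np.nan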
    
    print("✅ Rolling analysis complete!")
    return results.astype(float)

## 📊 Execute Rolling Analysis

Run the rolling window analysis to track model performance over time.
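
Before running it, it can help to see how windows map onto dates. The toy sketch below (hypothetical names, no real data) enumerates the windows for a six-day index: n observations and a window of w give n − w + 1 full windows, and each is labelled by its last date, the same convention `rolling_r2_scores` uses when it stores a result at `Y_window.index[-1]`.

```python
import pandas as pd

# Toy index for illustration: 6 business days, window of 4
toy_index = pd.date_range("2024-01-01", periods=6, freq="B")
w = 4

# Each window covers positions [end - w, end) and is labelled by its last date
windows = [
    (toy_index[end - w], toy_index[end - 1])
    for end in range(w, len(toy_index) + 1)
]

for first, last in windows:
    print(f"{first.date()} .. {last.date()}  -> labelled {last.date()}")
```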

In [None]:
# Execute rolling analysis with 60-day windows
window_size = 60
rolling_r2 = rolling_r2_scores(X_all, Y_all, window=window_size)

print(f"\n📈 Rolling R² Analysis Results:")
print(f"   • Window size: {window_size} trading days")
print(f"   • Analysis period: {rolling_r2.index.min()} to {rolling_r2.index.max()}")
print(f"   • Total observations: {len(rolling_r2)}")

# Summary statistics
print(f"\n📊 Rolling R² Summary Statistics:")
summary_stats = rolling_r2.describe()
print(summary_stats.round(3))

## 📈 Visualization & Export

Generate and save rolling R² charts for all EM indices.
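
The save step can be sketched in isolation as follows. Everything here is illustrative: `safe_filename` is a hypothetical helper, the R² series is synthetic, and a temporary directory stands in for the output folder. The `Agg` backend is selected only so the sketch runs without a display; omit that line inside a notebook.

```python
import os
import re
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def safe_filename(label: str) -> str:
    """Hypothetical helper: collapse anything other than letters, digits, '-' or '_'."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", label).strip("_")


# Synthetic R² series for illustration only
idx = pd.date_range("2024-01-01", periods=30, freq="B")
demo_r2 = pd.Series(np.linspace(0.2, 0.6, 30), index=idx, name="Demo Index/USD")

demo_dir = tempfile.mkdtemp()  # stand-in for the notebook's output folder
fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(demo_r2.index, demo_r2.values, linewidth=2)
ax.axhline(demo_r2.mean(), color="red", linestyle="--",
           label=f"Mean R² = {demo_r2.mean():.3f}")
ax.set_xlabel("Date")
ax.set_ylabel("R² Score")
ax.legend()

demo_path = os.path.join(demo_dir, f"rolling_r2_{safe_filename(demo_r2.name)}.png")
fig.savefig(demo_path, dpi=100, bbox_inches="tight")
plt.close(fig)  # release the figure so batch runs do not accumulate open figures
print(demo_path)
```

Working with explicit `fig` and `ax` objects and closing each figure after saving keeps memory use flat when charts are generated in a loop.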

In [None]:
# Create output directory
output_dir = "../output/plots"
os.makedirs(output_dir, exist_ok=True)

print(f"📊 Generating rolling R² visualizations for {len(rolling_r2.columns)} EM indices...\n")

# Generate and save charts for each EM index
for col in rolling_r2.columns:
    plt.figure(figsize=(12, 6))
    
    # Plot rolling R²
    plt.plot(rolling_r2.index, rolling_r2[col], linewidth=2, alpha=0.8)
    
    # Add mean line
    mean_r2 = rolling_r2[col].mean()
    plt.axhline(y=mean_r2, color='red', linestyle='--', alpha=0.7, 
                label=f'Mean R² = {mean_r2:.3f}')
    
    # Formatting
    plt.title(f'Rolling R²: {col} vs Macro Factors ({window_size}-day PCA Model)', 
              fontsize=14, pad=20)
    plt.xlabel('Date')
    plt.ylabel('R² Score')
    plt.grid(True, alpha=0.3)
    plt.legend()
    plt.tight_layout()
    
    # Save plot
    filename = f"rolling_r2_{col.replace('/', '_').replace(' ', '_')}.png"
    filepath = os.path.join(output_dir, filename)
    plt.savefig(filepath, dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"✅ {col}: Mean R² = {mean_r2:.3f}, Chart saved to {filename}")

# Create comprehensive summary plot
plt.figure(figsize=(14, 8))
for col in rolling_r2.columns:
    plt.plot(rolling_r2.index, rolling_r2[col], label=col, linewidth=1.5, alpha=0.8)

plt.title(f'Rolling R² Comparison: All EM Indices vs Macro Factors ({window_size}-day PCA Model)', 
          fontsize=14, pad=20)
plt.xlabel('Date')
plt.ylabel('R² Score')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()

# Save summary plot
summary_filename = "rolling_r2_all_indices_comparison.png"
summary_filepath = os.path.join(output_dir, summary_filename)
plt.savefig(summary_filepath, dpi=300, bbox_inches='tight')
plt.show()

print(f"\n💾 All visualizations saved to: {output_dir}")
print(f"📊 Summary chart: {summary_filename}")
print(f"🎯 Rolling analysis complete for {len(rolling_r2.columns)} EM indices!")

# Yearly Analysis: Factor Sensitivity Evolution

## Overview
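In this section, we examine how macro factor sensitivity, proxied by the in-sample R² of the PCA regression, evolved across three overlapping two-year windows (each window spans two calendar years, so adjacent windows share one year):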
- **2022/2023**: Post-pandemic recovery period with elevated inflation concerns
- **2023/2024**: Central bank tightening cycle and geopolitical tensions
- **2024/2025**: Current period with rate normalization expectations

This temporal analysis helps identify:
1. **Structural Changes**: How macro-EM relationships evolved over time
2. **Regime Shifts**: Periods where factor loadings significantly changed
3. **Investment Implications**: How factor strategies performed across different market environments
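
The per-period calculation can be sketched on synthetic returns as follows. All names and data are hypothetical, and the sketch uses non-overlapping calendar years purely for illustration. Each period is selected with a label-based date slice and scored with the same standardize → PCA → OLS steps.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic daily returns for illustration only
rng = np.random.default_rng(1)
dates = pd.date_range("2022-01-03", "2024-12-31", freq="B")
X_syn = pd.DataFrame(
    rng.normal(size=(len(dates), 4)),
    index=dates,
    columns=[f"macro_{k}" for k in range(4)],
)
y_syn = 0.4 * X_syn["macro_0"] + rng.normal(scale=1.0, size=len(dates))

# Hypothetical period definitions: (start, end) date strings, both inclusive
demo_periods = {
    "2022": ("2022-01-01", "2022-12-31"),
    "2023": ("2023-01-01", "2023-12-31"),
    "2024": ("2024-01-01", "2024-12-31"),
}


def period_r2(X, y, start, end, n_components=3):
    """In-sample R² of a standardize -> PCA -> OLS model fitted on one date range."""
    X_p, y_p = X.loc[start:end], y.loc[start:end]
    model = make_pipeline(
        StandardScaler(), PCA(n_components=n_components), LinearRegression()
    )
    return model.fit(X_p, y_p).score(X_p, y_p)


demo_r2_by_period = pd.Series(
    {name: period_r2(X_syn, y_syn, s, e) for name, (s, e) in demo_periods.items()}
)
print(demo_r2_by_period.round(3))
```

Collecting the scores in a `pd.Series` (or a DataFrame with one column per index) makes the later comparison plots straightforward.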

In [None]:
# Calculate R² per period for trend analysis.
# Note: each window below spans two calendar years, so adjacent windows overlap by one year
yearly_periods = {
    '2022/2023': ('2022-01-01', '2023-12-31'),
    '2023/2024': ('2023-01-01', '2024-12-31'),
    '2024/2025': ('2024-01-01', '2025-12-31')
}

yearly_r2_results = {}

# Reuse the dataset and column lists from the data loading section;
# if that cell has not been run, reload the data and rebuild the lists
try:
    data = df
except NameError:
    data = pd.read_csv('../data/combined_em_macro_data.csv', parse_dates=['date'], index_col='date')
    em_cols = [c for c in data.columns if c.startswith(('Brazil', 'India', 'China', 'SouthAfrica', 'Mexico', 'Indonesia'))]
    macro_cols = [c for c in data.columns if c not in em_cols]

# Define column names based on the dataset structure
em_etf_columns = em_cols
macro_factor_columns = macro_cols

for period_name, (start_date, end_date) in yearly_periods.items():
    print(f"\n{'='*50}")
    print(f"Period: {period_name}")
    print(f"{'='*50}")
    
    # Filter data for the period
    period_mask = (data.index >= start_date) & (data.index <= end_date)
    period_data = data[period_mask]
    
    if len(period_data) < 50:  # Minimum data requirement
        print(f"Insufficient data for {period_name}: {len(period_data)} observations")
        continue
    
    # Calculate log returns for this period
    period_log_returns = np.log(period_data / period_data.shift(1)).dropna()
    
    # Separate EM ETFs and macro factors for this period
    em_etfs_period = period_log_returns[em_etf_columns]
    macro_factors_period = period_log_returns[macro_factor_columns]
    
    period_r2_scores = {}
    
    # Calculate R² for each ETF in this period
    for etf in em_etf_columns:
        etf_returns = em_etfs_period[etf].dropna()
        
        # Align macro factors with ETF data
        common_dates = etf_returns.index.intersection(macro_factors_period.index)
        if len(common_dates) < 30:
            continue
            
        X_period = macro_factors_period.loc[common_dates]
        y_period = etf_returns.loc[common_dates]
        
        # Standardize and apply PCA to macro factors
        scaler_period = StandardScaler()
        X_scaled_period = scaler_period.fit_transform(X_period.fillna(0))
        
        pca_period = PCA(n_components=3)
        X_pca_period = pca_period.fit_transform(X_scaled_period)
        
        # Fit linear regression
        model_period = LinearRegression()
        model_period.fit(X_pca_period, y_period)
        
        # Calculate R²
        r2_period = model_period.score(X_pca_period, y_period)
        period_r2_scores[etf] = r2_period
        
        print(f"{etf}: R² = {r2_period:.3f}")
    
    yearly_r2_results[period_name] = period_r2_scores

# Create comprehensive yearly comparison visualizations
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Yearly Factor Sensitivity Evolution Analysis (2022-2025)', fontsize=16, fontweight='bold')

# 1. R² Evolution Line Chart
ax1 = axes[0, 0]
for etf in em_etf_columns:
    r2_values = []
    periods = []
    for period in yearly_r2_results.keys():
        if etf in yearly_r2_results[period]:
            r2_values.append(yearly_r2_results[period][etf])
            periods.append(period)
    
    if r2_values:
        ax1.plot(periods, r2_values, marker='o', linewidth=2, label=etf, markersize=8)

ax1.set_title('R² Score Evolution Across Annual Periods', fontsize=12, fontweight='bold')
ax1.set_ylabel('R² Score')
ax1.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.tick_params(axis='x', rotation=45)

# 2. Yearly R² Heatmap
ax2 = axes[0, 1]
yearly_df = pd.DataFrame(yearly_r2_results).T
if not yearly_df.empty:
    sns.heatmap(yearly_df, annot=True, fmt='.3f', cmap='RdYlBu_r', 
                ax=ax2, cbar_kws={'label': 'R² Score'})
ax2.set_title('Annual Period R² Heatmap', fontsize=12, fontweight='bold')
ax2.set_xlabel('EM ETFs')
ax2.set_ylabel('Time Periods')

# 3. R² Range Analysis (Temporal Volatility)
ax3 = axes[1, 0]
etf_ranges = {}
for etf in em_etf_columns:
    etf_r2_values = []
    # Use a distinct name so the period_data DataFrame above is not shadowed
    for period_scores in yearly_r2_results.values():
        if etf in period_scores:
            etf_r2_values.append(period_scores[etf])
    
    if len(etf_r2_values) > 1:
        etf_ranges[etf] = max(etf_r2_values) - min(etf_r2_values)

if etf_ranges:
    etfs = list(etf_ranges.keys())
    ranges = list(etf_ranges.values())
    bars = ax3.bar(etfs, ranges, color='skyblue', alpha=0.7)
    ax3.set_title('R² Temporal Volatility Across Periods', fontsize=12, fontweight='bold')
    ax3.set_ylabel('R² Range (Max - Min)')
    ax3.tick_params(axis='x', rotation=45)
    
    # Add value labels on bars
    for bar, value in zip(bars, ranges):
        ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
                f'{value:.3f}', ha='center', va='bottom')

# 4. Period Comparison Statistics
ax4 = axes[1, 1]
period_avg_r2 = {}
for period, r2_dict in yearly_r2_results.items():
    if r2_dict:
        period_avg_r2[period] = np.mean(list(r2_dict.values()))

if period_avg_r2:
    periods = list(period_avg_r2.keys())
    avg_r2 = list(period_avg_r2.values())
    bars = ax4.bar(periods, avg_r2, color='lightcoral', alpha=0.7)
    ax4.set_title('Average R² by Annual Period', fontsize=12, fontweight='bold')
    ax4.set_ylabel('Average R² Score')
    ax4.tick_params(axis='x', rotation=45)
    
    # Add value labels
    for bar, value in zip(bars, avg_r2):
        ax4.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01,
                f'{value:.3f}', ha='center', va='bottom')

plt.tight_layout()
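# Make sure the target folder exists in case this cell is run on its own
os.makedirs('../output/plots', exist_ok=True)
plt.savefig('../output/plots/yearly_factor_evolution.png', dpi=300, bbox_inches='tight')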
plt.show()

# Print detailed summary insights
print(f"\n{'='*70}")
print("ANNUAL PERIOD ANALYSIS SUMMARY (2022-2025)")
print(f"{'='*70}")

for period, r2_scores in yearly_r2_results.items():
    if r2_scores:
        best_etf = max(r2_scores.items(), key=lambda x: x[1])
        worst_etf = min(r2_scores.items(), key=lambda x: x[1])
        avg_r2 = np.mean(list(r2_scores.values()))
        
        print(f"\n{period} Period:")
        print(f"  Highest Macro Sensitivity: {best_etf[0]} (R² = {best_etf[1]:.3f})")
        print(f"  Lowest Macro Sensitivity:  {worst_etf[0]} (R² = {worst_etf[1]:.3f})")
        print(f"  Average R²:                 {avg_r2:.3f}")
        print(f"  ETFs Analyzed:              {len(r2_scores)}")

# Identify trend patterns across the three annual periods
print(f"\n{'='*70}")
print("TREND IDENTIFICATION ACROSS ANNUAL PERIODS")
print(f"{'='*70}")

for etf in em_etf_columns:
    etf_trend = []
    periods_analyzed = []
    for period in sorted(yearly_r2_results.keys()):
        if etf in yearly_r2_results[period]:
            etf_trend.append(yearly_r2_results[period][etf])
            periods_analyzed.append(period)
    
    if len(etf_trend) >= 2:
        # Calculate overall trend
        if etf_trend[-1] > etf_trend[0]:
            trend_direction = "INCREASING"
            trend_arrow = "↗️"
        else:
            trend_direction = "DECREASING"
            trend_arrow = "↘️"
        
        trend_magnitude = abs(etf_trend[-1] - etf_trend[0])
        trend_volatility = np.std(etf_trend) if len(etf_trend) > 2 else 0
        
        print(f"{etf}: {trend_arrow} {trend_direction} trend")
        print(f"  • Magnitude: Δ = {trend_magnitude:.3f}")
        print(f"  • Volatility: σ = {trend_volatility:.3f}")
        print(f"  • Periods: {' → '.join(periods_analyzed)}")

print(f"\n{'='*70}")
print("INVESTMENT IMPLICATIONS BY PERIOD")
print(f"{'='*70}")

# Identify period-specific characteristics
all_periods = list(yearly_r2_results.keys())
if len(all_periods) >= 3:
    early_period = all_periods[0]
    middle_period = all_periods[1] 
    recent_period = all_periods[2]
    
    print(f"\n📊 {early_period} (Post-Pandemic Recovery):")
    if early_period in yearly_r2_results and yearly_r2_results[early_period]:
        early_avg = np.mean(list(yearly_r2_results[early_period].values()))
        print(f"  • Average Factor Sensitivity: {early_avg:.3f}")
        print(f"  • Market Regime: Elevated macro sensitivity during recovery")
        print(f"  • Investment Theme: Reflation trade and commodity super-cycle")
    
    print(f"\n📊 {middle_period} (Tightening Cycle):")
    if middle_period in yearly_r2_results and yearly_r2_results[middle_period]:
        middle_avg = np.mean(list(yearly_r2_results[middle_period].values()))
        print(f"  • Average Factor Sensitivity: {middle_avg:.3f}")
        print(f"  • Market Regime: Policy divergence and geopolitical tensions")
        print(f"  • Investment Theme: Differentiated EM responses to global tightening")
    
    print(f"\n📊 {recent_period} (Normalization Phase):")
    if recent_period in yearly_r2_results and yearly_r2_results[recent_period]:
        recent_avg = np.mean(list(yearly_r2_results[recent_period].values()))
        print(f"  • Average Factor Sensitivity: {recent_avg:.3f}")
        print(f"  • Market Regime: Rate normalization and evolving factor structures")
        print(f"  • Investment Theme: New equilibrium in EM-macro relationships")

print(f"\n{'='*70}")
print("STRATEGIC RECOMMENDATIONS")
print(f"{'='*70}")
print("🎯 Factor Strategy Insights:")
print("  • ETFs with increasing R² trends show growing macro sensitivity")
print("  • ETFs with decreasing R² trends may offer better diversification benefits")
print("  • High temporal volatility indicates regime-dependent factor exposure")
print("  • Period-specific analysis enables more precise factor timing strategies")
print("\n💼 Portfolio Construction:")
print("  • Implement dynamic rebalancing based on rolling factor sensitivities")
print("  • Use period-specific factor loadings for tactical asset allocation")
print("  • Monitor regime changes for proactive risk management")
print("\n⚠️ Risk Management:")
print("  • Update hedge ratios quarterly based on evolving factor sensitivities")
print("  • Stress test portfolios across different temporal regimes")
print("  • Track correlation evolution for improved diversification")