# Factor-Based Stock Selection System
## Leverage Factor Forecasts and Loadings for Directional Stock Picking

This notebook demonstrates how to use latent factor forecasts combined with factor loadings to identify stocks most likely to go up (long picks) and down (short picks).

## Methodology:
1. **Factor Forecasting**: Generate ARIMA/trend-based forecasts for each latent factor
2. **Loading Analysis**: Use latest factor loadings to understand asset sensitivities
3. **Score Calculation**: Combine forecasts and loadings with confidence weighting
4. **Stock Selection**: Rank stocks by directional scores for long/short recommendations
5. **Portfolio Construction**: Build balanced portfolios with factor exposure analysis
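Steps 3 and 4 reduce to a matrix-vector product followed by a split on the sign of the score. The sketch below assumes each factor forecast is summarized by a direction of +1 or -1, a non-negative magnitude and a confidence weight, following the score formula in the Methodology Summary at the end of the notebook. The function name `factor_scores` and the toy numbers are hypothetical, not the `FactorBasedStockSelector` internals.

```python
import numpy as np

def factor_scores(loadings, direction, magnitude, confidence):
    """Score_i = sum_k loadings[i, k] * direction[k] * magnitude[k] * confidence[k]."""
    # Fold the three per-factor forecast terms into one signed, confidence-weighted vector.
    weighted_forecast = (
        np.asarray(direction, dtype=float)
        * np.asarray(magnitude, dtype=float)
        * np.asarray(confidence, dtype=float)
    )
    # (n_assets x n_factors) @ (n_factors,) sums loading x forecast over the factors.
    return np.asarray(loadings, dtype=float) @ weighted_forecast

# Toy example (hypothetical numbers): 3 assets, 2 factors.
assets = np.array(["AAA", "BBB", "CCC"])
loadings = [[0.8, -0.1],
            [-0.5, 0.2],
            [0.1, 0.9]]
scores = factor_scores(loadings, direction=[1, -1], magnitude=[0.02, 0.01], confidence=[0.7, 0.5])

# Positive scores are long candidates; negative scores are short candidates, most negative first.
long_candidates = assets[scores > 0]
short_candidates = assets[np.argsort(scores)][: (scores < 0).sum()]
```

Here `AAA` has a large positive loading on the factor forecast to rise, so it lands on the long side, while `BBB` and `CCC` end up with negative scores.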

## Setup and Data Loading

In [2]:
import sys
import os
sys.path.append(os.path.join(os.getcwd(), '..'))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Import our custom modules
from src.data.loader import get_multiple_stocks
from src.data.preprocess import preprocess_price_matrix
from src.models.svd import rolling_svd_factors
from src.models.factor_stock_selection import FactorBasedStockSelector

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

print("✅ Factor-Based Stock Selection System Initialized")
print(f"📅 Analysis Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")


## 1. Load and Prepare Data

In [None]:
# Load stock data from cache
print("📈 Loading stock data...")

# Load constituent lists
try:
    s_and_p_500_constituents = pd.read_csv('../cache/constituents.csv')
    nasdaq_100_constituents = pd.read_csv('../cache/nasdaq-100.csv')
    active_tickers = sorted(list(set(s_and_p_500_constituents['Symbol'].dropna()) | 
                                set(nasdaq_100_constituents['Symbol'].dropna())))
    
    print(f"📊 Total symbols available: {len(active_tickers)}")
    
    # Load stock data
    stock_data = get_multiple_stocks(active_tickers, update=False, rate_limit=5.0)
    close_prices = stock_data['Close']
    
    print(f"💹 Price data shape: {close_prices.shape}")
    print(f"📅 Date range: {close_prices.index.min()} to {close_prices.index.max()}")
    
except Exception as e:
    print(f"❌ Error loading data: {e}")
    print("Please ensure stock data is available in cache")
    raise

## 2. Generate SVD Factors

In [None]:
# Preprocess data for SVD
print("🔄 Preprocessing data for SVD...")

pre_scaled = preprocess_price_matrix(
    close_prices, 
    winsorize_span=40,
    method='log_return',
    rolling_window=5
)

print(f"✅ Preprocessed data shape: {pre_scaled.shape}")
print(f"📊 Final assets: {len(pre_scaled.columns)}")

# Compute rolling SVD factors
print("\n🔍 Computing rolling SVD factors...")

loadings_df, components_df, explained_var_df = rolling_svd_factors(
    X=pre_scaled,
    dates=pre_scaled.index,
    assets=pre_scaled.columns,
    window_size=180,  # 180-observation window (~8.5 months of daily trading data)
    n_components=10   # Extract top 10 factors
)

print(f"✅ SVD computation complete!")
print(f"📊 Loadings shape: {loadings_df.shape}")
print(f"📈 Components shape: {components_df.shape}")
print(f"📉 Explained variance shape: {explained_var_df.shape}")

# Display recent factor values
print(f"\n📊 Recent Factor Values:")
recent_factors = components_df.tail(5)
print(recent_factors.round(4))
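This notebook does not show the internals of `rolling_svd_factors`. As a rough sketch of what one rolling step of such a decomposition typically involves, the code below demeans a (dates x assets) window, takes a truncated SVD, and reads off loadings, factor values and explained-variance ratios. The sign-alignment step is an assumption worth checking in the real implementation. SVD components are only defined up to sign, so without alignment a factor's history can flip between consecutive windows, which would corrupt any forecast built on it. The function names `svd_window` and `align_signs` are hypothetical.

```python
import numpy as np

def svd_window(X, n_components):
    """Truncated SVD of one (dates x assets) window of preprocessed returns."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # demean each asset column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    loadings = Vt[:k].T                       # (assets x k) sensitivities
    factors = U[:, :k] * S[:k]                # (dates x k), equals Xc @ loadings
    explained = S[:k] ** 2 / np.sum(S ** 2)   # share of the window's total variance
    return loadings, factors, explained

def align_signs(loadings, reference):
    """Per-factor signs (+1/-1) that point each loading column the same way as `reference`."""
    signs = np.sign(np.sum(loadings * reference, axis=0))
    signs[signs == 0] = 1.0
    return signs

# Synthetic data: 250 dates x 8 assets, 180-date window, 3 factors.
rng = np.random.default_rng(0)
returns = rng.standard_normal((250, 8))
window, k = 180, 3

prev_loadings = None
factor_history = []  # one row per window end: factor values on the window's last date
for end in range(window, returns.shape[0] + 1):
    L, F, ev = svd_window(returns[end - window:end], n_components=k)
    if prev_loadings is not None:
        s = align_signs(L, prev_loadings)
        L, F = L * s, F * s                   # flip loadings and factor values together
    prev_loadings = L
    factor_history.append(F[-1])
factor_history = np.array(factor_history)     # (n_windows x k)
```

Flipping a loading column and its factor series together leaves the reconstruction `F @ L.T` unchanged, so alignment only fixes the labeling and does not alter the decomposition.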

## 3. Initialize Factor-Based Stock Selector

In [None]:
# Initialize the stock selector
print("🎯 Initializing Factor-Based Stock Selection System...")

selector = FactorBasedStockSelector(
    forecast_horizon=20,      # 20-day forecast horizon
    confidence_threshold=0.5  # 50% minimum confidence threshold
)

# Fit the selector with our factor data
print("\n🔧 Fitting stock selector with factor data...")
selector.fit(components_df, loadings_df)

print("✅ Stock selector fitted successfully!")

## 4. Generate Stock Recommendations

In [None]:
# Get top long and short picks
print("📈 Generating stock recommendations...")

# Top long picks (stocks likely to go up); only the first 15 are printed
top_longs = selector.get_top_long_picks(n_picks=25, min_confidence=0.4)
print(f"\n🚀 TOP LONG PICKS (likely to go UP), showing 15 of {len(top_longs)}:")
print("=" * 60)
for _, row in top_longs.head(15).iterrows():
    print(f"{row['Asset']:>6}: Score={row['Score']:+.4f}, Strength={row['Strength']:.4f}, Confidence={row['Confidence']:.3f}")

# Top short picks (stocks likely to go down); only the first 15 are printed
top_shorts = selector.get_top_short_picks(n_picks=25, min_confidence=0.4)
print(f"\n📉 TOP SHORT PICKS (likely to go DOWN), showing 15 of {len(top_shorts)}:")
print("=" * 60)
for _, row in top_shorts.head(15).iterrows():
    print(f"{row['Asset']:>6}: Score={row['Score']:+.4f}, Strength={row['Strength']:.4f}, Confidence={row['Confidence']:.3f}")

print(f"\n📊 SUMMARY:")
print(f"  Total Long Recommendations: {len(top_longs)}")
print(f"  Total Short Recommendations: {len(top_shorts)}")
print(f"  Average Long Score: {top_longs['Score'].mean():+.4f}")
print(f"  Average Short Score: {top_shorts['Score'].mean():+.4f}")
print(f"  Average Long Confidence: {top_longs['Confidence'].mean():.3f}")
print(f"  Average Short Confidence: {top_shorts['Confidence'].mean():.3f}")
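`get_top_long_picks` and `get_top_short_picks` are methods of the project's own selector, so their exact rules are not visible here. A plausible pandas equivalent applies a confidence floor, splits stocks by score sign and keeps the strongest names on each side. `top_picks` is hypothetical, and whether the real methods treat `min_confidence` as inclusive is an assumption.

```python
import pandas as pd

def top_picks(scores, n_picks=25, min_confidence=0.4, side="long"):
    """Hypothetical stand-in for get_top_long_picks / get_top_short_picks."""
    eligible = scores[scores["Confidence"] >= min_confidence]   # confidence floor
    if side == "long":
        picks = eligible[eligible["Score"] > 0].nlargest(n_picks, "Score")
    else:
        picks = eligible[eligible["Score"] < 0].nsmallest(n_picks, "Score")
    return picks.reset_index(drop=True)

scores = pd.DataFrame({
    "Asset": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Score": [0.012, -0.008, 0.020, -0.015, 0.005],
    "Confidence": [0.7, 0.6, 0.3, 0.5, 0.45],
})
longs = top_picks(scores, n_picks=2, side="long")    # CCC is dropped: confidence 0.3
shorts = top_picks(scores, n_picks=2, side="short")
```

The highest raw score (`CCC`) is excluded because it fails the confidence floor. This is why the number of returned picks can fall short of `n_picks`.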

## 5. Factor Attribution Analysis

In [None]:
# Analyze factor attribution for top picks
print("🔍 Factor Attribution Analysis for Top Picks")
print("=" * 50)

def print_attribution(picks, label, n=3, n_factors=5):
    """Print the top factor contributions for the first `n` picks (skipped if fewer than `n`)."""
    if len(picks) < n:
        return
    print(f"\n{label} - Factor Attribution:")
    for i in range(n):
        asset = picks.iloc[i]['Asset']
        score = picks.iloc[i]['Score']
        
        print(f"\n🎯 {asset} (Score: {score:+.4f}):")
        attribution = selector.get_factor_attribution(asset)
        
        for _, row in attribution.head(n_factors).iterrows():
            direction_icon = "📈" if row['Contribution'] > 0 else "📉"
            print(f"  {direction_icon} {row['Factor']}: {row['Contribution']:+.4f} "
                  f"(Loading: {row['Loading']:+.3f}, Forecast: {row['Forecast_Direction']}, "
                  f"Conf: {row['Confidence']:.3f})")

# Analyze top 3 long picks and top 3 short picks
print_attribution(top_longs, "📈 TOP 3 LONG PICKS")
print_attribution(top_shorts, "📉 TOP 3 SHORT PICKS")
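Factor attribution decomposes a stock's score back into its per-factor terms, so the contributions for one asset add up to its score. A minimal sketch follows. It assumes forecasts are stored per factor with `Direction` (+1/-1), `Magnitude` and `Confidence` columns, a hypothetical layout (the selector's own `get_factor_attribution` output names the direction column `Forecast_Direction`). The table is sorted so the largest absolute contributions come first.

```python
import pandas as pd

def factor_attribution(asset_loadings, forecasts):
    """Per-factor contributions for one asset, largest absolute contribution first.

    asset_loadings: Series of the asset's latest loadings, indexed by factor name.
    forecasts: DataFrame indexed by factor name with Direction (+1/-1),
               Magnitude and Confidence columns.
    """
    f = forecasts.loc[asset_loadings.index]
    contribution = asset_loadings * f["Direction"] * f["Magnitude"] * f["Confidence"]
    table = pd.DataFrame({
        "Factor": asset_loadings.index.to_list(),
        "Loading": asset_loadings.to_numpy(),
        "Contribution": contribution.to_numpy(),
    })
    order = table["Contribution"].abs().sort_values(ascending=False).index
    return table.loc[order].reset_index(drop=True)

forecasts = pd.DataFrame(
    {"Direction": [1, -1, 1], "Magnitude": [0.02, 0.01, 0.005], "Confidence": [0.7, 0.5, 0.9]},
    index=["F1", "F2", "F3"],
)
asset_loadings = pd.Series([0.8, -0.1, 0.3], index=["F1", "F2", "F3"])
attribution = factor_attribution(asset_loadings, forecasts)
```

A negative loading on a factor forecast to fall (`F2` here) still contributes positively, which is why the icon in the cell above is keyed to the sign of the contribution rather than the forecast direction.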

## 6. Portfolio Construction Analysis

In [None]:
# Generate comprehensive portfolio report
print("🏗️ Portfolio Construction Analysis")

# Generate detailed portfolio report
selector.print_portfolio_report(
    long_picks=20, 
    short_picks=20, 
    min_confidence=0.4
)
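The portfolio report's `Net_Exposure` column is computed inside the selector. One common definition, assumed here rather than taken from the source, treats the picks as an equal-weight book with gross weight 1 on each side. Under that definition, each factor's net exposure is the average long loading minus the average short loading, and a value near zero means the two sides roughly offset on that factor. `net_factor_exposure` is a hypothetical helper.

```python
import pandas as pd

def net_factor_exposure(loadings, longs, shorts):
    """Net factor exposure of an equal-weight long/short book.

    loadings: DataFrame (assets x factors). Longs share weight +1, shorts share -1,
    so each factor's result is mean long loading minus mean short loading.
    """
    weights = pd.Series(0.0, index=loadings.index)
    weights.loc[list(longs)] = 1.0 / len(longs)
    weights.loc[list(shorts)] = -1.0 / len(shorts)
    return loadings.T @ weights            # Series indexed by factor

loadings = pd.DataFrame(
    {"F1": [0.8, -0.5, 0.1, 0.4], "F2": [-0.1, 0.2, 0.9, 0.3]},
    index=["AAA", "BBB", "CCC", "DDD"],
)
exposure = net_factor_exposure(loadings, longs=["AAA", "DDD"], shorts=["BBB", "CCC"])
```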

## 7. Visualizations

In [None]:
# Plot stock scores distribution
print("📊 Generating stock selection visualizations...")

try:
    selector.plot_stock_scores_distribution(figsize=(16, 12))
except Exception as e:
    print(f"⚠️ Plotting failed: {e}")
    print("📊 Showing a statistical summary instead")
    
    # Alternative statistical summary
    print(f"\n📈 SCORE DISTRIBUTION STATISTICS:")
    print(f"  Total Stocks Analyzed: {len(selector.stock_scores)}")
    print(f"  Score Range: [{selector.stock_scores['Score'].min():+.4f}, {selector.stock_scores['Score'].max():+.4f}]")
    print(f"  Score Mean: {selector.stock_scores['Score'].mean():+.4f}")
    print(f"  Score Std: {selector.stock_scores['Score'].std():.4f}")
    
    # Long vs Short distribution
    long_count = (selector.stock_scores['Direction'] == 'Long').sum()
    short_count = (selector.stock_scores['Direction'] == 'Short').sum()
    print(f"  Long Direction: {long_count} stocks ({long_count/len(selector.stock_scores)*100:.1f}%)")
    print(f"  Short Direction: {short_count} stocks ({short_count/len(selector.stock_scores)*100:.1f}%)")
    
    # Confidence distribution
    high_conf = (selector.stock_scores['Confidence'] > 0.6).sum()
    med_conf = ((selector.stock_scores['Confidence'] > 0.4) & (selector.stock_scores['Confidence'] <= 0.6)).sum()
    low_conf = (selector.stock_scores['Confidence'] <= 0.4).sum()
    
    print(f"\n🎯 CONFIDENCE DISTRIBUTION:")
    print(f"  High Confidence (>60%): {high_conf} stocks")
    print(f"  Medium Confidence (>40% to 60%): {med_conf} stocks")
    print(f"  Low Confidence (≤40%): {low_conf} stocks")
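The fallback summary counts confidence bands with three boolean masks. `pd.cut` can express the same bands in one call and makes the boundary handling explicit. With the default right-closed bins, 0.4 falls in Low and 0.6 in Medium, matching the masks above. The sample confidence values are made up.

```python
import numpy as np
import pandas as pd

confidence = pd.Series([0.75, 0.61, 0.60, 0.55, 0.41, 0.40, 0.20])
# Right-closed bins: (-inf, 0.4] Low, (0.4, 0.6] Medium, (0.6, inf) High.
bands = pd.cut(confidence, bins=[-np.inf, 0.4, 0.6, np.inf], labels=["Low", "Medium", "High"])
counts = bands.value_counts().reindex(["High", "Medium", "Low"])
print(counts)
```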

## 8. Detailed Analysis of Specific Stocks

In [None]:
# Analyze specific high-conviction picks
print("🔬 Detailed Analysis of High-Conviction Picks")
print("=" * 50)

# Function to analyze a specific stock
def analyze_stock_detail(asset_name, selector):
    try:
        attribution = selector.get_factor_attribution(asset_name)
        stock_info = selector.stock_scores[selector.stock_scores['Asset'] == asset_name].iloc[0]
        
        print(f"\n📊 {asset_name} - Detailed Analysis:")
        print(f"  Overall Score: {stock_info['Score']:+.4f}")
        print(f"  Direction: {stock_info['Direction']}")
        print(f"  Strength: {stock_info['Strength']:.4f}")
        print(f"  Confidence: {stock_info['Confidence']:.3f}")
        
        print(f"\n  🎯 Top Factor Contributors:")
        top_contributors = attribution.head(3)
        for idx, row in top_contributors.iterrows():
            impact = "Positive" if row['Contribution'] > 0 else "Negative"
            print(f"    {row['Factor']}: {row['Contribution']:+.4f} ({impact})")
            print(f"      Loading: {row['Loading']:+.3f}, Forecast: {row['Forecast_Direction']}, Magnitude: {row['Forecast_Magnitude']:.3f}")
        
        return True
    except Exception as e:
        print(f"  ❌ Error analyzing {asset_name}: {e}")
        return False

# Analyze top 3 long picks in detail
print("\n📈 TOP LONG PICKS - Detailed Analysis:")
for i in range(min(3, len(top_longs))):
    asset = top_longs.iloc[i]['Asset']
    analyze_stock_detail(asset, selector)

# Analyze top 3 short picks in detail
print("\n📉 TOP SHORT PICKS - Detailed Analysis:")
for i in range(min(3, len(top_shorts))):
    asset = top_shorts.iloc[i]['Asset']
    analyze_stock_detail(asset, selector)

## 9. Risk Analysis and Portfolio Optimization

In [None]:
# Risk analysis and portfolio optimization
print("⚖️ Risk Analysis and Portfolio Optimization")
print("=" * 50)

# Get portfolio construction report
portfolio_report = selector.get_portfolio_construction_report(
    long_picks=20, 
    short_picks=20, 
    min_confidence=0.4
)

# Factor exposure analysis
factor_exposures = portfolio_report['factor_exposures']

print("\n🎯 FACTOR RISK ANALYSIS:")
print("Analyzing net factor exposures for portfolio risk...")

# Identify high-risk factor exposures
high_risk_factors = factor_exposures[abs(factor_exposures['Net_Exposure']) > 0.15]
if len(high_risk_factors) > 0:
    print(f"\n🔴 HIGH RISK Factor Exposures (>15%):")
    for idx, row in high_risk_factors.iterrows():
        direction = "Long" if row['Net_Exposure'] > 0 else "Short"
        print(f"  {row['Factor']}: {row['Net_Exposure']:+.3f} ({direction} bias)")
else:
    print(f"\n🟢 Portfolio appears well-balanced - no extreme factor exposures detected")

# Concentration analysis
print(f"\n📊 CONCENTRATION ANALYSIS:")
long_picks_report = portfolio_report['long_picks']
short_picks_report = portfolio_report['short_picks']

if len(long_picks_report) > 0:
    top_long_score = long_picks_report['Score'].iloc[0]
    avg_long_score = long_picks_report['Score'].mean()
    long_concentration = top_long_score / avg_long_score if avg_long_score != 0 else 0
    print(f"  Long Side Concentration: {long_concentration:.2f}x (top pick vs average)")

if len(short_picks_report) > 0:
    top_short_score = abs(short_picks_report['Score'].iloc[0])
    avg_short_score = abs(short_picks_report['Score'].mean())
    short_concentration = top_short_score / avg_short_score if avg_short_score != 0 else 0
    print(f"  Short Side Concentration: {short_concentration:.2f}x (top pick vs average)")

# Confidence distribution; all_picks is defined unconditionally because the
# recommendations below use it even when one side has no picks
all_picks = pd.concat([long_picks_report, short_picks_report])
if len(all_picks) > 0:
    high_conf_picks = (all_picks['Confidence'] > 0.6).sum()
    total_picks = len(all_picks)
    
    print(f"\n🎯 CONFIDENCE ANALYSIS:")
    print(f"  High Confidence Picks (>60%): {high_conf_picks}/{total_picks} ({high_conf_picks/total_picks*100:.1f}%)")
    print(f"  Average Portfolio Confidence: {all_picks['Confidence'].mean():.3f}")

# Risk recommendations
print(f"\n💡 RISK MANAGEMENT RECOMMENDATIONS:")
if len(high_risk_factors) > 0:
    # Point at the factor with the largest absolute net exposure
    worst_factor = high_risk_factors.loc[high_risk_factors['Net_Exposure'].abs().idxmax(), 'Factor']
    print(f"  🔴 Consider reducing positions in assets with high {worst_factor} exposure")
else:
    print(f"  🟢 Factor exposures appear balanced")

if len(all_picks) > 0:
    low_conf_count = (all_picks['Confidence'] < 0.5).sum()
    if low_conf_count > len(all_picks) * 0.3:
        print(f"  🟡 Consider raising confidence threshold - {low_conf_count}/{len(all_picks)} picks have low confidence")
    else:
        print(f"  🟢 Confidence levels appear adequate for most picks")

print(f"  ℹ️ Monitor factor forecasts regularly and rebalance as conditions change")
print(f"  ℹ️ Consider position sizing based on confidence levels")
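The last recommendation suggests sizing positions by confidence. One simple rule is sketched below as a hypothetical choice, not anything the selector implements. It weights each pick by |Score| × Confidence and normalizes each side so long weights sum to +1 and short weights to -1, which keeps the book dollar-neutral.

```python
import numpy as np
import pandas as pd

def confidence_weights(picks, gross_per_side=1.0):
    """Weights proportional to |Score| x Confidence, normalized per side.

    Positive-score rows get positive (long) weights summing to gross_per_side;
    negative-score rows get negative (short) weights summing to -gross_per_side.
    """
    raw = picks["Score"].abs() * picks["Confidence"]
    side = np.sign(picks["Score"])          # +1 long, -1 short, 0 left at zero weight
    weights = pd.Series(0.0, index=picks.index)
    for s in (1.0, -1.0):
        mask = side == s
        total = raw[mask].sum()
        if total > 0:
            weights.loc[mask] = s * gross_per_side * raw[mask] / total
    return weights

picks = pd.DataFrame({
    "Asset": ["AAA", "EEE", "DDD", "BBB"],
    "Score": [0.012, 0.005, -0.015, -0.008],
    "Confidence": [0.7, 0.45, 0.5, 0.6],
})
picks["Weight"] = confidence_weights(picks)
print(picks)
```

Multiplying by |Score| as well as Confidence means a confident but weak signal still gets a small position; dropping the |Score| term would size purely on confidence.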

## 10. Export Results

In [None]:
# Export results for further analysis
print("💾 Exporting Results")
print("=" * 30)

# Create results directory
results_dir = '../results'
os.makedirs(results_dir, exist_ok=True)

timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

try:
    # Export top picks
    long_picks_file = f'{results_dir}/factor_long_picks_{timestamp}.csv'
    top_longs.to_csv(long_picks_file, index=False)
    print(f"✅ Long picks exported to: {long_picks_file}")
    
    short_picks_file = f'{results_dir}/factor_short_picks_{timestamp}.csv'
    top_shorts.to_csv(short_picks_file, index=False)
    print(f"✅ Short picks exported to: {short_picks_file}")
    
    # Export all stock scores
    all_scores_file = f'{results_dir}/factor_all_stock_scores_{timestamp}.csv'
    selector.stock_scores[['Asset', 'Score', 'Direction', 'Strength', 'Confidence']].to_csv(all_scores_file, index=False)
    print(f"✅ All stock scores exported to: {all_scores_file}")
    
    # Export factor forecasts summary
    factor_forecasts_file = f'{results_dir}/factor_forecasts_summary_{timestamp}.csv'
    portfolio_report['factor_forecasts_summary'].to_csv(factor_forecasts_file, index=False)
    print(f"✅ Factor forecasts exported to: {factor_forecasts_file}")
    
    # Export factor exposures
    factor_exposures_file = f'{results_dir}/portfolio_factor_exposures_{timestamp}.csv'
    portfolio_report['factor_exposures'].to_csv(factor_exposures_file, index=False)
    print(f"✅ Factor exposures exported to: {factor_exposures_file}")
    
    print(f"\n📁 All results exported to: {os.path.abspath(results_dir)}")
    
except Exception as e:
    print(f"❌ Error exporting results: {e}")

## 11. Summary and Next Steps

In [None]:
# Final summary
print("\n" + "=" * 80)
print("🎯 FACTOR-BASED STOCK SELECTION - FINAL SUMMARY")
print("=" * 80)

print(f"\n📊 ANALYSIS RESULTS:")
print(f"  Total Stocks Analyzed: {len(selector.stock_scores)}")
print(f"  Long Recommendations: {len(top_longs)} (likely to go UP)")
print(f"  Short Recommendations: {len(top_shorts)} (likely to go DOWN)")
print(f"  Forecast Horizon: {selector.forecast_horizon} days")
print(f"  Factors Used: {len(selector.factor_forecasts)}")

if len(top_longs) > 0:
    print(f"\n📈 TOP LONG PICK: {top_longs.iloc[0]['Asset']} (Score: {top_longs.iloc[0]['Score']:+.4f})")
if len(top_shorts) > 0:
    print(f"📉 TOP SHORT PICK: {top_shorts.iloc[0]['Asset']} (Score: {top_shorts.iloc[0]['Score']:+.4f})")

print(f"\n🔮 FACTOR FORECAST SUMMARY:")
forecasts_summary = portfolio_report['factor_forecasts_summary']
bullish_factors = (forecasts_summary['Direction'] == '+').sum()
bearish_factors = (forecasts_summary['Direction'] == '-').sum()
print(f"  Bullish Factors: {bullish_factors}/{len(forecasts_summary)}")
print(f"  Bearish Factors: {bearish_factors}/{len(forecasts_summary)}")
print(f"  Avg Forecast Confidence: {forecasts_summary['Confidence'].mean():.3f}")

print(f"\n⚖️ PORTFOLIO RISK ASSESSMENT:")
max_exposure = abs(portfolio_report['factor_exposures']['Net_Exposure']).max()
risk_level = "High" if max_exposure > 0.2 else "Medium" if max_exposure > 0.1 else "Low"
print(f"  Maximum Factor Exposure: {max_exposure:.3f}")
print(f"  Portfolio Risk Level: {risk_level}")

print(f"\n🎯 KEY INSIGHTS:")
print(f"  • Factor-based approach provides systematic stock selection")
print(f"  • Confidence weighting shrinks the influence of low-confidence factor forecasts")
print(f"  • Factor exposure analysis enables risk management")
print(f"  • Regular rebalancing recommended as factors evolve")

print(f"\n🔄 NEXT STEPS:")
print(f"  1. Monitor factor forecasts and update recommendations regularly")
print(f"  2. Implement position sizing based on confidence levels")
print(f"  3. Track performance and refine model parameters")
print(f"  4. Consider transaction costs and market impact in execution")
print(f"  5. Integrate with risk management and portfolio optimization systems")

print("\n" + "=" * 80)
print("✅ Factor-Based Stock Selection Analysis Complete!")
print("=" * 80)

---

## Methodology Summary

### Factor-Based Stock Selection Process:

1. **Factor Extraction**: Rolling SVD decomposition on preprocessed return data
2. **Factor Forecasting**: ARIMA models with fallback to trend analysis
3. **Loading Analysis**: Latest factor loadings show asset sensitivities
4. **Score Calculation**: `Score = Σ(Loading × Forecast_Direction × Forecast_Magnitude × Confidence)`
5. **Stock Ranking**: Positive scores are long candidates and negative scores are short candidates; each side is ranked by absolute score
6. **Risk Management**: Portfolio factor exposure and concentration analysis
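In symbols, let $L_{ik}$ be the latest loading of asset $i$ on factor $k$, and let $d_k$, $m_k$, $c_k$ be the forecast direction, magnitude and confidence of factor $k$. Reading direction as $\pm 1$ and confidence as a weight in $[0, 1]$ is an assumption about the selector's conventions. Under that reading, the score in step 4 is a single matrix-vector product over the $N \times K$ loading matrix $L$:

$$
S_i = \sum_{k=1}^{K} L_{ik}\, d_k\, m_k\, c_k
\qquad\Longleftrightarrow\qquad
\mathbf{s} = L\,(\mathbf{d} \odot \mathbf{m} \odot \mathbf{c}),
$$

where $\odot$ is element-wise multiplication. Each summand $L_{ik}\, d_k\, m_k\, c_k$ is presumably the per-factor contribution that the attribution step reports, so an asset's contributions sum to its score $S_i$.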

### Key Features:

- **Systematic Approach**: Applies the same rules to every stock, removing discretionary judgment from the selection step
- **Confidence Weighting**: Prioritizes high-confidence forecasts
- **Factor Attribution**: Explains why each stock is recommended
- **Risk Control**: Monitors factor exposures and concentration
- **Scalable**: Runs unchanged for any number of stocks and factors, though rolling SVD cost grows with universe size and window length

### Risk Considerations:

- Factor forecasts may be inaccurate
- Model assumes factor loadings remain stable
- Market regime changes can affect factor behavior
- Transaction costs and liquidity not considered
- Regular model updates and validation required
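The stability assumption can be monitored directly. One option, sketched here as an assumption rather than part of the selector, compares each factor's loading vector in consecutive windows by absolute cosine similarity. The absolute value ignores SVD's arbitrary component signs. A column-wise comparison also assumes factors keep their order between windows, which may not hold when two factors explain similar variance. `loading_stability` is a hypothetical helper.

```python
import numpy as np

def loading_stability(prev_loadings, curr_loadings):
    """Absolute cosine similarity between matching loading columns of two windows.

    Values near 1 mean the factor's asset profile barely changed; values near 0
    mean the factor now loads on a largely different set of assets.
    """
    a = np.asarray(prev_loadings, dtype=float)
    b = np.asarray(curr_loadings, dtype=float)
    num = np.abs(np.sum(a * b, axis=0))
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    return num / den

rng = np.random.default_rng(42)
prev = rng.standard_normal((50, 3))          # 50 assets x 3 factors
curr = prev.copy()
curr[:, 1] *= -1                             # sign flip only: still the same factor
curr[:, 2] = rng.standard_normal(50)         # unrelated profile: the factor has drifted
stability = loading_stability(prev, curr)
print(stability.round(2))
```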

---