# Factor Analysis (Part 2)

This notebook analyzes the six monthly equity factors (MKT, SMB, HML, RMW, CMA, UMD) provided in `factor_pricing_data_monthly.xlsx`. The goal is to understand their standalone properties, their interactions, and what they imply for portfolio construction.

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-v0_8-darkgrid')

In [None]:
# Load factor data
FACTOR_SHEET = 'factors (excess returns)'
DATA_PATH = 'factor_pricing_data_monthly.xlsx'

factors = pd.read_excel(DATA_PATH, sheet_name=FACTOR_SHEET, parse_dates=['Date'])
factors = factors.set_index('Date').sort_index()
factors.head()

## Univariate factor statistics

We evaluate monthly and annualized mean returns, volatilities, and Sharpe ratios for each factor. Annualized figures scale the monthly mean by 12 and the monthly volatility by √12.

In [None]:
monthly_mean = factors.mean()
monthly_vol = factors.std()
monthly_sharpe = monthly_mean / monthly_vol

annual_mean = monthly_mean * 12
annual_vol = monthly_vol * np.sqrt(12)
annual_sharpe = annual_mean / annual_vol

summary_stats = pd.DataFrame({
    'mean_monthly': monthly_mean,
    'vol_monthly': monthly_vol,
    'sharpe_monthly': monthly_sharpe,
    'mean_annualized': annual_mean,
    'vol_annualized': annual_vol,
    'sharpe_annualized': annual_sharpe
})
summary_stats.round(4)
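Because the mean is scaled by 12 and the volatility by √12, the annualized Sharpe ratio is simply the monthly Sharpe ratio times √12 (the √12 volatility scaling assumes serially uncorrelated monthly returns). A small self-contained check on synthetic data; the column names and return parameters below are made up for illustration and are not taken from the spreadsheet:

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic monthly excess returns (decimal units), NOT the spreadsheet data
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(loc=[0.006, 0.002], scale=[0.045, 0.03], size=(240, 2)),
    columns=['FACTOR_A', 'FACTOR_B'],
)

monthly_sharpe_demo = demo.mean() / demo.std()
annual_sharpe_demo = (demo.mean() * 12) / (demo.std() * np.sqrt(12))

# Scaling the mean by 12 and the volatility by sqrt(12) multiplies the Sharpe ratio by sqrt(12)
ratio = annual_sharpe_demo / monthly_sharpe_demo
print(ratio.round(6))
```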

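The sign of each mean excess return shows whether the factor earned a positive average premium over the sample; whether that premium is statistically distinguishable from zero is a separate question. Below we flag positive vs. negative monthly averages.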

In [None]:
positive_premia = (monthly_mean > 0).rename('positive_risk_premium')
positive_premia
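A positive sample mean is only suggestive; a common first check is the t-statistic of the mean, mean / (std / √T). The sketch below assumes i.i.d. monthly returns (no autocorrelation or heteroskedasticity adjustment) and uses a hypothetical helper `mean_t_stats` on synthetic data. Applied to this notebook it would be `mean_t_stats(factors)`.

```python
import numpy as np
import pandas as pd

def mean_t_stats(returns: pd.DataFrame) -> pd.DataFrame:
    """t-statistic of each column's mean excess return, assuming i.i.d. observations."""
    n_obs = returns.count()                    # non-missing months per column
    std_err = returns.std() / np.sqrt(n_obs)   # standard error of the sample mean
    return pd.DataFrame({
        'mean_monthly': returns.mean(),
        't_stat': returns.mean() / std_err,
        'n_obs': n_obs,
    })

# Hypothetical synthetic data: one factor with a true premium, one without
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'WITH_PREMIUM': rng.normal(0.008, 0.03, 600),
    'NO_PREMIUM': rng.normal(0.0, 0.03, 600),
})
t_table = mean_t_stats(demo)
print(t_table.round(3))
```

A |t| around 2 or more is the usual informal threshold; with short or autocorrelated samples, Newey–West standard errors would be a more careful choice.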

## Post-2015 factor performance

We restrict the sample to January 2015 onward to see how factor premia have evolved recently, recomputing the summary statistics and plotting cumulative (compounded) excess returns.

In [None]:
recent_start = '2015-01-01'
recent = factors.loc[factors.index >= recent_start]

recent_stats = pd.DataFrame({
    'mean_monthly': recent.mean(),
    'vol_monthly': recent.std(),
    'sharpe_monthly': recent.mean() / recent.std(),
}).round(4)
recent_stats
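To read the post-2015 numbers side by side with the full-sample ones, a small helper can compute the same statistics for any date window and stack the results. This is a sketch: the helper name `period_stats` and the window labels are hypothetical, and it is shown on synthetic monthly data rather than the spreadsheet.

```python
import numpy as np
import pandas as pd

def period_stats(returns: pd.DataFrame, start=None, end=None) -> pd.DataFrame:
    """Monthly mean, volatility, and Sharpe ratio for rows dated within [start, end]."""
    window = returns.loc[start:end]          # label-based slicing on a sorted DatetimeIndex
    return pd.DataFrame({
        'mean_monthly': window.mean(),
        'vol_monthly': window.std(),
        'sharpe_monthly': window.mean() / window.std(),
    })

# Hypothetical synthetic monthly data (month-start dates), NOT the spreadsheet
dates = pd.date_range('2000-01-01', periods=300, freq='MS')
rng = np.random.default_rng(2)
demo = pd.DataFrame(rng.normal(0.005, 0.04, size=(300, 2)),
                    index=dates, columns=['FACTOR_A', 'FACTOR_B'])

# Outer column level = sample window, inner level = statistic
comparison = pd.concat(
    {'full': period_stats(demo), 'post_2015': period_stats(demo, start='2015-01-01')},
    axis=1,
)
print(comparison.round(4))
```

With the notebook's data this would be `period_stats(factors)` and `period_stats(factors, start=recent_start)`.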

In [None]:
cumulative_recent = (1 + recent).cumprod() - 1
ax = cumulative_recent.plot(figsize=(10, 6))
ax.set_title('Cumulative Excess Returns Since 2015')
ax.set_ylabel('Cumulative excess return')
ax.legend(loc='upper left', ncol=2, fontsize=9)
plt.show()
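The cumulative series compounds monthly excess returns as ∏(1 + rₜ) − 1, so gains and losses of equal size do not cancel. A two-month worked example with hypothetical numbers:

```python
import pandas as pd

# Hypothetical two-month path: +10% followed by -10%
path = pd.Series([0.10, -0.10])

compounded = (1 + path).cumprod() - 1   # same formula as cumulative_recent
summed = path.cumsum()                  # simple running sum, no compounding

print(compounded.iloc[-1])  # 1.1 * 0.9 - 1
print(summed.iloc[-1])
```

The compounded path ends at −1% while the simple sum ends at 0%, which is why long compounded lines can look quite different from a plot of summed returns.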

## Factor correlations

Low pairwise correlations indicate diversification potential across styles. We compute the full-sample correlation matrix of the six factors.

In [None]:
correlations = factors.corr()
correlations.round(3)
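To pick out the strongest relationships without scanning the whole matrix, one option is to keep each pair once (upper triangle) and sort by absolute correlation. The helper `ranked_pairs` below is hypothetical and is demonstrated on synthetic data in which series `Y` is constructed to overlap with `X`:

```python
import numpy as np
import pandas as pd

def ranked_pairs(corr: pd.DataFrame) -> pd.Series:
    """Unique off-diagonal pairs of a correlation matrix, sorted by absolute correlation."""
    cols = list(corr.columns)
    pairs = pd.Series({
        (a, b): corr.loc[a, b]
        for i, a in enumerate(cols)
        for b in cols[i + 1:]          # upper triangle only, so each pair appears once
    })
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)

# Hypothetical synthetic series: 'Y' is built to overlap with 'X'
rng = np.random.default_rng(3)
x = rng.normal(size=500)
demo = pd.DataFrame({
    'X': x,
    'Y': 0.8 * x + 0.6 * rng.normal(size=500),
    'Z': rng.normal(size=500),
})
top_pairs = ranked_pairs(demo.corr())
print(top_pairs.round(3))
```

With the notebook's data this would be `ranked_pairs(correlations)`.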

## Tangency portfolios

We compute maximum-Sharpe (tangency) weights for the six factors. No risk-free adjustment is needed because the inputs are already excess returns, and the weights are normalized to sum to one.

In [None]:
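def tangency_portfolio(data):
    """Maximum-Sharpe weights from excess returns, normalized to sum to one."""
    mean_returns = data.mean().values
    cov = data.cov().values
    # Solve cov @ w = mean rather than explicitly inverting the covariance matrix
    weights = np.linalg.solve(cov, mean_returns)
    # Rescale to sum to one (if the raw sum is negative, this flips every weight's sign)
    weights /= weights.sum()
    return pd.Series(weights, index=data.columns)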

weights_all = tangency_portfolio(factors)
weights_all.sort_values(ascending=False).round(4)
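For reference, with excess-return mean vector $\mu$ and covariance matrix $\Sigma$, `tangency_portfolio` solves $\Sigma \tilde{w} = \mu$ and rescales the result to sum to one. The implied Sharpe ratio has a closed form:

```latex
\tilde{w} = \Sigma^{-1}\mu, \qquad
w^{\mathrm{tan}} = \frac{\Sigma^{-1}\mu}{\mathbf{1}^\top \Sigma^{-1}\mu}, \qquad
\mathrm{SR}^{\mathrm{tan}}
  = \frac{(w^{\mathrm{tan}})^\top \mu}{\sqrt{(w^{\mathrm{tan}})^\top \Sigma\, w^{\mathrm{tan}}}}
  = \sqrt{\mu^\top \Sigma^{-1}\mu}
\quad \text{when } \mathbf{1}^\top \Sigma^{-1}\mu > 0 .
```

If $\mathbf{1}^\top \Sigma^{-1}\mu < 0$, the normalization flips every weight and the resulting portfolio has Sharpe ratio $-\sqrt{\mu^\top \Sigma^{-1}\mu}$, so the sign of the raw weight sum is worth checking before interpreting the weights.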

In [None]:
tangency_all_stats = pd.Series({
    'expected_monthly_excess_return': np.dot(weights_all, monthly_mean),
    'vol_monthly': np.sqrt(np.dot(weights_all, factors.cov().dot(weights_all))),
})
tangency_all_stats['sharpe_monthly'] = tangency_all_stats['expected_monthly_excess_return'] / tangency_all_stats['vol_monthly']
tangency_all_stats.round(4)
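As a numerical check of the closed form, the Sharpe ratio of the sum-to-one tangency weights should equal √(μᵀΣ⁻¹μ) times the sign of the raw weight sum. A self-contained sketch on synthetic data; the helper `tangency_sharpe_check` is hypothetical:

```python
import numpy as np
import pandas as pd

def tangency_sharpe_check(data: pd.DataFrame):
    """Return (Sharpe of sum-to-one tangency weights, sqrt(mu' Sigma^-1 mu), raw weight sum)."""
    mu = data.mean().values
    cov = data.cov().values
    raw = np.linalg.solve(cov, mu)          # Sigma^{-1} mu, before normalization
    w = raw / raw.sum()                     # same normalization as tangency_portfolio
    sharpe = (w @ mu) / np.sqrt(w @ cov @ w)
    closed_form = np.sqrt(mu @ raw)         # mu' Sigma^{-1} mu > 0 for positive-definite Sigma
    return sharpe, closed_form, raw.sum()

# Hypothetical synthetic monthly excess returns, NOT the spreadsheet data
rng = np.random.default_rng(4)
demo = pd.DataFrame(
    rng.normal([0.006, 0.003, 0.004], [0.045, 0.03, 0.035], size=(360, 3)),
    columns=['A', 'B', 'C'],
)
sharpe, closed_form, raw_sum = tangency_sharpe_check(demo)
print(sharpe, closed_form, raw_sum)
```

Applied to `factors`, the first value should match `tangency_all_stats['sharpe_monthly']`.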

We repeat the exercise on the MKT, SMB, HML, and UMD subset, i.e. the Carhart four-factor set of market, size, value, and momentum.

In [None]:
subset_cols = ['MKT', 'SMB', 'HML', 'UMD']
weights_subset = tangency_portfolio(factors[subset_cols])
weights_subset.sort_values(ascending=False).round(4)

In [None]:
subset_mean = factors[subset_cols].mean()
subset_cov = factors[subset_cols].cov()
tangency_subset_stats = pd.Series({
    'expected_monthly_excess_return': np.dot(weights_subset, subset_mean),
    'vol_monthly': np.sqrt(np.dot(weights_subset, subset_cov.dot(weights_subset))),
})
tangency_subset_stats['sharpe_monthly'] = tangency_subset_stats['expected_monthly_excess_return'] / tangency_subset_stats['vol_monthly']
tangency_subset_stats.round(4)
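The two statistics cells above repeat the same computation. A small helper can evaluate any weight vector against any return panel, which turns the six-factor versus four-factor comparison into one table. In sample, the maximum attainable Sharpe ratio √(μᵀΣ⁻¹μ) cannot fall when factors are added, because the smaller universe's portfolios remain available (with zero weight on the extra factors). The helper names `tangency_weights` and `portfolio_stats` are hypothetical, and the data are synthetic:

```python
import numpy as np
import pandas as pd

def tangency_weights(data: pd.DataFrame) -> pd.Series:
    """Sum-to-one maximum-Sharpe weights, mirroring tangency_portfolio."""
    raw = np.linalg.solve(data.cov().values, data.mean().values)
    return pd.Series(raw / raw.sum(), index=data.columns)

def portfolio_stats(weights: pd.Series, data: pd.DataFrame) -> pd.Series:
    """Monthly mean, volatility, and Sharpe ratio of a fixed-weight portfolio."""
    cols = weights.index
    mean = weights @ data[cols].mean()
    vol = np.sqrt(weights @ data[cols].cov() @ weights)
    return pd.Series({'mean_monthly': mean, 'vol_monthly': vol, 'sharpe_monthly': mean / vol})

# Hypothetical synthetic factors, NOT the spreadsheet data
rng = np.random.default_rng(5)
names = ['F1', 'F2', 'F3', 'F4', 'F5', 'F6']
demo = pd.DataFrame(rng.normal(0.004, 0.03, size=(480, 6)), columns=names)

subset = ['F1', 'F2', 'F3', 'F4']
comparison = pd.DataFrame({
    'all_six': portfolio_stats(tangency_weights(demo), demo),
    'subset_four': portfolio_stats(tangency_weights(demo[subset]), demo),
})
print(comparison.round(4))
```

With the notebook's data, the two columns would correspond to `tangency_all_stats` and `tangency_subset_stats`.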

## Interpretation

Key takeaways from the tables above:

- **Risk premia:** Most factors exhibit positive average excess returns over the full sample, but some (notably SMB) are small. Post-2015 averages show which premia have persisted in recent years.
- **Correlations:** The correlation matrix shows where diversification is available. HML's correlations with the profitability (RMW) and investment (CMA) factors indicate whether value is redundant once those factors are included.
- **Tangency weights:** The maximum-Sharpe allocations show which factors contribute most to the optimal diversified portfolio. Even low-mean factors can receive weight if they hedge others effectively. Re-running the tangency optimization on the MKT/SMB/HML/UMD subset reveals how momentum interacts with the Fama–French styles.
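The point that a low-mean factor can still earn weight is visible in a two-factor case: with weights proportional to Σ⁻¹μ, a factor with zero expected excess return receives positive weight whenever it is negatively correlated with a positive-premium factor, because it lowers portfolio variance without costing expected return. A hand-built example with hypothetical moments:

```python
import numpy as np

# Hypothetical monthly moments: factor 1 earns a premium, factor 2 earns none
mu = np.array([0.006, 0.0])
vols = np.array([0.04, 0.03])
rho = -0.5                                  # assumed negative correlation
cov = np.array([[vols[0] ** 2, rho * vols[0] * vols[1]],
                [rho * vols[0] * vols[1], vols[1] ** 2]])

raw = np.linalg.solve(cov, mu)
weights = raw / raw.sum()                   # tangency weights summing to one
sharpe_tangency = (weights @ mu) / np.sqrt(weights @ cov @ weights)
sharpe_factor1 = mu[0] / vols[0]

print(weights)
print(sharpe_tangency, sharpe_factor1)
```

Here the weights come out to 0.6 and 0.4, and the monthly Sharpe ratio rises from 0.15 (factor 1 alone) to about 0.173, which equals 0.15 / √(1 − ρ²).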

### Highlights

- **Full-sample premia:** Monthly average excess returns range from 0.05% (SMB) to 0.73% (MKT). RMW (mean 0.37%), CMA (0.24%), and UMD (0.50%) earn the best Sharpe ratios, while SMB's Sharpe ratio is only 0.06.
- **Post-2015 behavior:** MKT (0.98% monthly), RMW (0.33%), and UMD (0.17%) remain positive. SMB (−0.20%), HML (−0.14%), and CMA (−0.08%) turn negative, illustrating the headwinds that size, value, and investment tilts have faced since 2015.
- **Correlations:** HML and CMA are highly correlated (0.68), consistent with substantial overlap between value and investment. MKT has a modest positive correlation with SMB (0.23) and negative correlations with the quality/profitability factors.
- **Tangency (six factors):** Optimization favors CMA (0.32), RMW (0.30), and MKT (0.22), with a small short in HML (−0.02).
- **Tangency (MKT/SMB/HML/UMD):** Weights concentrate in MKT (0.38), HML (0.37), and UMD (0.31), while SMB carries a small short (−0.05). Momentum keeps a large weight even with value and the market in the portfolio, whereas SMB's slightly negative weight suggests it adds little once the other three are included.
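Correlations alone do not settle whether HML is redundant. One standard check is a spanning regression: regress HML on the other factors, and an intercept (alpha) indistinguishable from zero suggests HML's premium is explained by the rest. A minimal NumPy OLS sketch, assuming i.i.d. residuals for the standard errors; the helper `spanning_regression` and the `*_LIKE` columns are hypothetical, and the synthetic "HML-like" series is built from the "CMA-like" one with no alpha of its own. With the notebook's data the call would be `spanning_regression(factors, 'HML')`.

```python
import numpy as np
import pandas as pd

def spanning_regression(data: pd.DataFrame, target: str) -> pd.Series:
    """OLS of `target` on all other columns plus a constant; returns alpha, its t-stat, and R^2."""
    clean = data.dropna()
    y = clean[target].values
    others = clean.drop(columns=target)
    X = np.column_stack([np.ones(len(clean)), others.values])   # constant in the first column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                                  # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)                    # i.i.d. OLS covariance
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return pd.Series({
        'alpha_monthly': beta[0],
        'alpha_t_stat': beta[0] / np.sqrt(cov_beta[0, 0]),
        'r_squared': r2,
    })

# Hypothetical synthetic data, NOT the spreadsheet
rng = np.random.default_rng(6)
n = 600
cma_like = rng.normal(0.003, 0.02, n)
demo = pd.DataFrame({
    'MKT_LIKE': rng.normal(0.006, 0.045, n),
    'CMA_LIKE': cma_like,
    'HML_LIKE': 1.0 * cma_like + rng.normal(0.0, 0.015, n),   # no independent premium
})
result = spanning_regression(demo, 'HML_LIKE')
print(result.round(4))
```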