# Factor Model Testing (Part 3)

This notebook evaluates several linear factor models using the 49 industry excess-return portfolios and the same factor data set as Part 2.

## Setup

In [None]:
import pandas as pd
import numpy as np

In [None]:
DATA_PATH = 'factor_pricing_data_monthly.xlsx'
FACTORS_SHEET = 'factors (excess returns)'
PORTFOLIOS_SHEET = 'portfolios (excess returns)'

factors = pd.read_excel(DATA_PATH, sheet_name=FACTORS_SHEET, parse_dates=['Date']).set_index('Date').sort_index()
portfolios = pd.read_excel(DATA_PATH, sheet_name=PORTFOLIOS_SHEET, parse_dates=['Date']).set_index('Date').sort_index()

combined = factors.join(portfolios, how='inner')

factors_aligned = combined[factors.columns]
portfolios_aligned = combined[portfolios.columns]

factors_aligned.head()

## Model definitions

In [None]:
MODELS = {
    'CAPM': ['MKT'],
    'Fama-French 3F': ['MKT', 'SMB', 'HML'],
    'Fama-French 5F': ['MKT', 'SMB', 'HML', 'RMW', 'CMA'],
    'AQR': ['MKT', 'HML', 'RMW', 'UMD'],
}

MODELS

## Helper functions

In [None]:
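def run_time_series(y, X):
    """Regress y on a constant and the columns of X by OLS.

    Returns the intercept (alpha), the slope vector (betas), R-squared, and the residuals.
    """
    X_design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    fitted = X_design @ coef
    resid = y - fitted
    sse = np.sum(resid**2)
    sst = np.sum((y - y.mean())**2)
    r_squared = 1 - sse / sst
    alpha = coef[0]
    betas = coef[1:]
    return alpha, betas, r_squared, resid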

In [None]:
time_series_results = {}
for model, cols in MODELS.items():
    factor_subset = factors_aligned[cols]
    alphas = []
    r_values = []
    betas = []
    residuals = []
    for asset in portfolios_aligned.columns:
        y = portfolios_aligned[asset].values
        alpha, beta_vec, r2, resid = run_time_series(y, factor_subset.values)
        alphas.append(alpha)
        r_values.append(r2)
        betas.append(beta_vec)
        residuals.append(resid)
    time_series_results[model] = {
        'alphas': pd.Series(alphas, index=portfolios_aligned.columns),
        'r_squared': pd.Series(r_values, index=portfolios_aligned.columns),
        'betas': pd.DataFrame(betas, index=portfolios_aligned.columns, columns=cols),
        'residuals': pd.DataFrame(residuals, index=portfolios_aligned.columns, columns=portfolios_aligned.index)
    }

## AQR model: alphas and $R^2$

Time-series regressions of each industry portfolio on the AQR factors (MKT, HML, RMW, UMD).
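
As a sanity check on the regression logic, the sketch below fits the same kind of intercept-plus-factors regression on synthetic, noise-free data where the true alpha and betas are known. The function name `ols_alpha_beta`, the seed, and every number here are hypothetical and unrelated to the notebook's data.

```python
import numpy as np

def ols_alpha_beta(y, X):
    """OLS of y on a constant and the columns of X; returns (alpha, betas, r_squared)."""
    X_design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    resid = y - X_design @ coef
    r_squared = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    return coef[0], coef[1:], r_squared

# Hypothetical toy data: 240 months of four factors, no idiosyncratic noise
rng = np.random.default_rng(0)
toy_factors = rng.normal(0.005, 0.04, size=(240, 4))
true_alpha = 0.001
true_betas = np.array([1.1, 0.3, -0.2, 0.1])
toy_returns = true_alpha + toy_factors @ true_betas

alpha_hat, betas_hat, r2 = ols_alpha_beta(toy_returns, toy_factors)
```

Because the toy returns contain no noise, the estimates should match the true values up to floating-point error and $R^2$ should be essentially one. With noise added, the estimates would scatter around the true values and $R^2$ would fall below one.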

In [None]:
aqr_alphas = time_series_results['AQR']['alphas']
aqr_r2 = time_series_results['AQR']['r_squared']

aqr_output = pd.DataFrame({
    'alpha': aqr_alphas,
    'R_squared': aqr_r2
})
aqr_output.round(4)

## Time-series MAE of alphas and average $R^2$

Lower mean absolute alpha indicates a better pricing fit. Higher $R^2$ indicates better time-series explanatory power.
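
The mean absolute alpha is used rather than the plain average because positive and negative alphas cancel in a signed mean. A minimal illustration with made-up alphas (the values and labels are hypothetical):

```python
import pandas as pd

# Hypothetical monthly alphas for four portfolios
toy_alphas = pd.Series([0.002, -0.002, 0.001, -0.001], index=['A', 'B', 'C', 'D'])

mean_alpha = toy_alphas.mean()       # signed average: offsetting errors cancel
mae_alpha = toy_alphas.abs().mean()  # mean absolute alpha: every error counts
```

Here the signed average is zero while the mean absolute alpha is 0.0015, so a model can look unbiased on average and still misprice every individual portfolio.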

In [None]:
summary_rows = []
for model, res in time_series_results.items():
    mae_alpha = res['alphas'].abs().mean()
    avg_r2 = res['r_squared'].mean()
    summary_rows.append({
        'model': model,
        'MAE_alpha': mae_alpha,
        'avg_R_squared': avg_r2
    })
summary_df = pd.DataFrame(summary_rows).set_index('model')
summary_df.round(4)

## Average regression $R^2$ by model

These averages summarize fit across the 49 industry portfolios.

In [None]:
avg_r2 = summary_df['avg_R_squared'].sort_values(ascending=False)
avg_r2.round(4)

## Cross-sectional pricing test

We estimate factor risk premia by regressing the average excess return of each asset on its time-series betas. The regression has no intercept: because the returns are already excess returns, an asset with zero betas should earn zero.
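
The following sketch shows the mechanics of that no-intercept cross-sectional regression on synthetic data. It assumes a hypothetical set of betas and premia (`toy_betas`, `true_lambdas`) and builds average returns that the model prices exactly, so the regression should recover the premia and leave no residuals.

```python
import numpy as np

# Hypothetical cross section: 49 assets, 3 factors
rng = np.random.default_rng(1)
toy_betas = rng.normal(1.0, 0.5, size=(49, 3))
true_lambdas = np.array([0.006, 0.002, -0.001])

# Average returns implied by exact pricing (no pricing errors)
toy_avg_returns = toy_betas @ true_lambdas

# No-intercept OLS: the betas are the regressors, the slopes are the premia
lambdas_hat, *_ = np.linalg.lstsq(toy_betas, toy_avg_returns, rcond=None)
pricing_errors = toy_avg_returns - toy_betas @ lambdas_hat
mae = np.mean(np.abs(pricing_errors))
```

On real data the residuals are not zero, and their mean absolute value is what the notebook reports as the cross-sectional MAE.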

In [None]:
factor_means = factors_aligned.mean()
avg_returns = portfolios_aligned.mean()  # identical for every model, so compute once
cs_results = {}
for model, cols in MODELS.items():
    betas = time_series_results[model]['betas']
    # No-intercept OLS of average excess returns on betas; the slopes are the factor premia
    lambdas, *_ = np.linalg.lstsq(betas.values, avg_returns.values, rcond=None)
    fitted = betas.values @ lambdas
    residuals = avg_returns.values - fitted
    cs_results[model] = {
        'lambdas': pd.Series(lambdas, index=cols),
        'mae_residual': np.mean(np.abs(residuals)),
        'residuals': pd.Series(residuals, index=betas.index),
    }

comparison_tables = {}
for model, cols in MODELS.items():
    comparison_tables[model] = pd.DataFrame({
        'time_series_mean': factor_means[cols],
        'cross_section_lambda': cs_results[model]['lambdas']
    })
comparison_tables['CAPM'].round(4)

In [None]:
comparison_tables['Fama-French 3F'].round(4)

In [None]:
comparison_tables['Fama-French 5F'].round(4)

In [None]:
comparison_tables['AQR'].round(4)

### Cross-sectional residual MAE

In [None]:
cs_mae = pd.Series({model: res['mae_residual'] for model, res in cs_results.items()})
cs_mae.round(4)

## Interpretation

- **Time-series performance:** CAPM produces the lowest mean absolute alpha (~0.17% per month). The five-factor and AQR extensions reduce the average (signed) alpha only marginally and increase the MAE relative to CAPM. Additional factors do raise the average $R^2$ (up to ~0.59 for FF5), so they explain more of the variation in returns even though they do not shrink the pricing errors.
- **Factor relevance:** Size (SMB) and value (HML) display negative cross-sectional premia, suggesting they were not rewarded in this sample, especially post-2000. Profitability (RMW) and momentum (UMD) retain positive prices of risk, implying that quality and momentum exposures help fit the cross section.
- **Momentum's role:** The AQR model assigns a sizable momentum premium (~0.45% per month) and small alphas, supporting the inclusion of UMD alongside profitability.
- **Cross-sectional vs. time-series premia:** Estimated factor prices broadly align in sign with their time-series means, but the cross-sectional magnitudes can differ substantially, especially for size and value.
- **Residual MAE:** Cross-sectional residual errors (1.1–1.7% per month) are of the same order as time-series alphas, so none of the models price the 49 portfolios perfectly. The five-factor specification attains the lowest cross-sectional MAE, albeit only slightly better than the others.
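
The comparison between cross-sectional premia and time-series means rests on a standard identity, sketched here under the assumption that every factor is itself a traded excess return. The notation ($r_{i,t}$ for the excess return on portfolio $i$, $f_t$ for the factor vector, $\lambda$ for the premia) is introduced only for this illustration. Taking expectations of the time-series regression gives

$$
E[r_{i,t}] = \alpha_i + \beta_i^\top E[f_t],
$$

so if the model prices every asset exactly ($\alpha_i = 0$ for all $i$), then

$$
E[r_{i,t}] = \beta_i^\top \lambda \quad \text{with} \quad \lambda = E[f_t].
$$

Gaps between the estimated $\lambda$ and the sample factor means are therefore another symptom of nonzero alphas, up to sampling error.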