# 02 — Fit Student-t Margins & Probability Integral Transform (PIT)

**Objective**
- Fit a univariate **Student-t** distribution to each asset’s daily log returns.
- Apply **PIT** to transform returns → uniforms in \[0,1\] (required for copulas).
- Sanity-check the uniforms (summary + basic uniformity tests).
- Persist fitted parameters and uniforms to `data/processed/`.

**Why t-margins?**
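Daily equity returns show more extreme moves than a Gaussian allows. A Student-t margin with low ν puts more probability mass in the tails, so tail-risk estimates are less likely to be understated.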
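
To make the tail argument concrete, here is a small sketch comparing the probability of a move beyond 4 standard deviations under a Gaussian and under a unit-variance Student-t. The choice of ν = 4 and the 4-sigma threshold are illustrative assumptions, not values fitted in this notebook.

```python
import numpy as np
from scipy import stats

nu = 4.0  # illustrative degrees of freedom (assumption, not a fitted value)
k = 4.0   # threshold, in standard deviations

# Rescale the t so it has unit variance: Var = scale^2 * nu / (nu - 2) for nu > 2
scale = np.sqrt((nu - 2.0) / nu)

p_gauss = 2.0 * stats.norm.sf(k)               # P(|Z| > k) under N(0, 1)
p_t = 2.0 * stats.t.sf(k, df=nu, scale=scale)  # P(|X| > k) under unit-variance t
ratio = p_t / p_gauss

print(f"Gaussian: {p_gauss:.2e}  Student-t: {p_t:.2e}  ratio: {ratio:.1f}")
```

With both distributions matched on variance, the t assigns far more probability to the same extreme move, which is the behavior the margins are meant to capture.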


In [None]:
%load_ext autoreload
%autoreload 2

import os
import numpy as np
import pandas as pd
from scipy import stats

from quantcopula.data import get_prices
from quantcopula.margins import (
    compute_log_returns,
    fit_student_t_params,
    to_uniforms,
    params_to_frame,
)

# Project paths & params
TICKERS = ["XLK", "XLF", "XLI", "XLV"]
START_DATE = "2013-01-01"
END_DATE = "2023-12-31"

os.makedirs("data/processed", exist_ok=True)
os.makedirs("figures", exist_ok=True)


In [None]:
# Load cached prices from previous notebook/pipeline (downloaded if missing)
prices = get_prices(
    TICKERS,
    START_DATE,
    END_DATE,
    cache_path="data/raw/prices.parquet",
    auto_adjust=False,
)
prices = prices.dropna(how="any")

# Compute daily log returns
log_returns = compute_log_returns(prices)
log_returns.tail()


## Fit Student-t to each asset

We fit (ν, μ, σ) per column via MLE:
- **ν (df)** controls tail heaviness,
- **μ (loc)** is location,
- **σ (scale)** is dispersion (not the standard deviation; for ν > 2 the standard deviation is σ·√(ν/(ν−2))).
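
As a rough picture of what a per-column MLE fit involves, the sketch below uses `scipy.stats.t.fit` on synthetic data. It is an assumption that `fit_student_t_params` works along these lines; its implementation is not shown in this notebook. The helper name `fit_t_mle` and the output layout are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def fit_t_mle(returns: pd.DataFrame) -> dict:
    """Hypothetical helper: fit a location-scale Student-t to each column by MLE."""
    out = {}
    for col in returns.columns:
        x = returns[col].dropna().to_numpy()
        # scipy returns (df, loc, scale), i.e. (nu, mu, sigma)
        nu, mu, sigma = stats.t.fit(x)
        out[col] = {"nu": nu, "mu": mu, "sigma": sigma}
    return out


# Synthetic heavy-tailed "returns", purely for illustration (not market data)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "A": stats.t.rvs(df=4, loc=0.0005, scale=0.01, size=2000, random_state=rng),
    "B": stats.t.rvs(df=8, loc=0.0, scale=0.02, size=2000, random_state=rng),
})
demo_params = fit_t_mle(demo)
print(pd.DataFrame(demo_params).T.round(4))
```

On a sample of this size the estimates of μ and σ are usually tight, while ν is noticeably noisier, so fitted ν values are best read as approximate.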


In [None]:
params = fit_student_t_params(log_returns)
params_df = params_to_frame(params)
display(params_df.round(6))

# Save parameters for reproducibility / later reuse
params_path = "data/processed/t_params.csv"
params_df.to_csv(params_path, index=False)
params_path


## Probability Integral Transform (PIT)

For each series, transform returns `r` to uniforms `u = F_t(r | ν, μ, σ)`.
If the t-fit is reasonable, `u` should look approximately **Uniform(0,1)**.
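
The transform itself is just the fitted t CDF evaluated at each return. A minimal sketch, assuming `to_uniforms` does the equivalent per column (the helper `pit_student_t` and the parameter values below are hypothetical, not taken from the fit above):

```python
import numpy as np
from scipy import stats


def pit_student_t(r, nu, mu, sigma):
    """Hypothetical helper: map returns to (0, 1) via the location-scale t CDF."""
    return stats.t.cdf(np.asarray(r, dtype=float), df=nu, loc=mu, scale=sigma)


# Illustrative parameters (made up, not fitted values from this notebook)
nu, mu, sigma = 4.0, 0.0005, 0.01
r = np.array([-0.05, mu, 0.05])
u = pit_student_t(r, nu, mu, sigma)
print(u)
```

A return equal to μ maps to 0.5, a large loss maps close to 0, and a large gain maps close to 1. The CDF is monotone, so the ordering of returns is preserved.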


In [None]:
U = to_uniforms(log_returns, params)

# Save for the next notebook (copula fit)
u_path = "data/processed/uniforms.parquet"
U.to_parquet(u_path)
u_path


## Sanity Checks

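- Descriptive stats should be close to the Uniform(0,1) values: mean ≈ 0.5 and std ≈ 0.29.
- Optional: one-sample Kolmogorov–Smirnov (KS) test against `Uniform(0,1)`. Because (ν, μ, σ) were estimated from the same data, the p-values tend to be optimistic, so treat them as a rough check only.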
- Basic visualization: histograms in U-space.
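
The reference values for the descriptive stats follow from the moments of U ~ Uniform(0,1):

```latex
\begin{aligned}
\mathbb{E}[U]   &= \int_0^1 u \, du   = \tfrac{1}{2}, \\
\mathbb{E}[U^2] &= \int_0^1 u^2 \, du = \tfrac{1}{3}, \\
\operatorname{Var}(U) &= \mathbb{E}[U^2] - \mathbb{E}[U]^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}, \\
\operatorname{sd}(U)  &= \tfrac{1}{\sqrt{12}} \approx 0.2887 .
\end{aligned}
```

Likewise, the 25%, 50% and 75% quantiles in `describe()` should sit near 0.25, 0.50 and 0.75.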


In [None]:
summary = U.describe().T  # count/mean/std/min/25%/50%/75%/max
display(summary.round(4))

# KS test per column against U(0,1)
ks_results = {}
for col in U.columns:
    stat, pval = stats.kstest(U[col].dropna().values, "uniform", args=(0, 1))
    ks_results[col] = {"KS stat": stat, "p-value": pval}

pd.DataFrame(ks_results).T.round(4)


In [None]:
import matplotlib.pyplot as plt
import math

cols = list(U.columns)
n = len(cols)
rows = math.ceil(n / 2)

plt.figure(figsize=(10, 3.5*rows))
for i, c in enumerate(cols, 1):
    plt.subplot(rows, 2, i)
    plt.hist(U[c].dropna(), bins=30, alpha=0.7, edgecolor="k")
    plt.title(f"{c} — U-space")
    plt.xlim(0, 1)
    plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()


## Takeaways

- **Fitted t parameters** show tail behavior per sector (ν smaller → heavier tails).
- **Uniform margins** look reasonable when means are ≈ 0.5, stds are ≈ 0.29, and KS p-values stay above the chosen significance level (e.g. 0.05).
- These uniforms are the **inputs for the copula** in the next notebook.


**Next:** `03_fit_gaussian_copula.ipynb`
Fit a **Gaussian copula** on the uniform margins and compare empirical vs simulated dependence.
