# 03 - Extreme Value Theory: Tail Risk Analysis

**Extreme Value Theory (EVT)** provides rigorous statistical tools for modelling the
tails of return distributions -- exactly where standard models (Gaussian, Student-t)
break down.

This notebook covers:
1. **GEV (Generalized Extreme Value)** distribution fitted to block maxima of losses
2. **GPD (Generalized Pareto Distribution)** fitted to exceedances over a threshold
3. **Value-at-Risk (VaR)** and **Expected Shortfall (ES)** from the GPD model
4. Rolling VaR/ES to track how tail risk evolves over time

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from fatcrash.data.ingest import from_yahoo
from fatcrash.data.transforms import (
    log_returns, time_index, negative_returns, block_maxima,
)
from fatcrash.indicators.evt_indicator import fit_gpd, fit_gev, compute_var_es, rolling_var_es
from fatcrash.viz.evt_plots import gpd_tail_fit, rolling_var_es_plot

plt.style.use("seaborn-v0_8-whitegrid")
plt.rcParams["figure.figsize"] = (14, 5)

## 1. Load and prepare data

In [None]:
df = from_yahoo("BTC-USD", start="2015-01-01", end="2025-12-31")
df = time_index(df)
df["log_return"] = log_returns(df["close"].values)
df = df.dropna(subset=["log_return"])

# Negative returns (losses) for tail analysis -- we flip sign so losses are positive
losses = negative_returns(df["log_return"].values)
print(f"Total observations: {len(df)}")
print(f"Number of negative return days: {len(losses)}")
print(f"Worst daily loss: {losses.max():.4f} ({losses.max()*100:.2f}%)")

## 2. GEV fit to block maxima

The **Block Maxima** approach divides the data into non-overlapping blocks (e.g.,
monthly) and takes the maximum loss in each block. The GEV distribution then
describes the distribution of these maxima.

$$F(x) = \exp\left(-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right)$$

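Here $\mu$ is the location, $\sigma > 0$ the scale, and $\xi$ the shape; the formula
holds for $\xi \neq 0$ where $1 + \xi(x-\mu)/\sigma > 0$. When $\xi > 0$ (Fréchet
domain), the tail is heavy -- typical for financial data.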
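
As a rough, library-independent illustration of the block maxima idea, the sketch below uses only NumPy and SciPy on synthetic Student-t losses. The data, the 30-day block length, and all variable names are assumptions made for this example; it is not the implementation behind `block_maxima` or `fit_gev`.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
# Synthetic heavy-tailed daily losses (Student-t, 3 degrees of freedom); illustration only
losses = rng.standard_t(df=3, size=3000) * 0.02

block_len = 30  # roughly one month of daily observations (assumed)
n_blocks = len(losses) // block_len
# Drop any incomplete trailing block, then take the maximum loss within each block
maxima = losses[: n_blocks * block_len].reshape(n_blocks, block_len).max(axis=1)

# Maximum-likelihood GEV fit; scipy's shape parameter c equals -xi
c, mu, sigma = genextreme.fit(maxima)
xi = -c
print(f"blocks={n_blocks}, xi={xi:.3f}, mu={mu:.4f}, sigma={sigma:.4f}")
```

Fixed-length blocks keep the sketch short; calendar-month blocks would need a date index and a group-by instead of a reshape.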

In [None]:
# Compute monthly block maxima of losses
bm = block_maxima(df["log_return"], block_size="M")
print(f"Number of blocks: {len(bm)}")
print(f"Block maxima range: [{bm.min():.4f}, {bm.max():.4f}]")

In [None]:
# Fit GEV
gev_result = fit_gev(bm)

print("GEV parameters:")
print(f"  xi (shape):    {gev_result.xi:.4f}")
print(f"  mu (location): {gev_result.mu:.4f}")
print(f"  sigma (scale): {gev_result.sigma:.4f}")
print()
if gev_result.xi > 0:
    print(f"xi > 0 => Frechet domain (heavy tail). Tail index alpha ~ 1/xi = {1/gev_result.xi:.2f}")
else:
    print("xi <= 0 => Weibull/Gumbel domain (thin tail or bounded).")

In [None]:
# Plot GEV fit vs empirical block maxima
from scipy.stats import genextreme

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram
axes[0].hist(bm, bins=30, density=True, alpha=0.6, color="steelblue", label="Empirical")
x = np.linspace(bm.min(), bm.max(), 200)
# scipy uses c = -xi convention
axes[0].plot(x, genextreme.pdf(x, -gev_result.xi, loc=gev_result.mu, scale=gev_result.sigma),
             "r-", linewidth=2, label="GEV fit")
axes[0].set_title("GEV Fit to Monthly Block Maxima of Losses")
axes[0].set_xlabel("Maximum Daily Loss in Month")
axes[0].legend()

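# QQ plot: theoretical quantiles at plotting positions (i - 0.5) / n rather than
# random draws, so the plot is deterministic and comparable across runs
probs = (np.arange(1, len(bm) + 1) - 0.5) / len(bm)
theoretical = genextreme.ppf(probs, -gev_result.xi, loc=gev_result.mu,
                             scale=gev_result.sigma)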
empirical = np.sort(bm)
axes[1].scatter(theoretical, empirical, s=10, alpha=0.6)
lims = [min(theoretical.min(), empirical.min()), max(theoretical.max(), empirical.max())]
axes[1].plot(lims, lims, "r--", linewidth=1)
axes[1].set_xlabel("GEV Theoretical Quantiles")
axes[1].set_ylabel("Empirical Quantiles")
axes[1].set_title("QQ Plot: GEV vs Block Maxima")

plt.tight_layout()
plt.show()

## 3. GPD fit to threshold exceedances

The **Peaks Over Threshold (POT)** method fits a GPD to losses exceeding a high
threshold $u$. It is often more data-efficient than block maxima because it uses
every loss above $u$ rather than a single observation per block.

$$G(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}, \quad y = x - u > 0$$
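
To make the formula concrete, the following sketch evaluates $G(y)$ directly and checks it against `scipy.stats.genpareto`, whose shape parameter `c` uses the same sign as $\xi$. It then recovers the parameters from simulated exceedances with the location fixed at 0. The helper `gpd_cdf` and the parameter values are hypothetical and chosen only for illustration; this is not necessarily how `fit_gpd` works.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_cdf(y, xi, sigma):
    """G(y) = 1 - (1 + xi*y/sigma)^(-1/xi), valid for xi != 0 and y >= 0."""
    y = np.asarray(y, dtype=float)
    return 1.0 - (1.0 + xi * y / sigma) ** (-1.0 / xi)

xi, sigma = 0.25, 0.02          # assumed example parameters
y = np.linspace(0.0, 0.3, 50)
# The hand-written CDF should agree with scipy's genpareto (c = xi, loc = 0)
matches = np.allclose(gpd_cdf(y, xi, sigma), genpareto.cdf(y, xi, scale=sigma))

# Simulate exceedances and refit; floc=0 because exceedances are measured from u
sample = genpareto.rvs(xi, scale=sigma, size=5000, random_state=0)
xi_hat, loc_hat, sigma_hat = genpareto.fit(sample, floc=0)
print(f"formula matches scipy: {matches}, xi_hat={xi_hat:.3f}, sigma_hat={sigma_hat:.4f}")
```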

In [None]:
# Use the 95th percentile of losses as threshold
threshold = np.percentile(losses, 95)
print(f"Threshold (95th percentile of losses): {threshold:.4f} ({threshold*100:.2f}%)")
exceedances = losses[losses > threshold] - threshold
print(f"Number of exceedances: {len(exceedances)}")

In [None]:
# Fit GPD
gpd_result = fit_gpd(losses, threshold=threshold)

print("GPD parameters:")
print(f"  xi (shape):    {gpd_result.xi:.4f}")
print(f"  sigma (scale): {gpd_result.sigma:.4f}")
print(f"  threshold:     {gpd_result.threshold:.4f}")

In [None]:
# Visualize the GPD tail fit
gpd_tail_fit(
    losses=losses,
    threshold=threshold,
    xi=gpd_result.xi,
    sigma=gpd_result.sigma,
    title="GPD Tail Fit: BTC Daily Losses",
)

## 4. Compute VaR and Expected Shortfall

- **VaR(q)**: the loss level exceeded with probability q (e.g., 1%)
- **ES(q)**: the expected loss given that VaR is exceeded (conditional tail expectation)
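
For reference, the standard POT estimators of these two quantities can be sketched as follows. With $N$ total observations and $N_u$ exceedances, VaR is $u + \frac{\sigma}{\xi}\left[\left(\frac{q}{N_u/N}\right)^{-\xi} - 1\right]$ and ES is $\frac{\mathrm{VaR} + \sigma - \xi u}{1 - \xi}$ for $\xi < 1$. The helper `pot_var_es` is a hypothetical name, and the inputs are made-up numbers; `compute_var_es` may differ in its details.

```python
def pot_var_es(xi, sigma, threshold, n_total, n_exceed, q):
    """POT estimates of VaR and ES at tail probability q (hypothetical helper).

    Assumes 0 < q <= n_exceed / n_total, xi != 0, and xi < 1 (finite mean).
    """
    tail_frac = n_exceed / n_total  # empirical P(loss > threshold)
    var = threshold + (sigma / xi) * ((q / tail_frac) ** (-xi) - 1.0)
    es = (var + sigma - xi * threshold) / (1.0 - xi)
    return var, es

# Made-up inputs: 50 exceedances out of 1000 observations, 1% tail probability
var, es = pot_var_es(xi=0.2, sigma=0.02, threshold=0.05,
                     n_total=1000, n_exceed=50, q=0.01)
print(f"VaR(1%) = {var:.4f}, ES(1%) = {es:.4f}")
```

When `q` equals the exceedance fraction `n_exceed / n_total`, the VaR reduces to the threshold itself, which is a quick sanity check on any implementation.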

In [None]:
for q in [0.01, 0.025, 0.05]:
    var, es = compute_var_es(
        xi=gpd_result.xi,
        sigma=gpd_result.sigma,
        threshold=threshold,
        n_total=len(df),  # total observations (all days, not only loss days)
        n_exceed=len(exceedances),
        quantile=q,
    )
    print(f"  VaR({q:.1%}) = {var:.4f} ({var*100:.2f}%)   |   ES({q:.1%}) = {es:.4f} ({es*100:.2f}%)")

## 5. Rolling VaR and ES

Track how tail risk evolves over time by re-fitting the GPD in rolling windows.
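
The rolling procedure can be sketched without the library as a loop that, for each trailing window, picks a threshold, fits a GPD to the exceedances, and evaluates the POT formulas. Everything here is an assumption for illustration: the function name `rolling_pot_var_es`, the 90th-percentile threshold, the step size, and the synthetic data. `rolling_var_es` may choose its threshold and handle edge cases differently.

```python
import numpy as np
from scipy.stats import genpareto

def rolling_pot_var_es(losses, window=500, step=100, q=0.01, pct=90.0):
    """Refit a GPD on each trailing window; return (end indices, VaR, ES)."""
    ends, var_out, es_out = [], [], []
    for end in range(window, len(losses) + 1, step):
        w = losses[end - window:end]
        u = np.percentile(w, pct)            # window-specific threshold
        exc = w[w > u] - u                   # exceedances over the threshold
        xi, _, sigma = genpareto.fit(exc, floc=0)
        tail_frac = len(exc) / len(w)
        if abs(xi) < 1e-8:
            var = u + sigma * np.log(tail_frac / q)  # exponential-tail limit
        else:
            var = u + (sigma / xi) * ((q / tail_frac) ** (-xi) - 1.0)
        # ES is only finite when xi < 1
        es = (var + sigma - xi * u) / (1.0 - xi) if xi < 1 else np.nan
        ends.append(end - 1)
        var_out.append(var)
        es_out.append(es)
    return np.array(ends), np.array(var_out), np.array(es_out)

rng = np.random.default_rng(1)
losses = rng.standard_t(df=4, size=1200) * 0.02   # synthetic losses, illustration only
ends, var_1pct, es_1pct = rolling_pot_var_es(losses)
print(f"{len(ends)} windows, last VaR(1%) = {var_1pct[-1]:.4f}")
```

Each window here contains only about 50 exceedances, so individual estimates of $\xi$ are noisy; a larger window trades responsiveness for stability.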

In [None]:
# Compute rolling VaR and ES with a 500-day window
rolling_result = rolling_var_es(
    returns=df["log_return"].values,
    window=500,
    quantile=0.01,
    step=5,  # re-estimate every 5 days
)

# Build DataFrame
roll_df = pd.DataFrame({
    "date": df.index[rolling_result.indices],
    "var_01": rolling_result.var,
    "es_01": rolling_result.es,
}).set_index("date")

print(f"Rolling estimates: {len(roll_df)} points")
roll_df.describe()

In [None]:
# Plot rolling VaR/ES alongside price
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Price
axes[0].plot(df.index, df["close"], color="steelblue", linewidth=0.8)
axes[0].set_yscale("log")
axes[0].set_ylabel("Price (USD)")
axes[0].set_title("BTC Price")

# Rolling VaR and ES
axes[1].plot(roll_df.index, roll_df["var_01"], color="orange", linewidth=1, label="VaR(1%)")
axes[1].plot(roll_df.index, roll_df["es_01"], color="red", linewidth=1, label="ES(1%)")
axes[1].fill_between(roll_df.index, roll_df["var_01"], roll_df["es_01"],
                      color="red", alpha=0.15)
axes[1].set_ylabel("Daily loss (log return)")
axes[1].set_title("Rolling 1% VaR and Expected Shortfall (500-day window)")
axes[1].legend()

plt.tight_layout()
plt.show()

In [None]:
# Alternatively, use the built-in plot
rolling_var_es_plot(
    dates=df.index,
    prices=df["close"].values,
    rolling_result=rolling_result,
    title="BTC Rolling Tail Risk (EVT-GPD)",
)

## Summary

- Both GEV and GPD fits confirm **heavy tails** ($\xi > 0$) in BTC returns.
- EVT-based VaR/ES give more realistic tail-risk estimates than a Gaussian model,
  which understates the probability of extreme losses when tails are heavy.
- Rolling VaR/ES shows that tail risk is **time-varying** -- it spikes during crash
  periods and compresses during calmer markets.
- These rolling risk measures feed into the aggregated crash signal (notebook 06).