# 04 - Hill Estimator and Kappa Analysis

The **Hill estimator** is the classical tool for estimating the tail index $\alpha$
of a heavy-tailed distribution. The smaller $\alpha$ is, the heavier the tail and the
more probable extreme events become. The **kappa metric** builds on this to detect
changes in tail thickness.

This notebook:
1. Computes the Hill estimator for BTC returns
2. Creates a Hill plot (tail index vs. number of order statistics)
3. Tracks the rolling tail index and rolling kappa to follow tail regime changes over time
4. Produces QQ plots to assess tail fit quality
5. Compares tail behavior across multiple assets

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from fatcrash.data.ingest import from_yahoo
from fatcrash.data.transforms import log_returns, time_index, negative_returns
from fatcrash.indicators.tail_indicator import (
    estimate_tail_index, rolling_tail_index, estimate_kappa, rolling_kappa,
)
from fatcrash.viz.tail_dashboard import hill_plot, kappa_evolution, qq_plot

plt.style.use("seaborn-v0_8-whitegrid")
plt.rcParams["figure.figsize"] = (14, 5)

## 1. Load data

In [None]:
df = from_yahoo("BTC-USD", start="2015-01-01", end="2025-12-31")
df = time_index(df)
df["log_return"] = log_returns(df["close"].values)
df = df.dropna(subset=["log_return"])

returns = df["log_return"].values
losses = negative_returns(returns)  # positive values representing magnitude of losses

print(f"Observations: {len(df)}")
print(f"Loss days: {len(losses)}")

## 2. Hill estimator

The Hill estimator for the tail index $\alpha$ uses the $k$ largest order statistics
$X_{(1)} \geq X_{(2)} \geq \cdots \geq X_{(k)}$, measured against the threshold $X_{(k+1)}$:

$$\hat{\alpha}_k^{-1} = \frac{1}{k} \sum_{i=1}^{k} \left( \ln X_{(i)} - \ln X_{(k+1)} \right)$$

A smaller $\alpha$ means a heavier tail. For $\alpha \leq 2$, the variance is infinite;
for $\alpha \leq 1$, the mean is infinite.
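
To make the formula concrete, here is a minimal NumPy sketch of the estimator. The helper name `hill_alpha` is hypothetical and is not part of `fatcrash`; it assumes strictly positive loss magnitudes and is checked on synthetic Pareto data with a known tail index of 3.

```python
import numpy as np


def hill_alpha(losses, k):
    """Hill estimate of the tail index from the k largest positive losses."""
    x = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending: X_(1) >= X_(2) >= ...
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < len(losses)")
    if x[k] <= 0:
        raise ValueError("the k+1 largest values must be strictly positive")
    # The mean log-excess over the threshold X_(k+1) estimates 1/alpha
    inv_alpha = np.mean(np.log(x[:k]) - np.log(x[k]))
    return 1.0 / inv_alpha


# Sanity check on synthetic data: classical Pareto with x_min = 1 and alpha = 3
rng = np.random.default_rng(0)
sample = rng.pareto(3.0, size=20_000) + 1.0
alpha_hat = hill_alpha(sample, k=1_000)
print(f"Hill estimate: {alpha_hat:.2f}")
```

On exact Pareto data the estimate should land close to the true value for any reasonable $k$; real returns are only approximately Pareto in the tail, which is why the choice of $k$ matters.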

In [None]:
# Estimate tail index with a specific number of order statistics
k = int(0.05 * len(losses))  # use top 5% of losses
alpha = estimate_tail_index(losses, k=k)

print(f"Number of order statistics (k): {k}")
print(f"Estimated tail index alpha: {alpha:.3f}")
print("Interpretation:")
if alpha <= 2:
    print("  alpha <= 2 => Infinite variance regime")
elif alpha < 4:
    print("  2 < alpha < 4 => Finite variance but very heavy tail")
else:
    print("  alpha >= 4 => Moderately heavy tail")

## 3. Hill plot

The Hill plot shows $\hat{\alpha}$ as a function of $k$. A stable plateau region
suggests a reliable estimate. At low $k$ the estimate is noisy because it rests on
very few observations (high variance); at high $k$ it drifts because observations from
the body of the distribution, which do not follow the power law, enter the sum (bias).
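
The variance side of this trade-off can be quantified: asymptotically, the standard error of the Hill estimate is roughly $\hat{\alpha}/\sqrt{k}$. The sketch below computes the estimate and an approximate 95% band over a grid of $k$ values. It runs on synthetic Pareto data, and the variable names (`k_grid`, `alpha_k`, `se_k`) are illustrative rather than taken from `fatcrash`.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = np.sort(rng.pareto(3.0, size=5_000) + 1.0)[::-1]  # synthetic losses, descending

log_x = np.log(sample)
k_grid = np.arange(10, 751, 5)

# Hill estimate for every k at once: mean of the top-k logs minus the log threshold
top_k_mean = np.cumsum(log_x)[k_grid - 1] / k_grid
alpha_k = 1.0 / (top_k_mean - log_x[k_grid])

# Asymptotic standard error of the Hill estimate: alpha / sqrt(k)
se_k = alpha_k / np.sqrt(k_grid)
lower, upper = alpha_k - 1.96 * se_k, alpha_k + 1.96 * se_k

print(f"k={k_grid[0]}: alpha={alpha_k[0]:.2f} +/- {1.96 * se_k[0]:.2f}")
print(f"k={k_grid[-1]}: alpha={alpha_k[-1]:.2f} +/- {1.96 * se_k[-1]:.2f}")
```

Passing `lower` and `upper` to `ax.fill_between(k_grid, lower, upper, alpha=0.2)` would overlay the band on a Hill plot. The band reflects variance only; it says nothing about the bias that grows with $k$.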

In [None]:
# Compute Hill estimates for a range of k values
k_values = np.arange(10, int(0.15 * len(losses)), 5)
alpha_values = np.array([estimate_tail_index(losses, k=k_val) for k_val in k_values])

fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(k_values, alpha_values, color="steelblue", linewidth=0.8)
ax.axhline(alpha, color="red", linestyle="--", alpha=0.6, label=f"alpha = {alpha:.2f} (k={k})")
ax.set_xlabel("k (number of order statistics)")
ax.set_ylabel("Estimated tail index alpha")
ax.set_title("Hill Plot: BTC Left Tail")
ax.set_ylim(0, 8)
ax.legend()
plt.tight_layout()
plt.show()

In [None]:
# Or use the built-in Hill plot visualization
hill_plot(losses, max_k=int(0.15 * len(losses)), title="Hill Plot: BTC Losses")

## 4. Rolling tail index

The tail index is not constant; it evolves as market conditions change.
A declining tail index means the loss tail is getting heavier, which signals
increasing crash risk.
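
As a rough illustration of what a rolling estimate involves, the sketch below slides a window over the returns, extracts the loss magnitudes inside each window, and applies the Hill formula to the top fraction of them. The helper `rolling_hill` is hypothetical; the windowing and indexing conventions of `rolling_tail_index` in `fatcrash` may differ.

```python
import numpy as np


def rolling_hill(returns, window=500, k_fraction=0.05, step=5):
    """Hill tail index of the losses inside each trailing window (hypothetical helper)."""
    returns = np.asarray(returns, dtype=float)
    end_indices, alphas = [], []
    for end in range(window, len(returns) + 1, step):
        chunk = returns[end - window:end]
        losses = np.sort(-chunk[chunk < 0])[::-1]  # loss magnitudes, descending
        k = int(k_fraction * len(losses))
        if k < 2:
            continue  # too few losses in this window for a meaningful estimate
        inv_alpha = np.mean(np.log(losses[:k]) - np.log(losses[k]))
        end_indices.append(end - 1)  # position of the last observation in the window
        alphas.append(1.0 / inv_alpha)
    return np.array(end_indices), np.array(alphas)


# Synthetic returns: Student-t with 3 degrees of freedom has tail index 3
rng = np.random.default_rng(2)
fake_returns = 0.02 * rng.standard_t(3, size=3_000)
idx, alpha_path = rolling_hill(fake_returns)
print(f"{len(alpha_path)} windows, median alpha = {np.median(alpha_path):.2f}")
```

With a 500-day window only about half the observations are losses, so $k$ is small and each point on the path is noisy. Treat short-lived dips with caution.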

In [None]:
# Compute rolling Hill estimator
rolling_alpha = rolling_tail_index(
    returns=returns,
    window=500,
    k_fraction=0.05,
    step=5,
)

# Build DataFrame
roll_df = pd.DataFrame({
    "date": df.index[rolling_alpha.indices],
    "alpha": rolling_alpha.values,
}).set_index("date")

fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

axes[0].plot(df.index, df["close"], color="steelblue", linewidth=0.8)
axes[0].set_yscale("log")
axes[0].set_ylabel("Price (USD)")
axes[0].set_title("BTC/USD Price")

axes[1].plot(roll_df.index, roll_df["alpha"], color="darkorange", linewidth=1)
axes[1].axhline(2, color="red", linestyle="--", alpha=0.5, label="alpha=2 (infinite variance)")
axes[1].axhline(3, color="gray", linestyle="--", alpha=0.5, label="alpha=3")
axes[1].set_ylabel("Tail Index (alpha)")
axes[1].set_title("Rolling Hill Tail Index (500-day window)")
axes[1].set_ylim(0, 6)
axes[1].legend()

plt.tight_layout()
plt.show()

## 5. Kappa metric

The kappa metric extends the Hill estimator by measuring how the tail index
changes as we vary the threshold. It captures not just the asymptotic tail
behavior but also the approach to it.

In [None]:
# Full-sample kappa
kappa = estimate_kappa(losses)
print(f"Kappa metric: {kappa:.4f}")
print("Interpretation:")
print("  kappa ~ 0   => Exponential tail (thin)")
print("  kappa > 0   => Power-law tail (fat)")
print("  kappa >> 0  => Extremely heavy tail")

In [None]:
# Rolling kappa
rolling_kap = rolling_kappa(
    returns=returns,
    window=500,
    step=5,
)

kap_df = pd.DataFrame({
    "date": df.index[rolling_kap.indices],
    "kappa": rolling_kap.values,
}).set_index("date")

# Use the built-in visualization
kappa_evolution(
    dates=df.index,
    prices=df["close"].values,
    kappa_dates=kap_df.index,
    kappa_values=kap_df["kappa"].values,
    title="BTC Rolling Kappa Metric",
)

## 6. QQ plot for Pareto tail

In [None]:
# QQ plot comparing empirical tail to a Pareto distribution with the estimated alpha
qq_plot(losses, alpha=alpha, title="QQ Plot: BTC Losses vs Pareto")
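
For intuition about what such a plot checks: if the tail is Pareto with index $\alpha$, the log-excesses $\ln(X_{(i)} / X_{(k+1)})$ plotted against standard exponential quantiles should fall near a straight line with slope $1/\alpha$. The following is a minimal sketch on synthetic data; it is not the implementation behind `qq_plot`, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
sample = np.sort(rng.pareto(3.0, size=10_000) + 1.0)[::-1]  # synthetic losses, descending

k = 500
log_excess = np.log(sample[:k] / sample[k])  # empirical log-excesses, largest first

# Exponential quantiles at plotting positions 1 - i/(k+1), largest first
i = np.arange(1, k + 1)
theory = -np.log(i / (k + 1))

# A least-squares line through the QQ points; its slope estimates 1/alpha
slope, intercept = np.polyfit(theory, log_excess, 1)
print(f"QQ slope: {slope:.3f}  (implied alpha: {1 / slope:.2f})")
```

Systematic curvature away from the fitted line, rather than scatter around it, is the sign that a pure Pareto tail is a poor description at the chosen $k$.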

## 7. Cross-asset comparison

Compare tail behavior across different assets to see which markets exhibit
the heaviest tails.

In [None]:
assets = {
    "BTC": "BTC-USD",
    "ETH": "ETH-USD",
    "SPX": "^GSPC",
    "Gold": "GC=F",
}

results = []
for name, ticker in assets.items():
    try:
        asset_df = from_yahoo(ticker, start="2017-01-01", end="2025-12-31")
        asset_df = time_index(asset_df)
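        asset_returns = np.asarray(log_returns(asset_df["close"].values), dtype=float)
        asset_returns = asset_returns[~np.isnan(asset_returns)]  # drop NaNs, mirroring the dropna in section 1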
        asset_losses = negative_returns(asset_returns)

        k_asset = max(10, int(0.05 * len(asset_losses)))
        alpha_asset = estimate_tail_index(asset_losses, k=k_asset)
        kappa_asset = estimate_kappa(asset_losses)

        results.append({
            "asset": name,
            "n_obs": len(asset_returns),
            "alpha (Hill)": alpha_asset,
            "kappa": kappa_asset,
            "max_loss_%": asset_losses.max() * 100,
        })
    except Exception as e:
        print(f"Could not load {name}: {e}")

comparison = pd.DataFrame(results).set_index("asset")
comparison

In [None]:
# Bar chart comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

comparison["alpha (Hill)"].plot.bar(ax=axes[0], color="steelblue", alpha=0.7)
axes[0].set_ylabel("Tail Index (alpha)")
axes[0].set_title("Hill Tail Index by Asset")
axes[0].axhline(2, color="red", linestyle="--", alpha=0.5)
axes[0].tick_params(axis="x", rotation=0)

comparison["kappa"].plot.bar(ax=axes[1], color="darkorange", alpha=0.7)
axes[1].set_ylabel("Kappa")
axes[1].set_title("Kappa Metric by Asset")
axes[1].tick_params(axis="x", rotation=0)

plt.tight_layout()
plt.show()

## Summary

- The Hill plot shows how the tail index estimate depends on $k$ and helps select an
  appropriate number of order statistics.
- The rolling tail index tracks how tail heaviness changes over time; a falling $\alpha$
  marks a heavier tail and higher crash risk.
- The kappa metric provides a complementary view of tail behavior.
- Crypto assets (BTC, ETH) typically have heavier tails than traditional assets (SPX, Gold).
- Both the rolling Hill and kappa signals feed into the aggregated crash indicator (notebook 06).