# 06 - Signal Aggregation

No single indicator is sufficient for reliable crash prediction. This notebook
shows how to combine **LPPLS confidence**, **EVT tail risk**, **Hill thinning**,
**kappa regime**, and **tc proximity** into a single composite crash signal,
computed as a weighted average of the components.

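We will:
1. Compute each individual signal component
2. Visualize the components side by side
3. Combine them using the `aggregate_signals` function
4. Calibrate the aggregation weights against known crash dates
5. Compare the composite signal against actual crashes and summarize its distribution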

## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from fatcrash.data.ingest import from_yahoo
from fatcrash.data.transforms import log_returns, log_prices, time_index, negative_returns
from fatcrash.indicators.lppls_indicator import compute_confidence
from fatcrash.indicators.tail_indicator import rolling_tail_index, rolling_kappa
from fatcrash.indicators.evt_indicator import rolling_var_es
from fatcrash.aggregator.signals import (
    aggregate_signals,
    CrashSignal,
    lppls_confidence_signal,
    tc_proximity_signal,
    var_exceedance_signal,
    kappa_regime_signal,
    hill_thinning_signal,
)
from fatcrash.aggregator.calibration import calibrate_weights
from fatcrash.viz.charts import plot_crash_signal

plt.style.use("seaborn-v0_8-whitegrid")
plt.rcParams["figure.figsize"] = (14, 5)

## 1. Load and prepare data

In [None]:
df = from_yahoo("BTC-USD", start="2015-01-01", end="2025-12-31")
df = time_index(df)
df["log_return"] = log_returns(df["close"].values)
df["log_price"] = log_prices(df["close"].values)
df = df.dropna(subset=["log_return"])

dates = df.index
prices = df["close"].values
returns = df["log_return"].values
lp = df["log_price"].values
t = np.arange(len(lp), dtype=np.float64)

print(f"Observations: {len(df)}, from {dates[0].date()} to {dates[-1].date()}")

## 2. Compute individual signal components

In [None]:
# 2a. LPPLS confidence and tc proximity
lppls_conf = compute_confidence(
    t, lp,
    window_sizes=[120, 180, 250, 365],
    step=5,
)

sig_lppls = lppls_confidence_signal(lppls_conf.positive)
sig_tc = tc_proximity_signal(lppls_conf.tc_estimates, t)

print(f"LPPLS confidence signal: max = {sig_lppls.max():.3f}")
print(f"tc proximity signal: max = {sig_tc.max():.3f}")

In [None]:
# 2b. VaR exceedance signal
rolling_risk = rolling_var_es(returns=returns, window=500, quantile=0.01, step=5)
sig_var = var_exceedance_signal(
    returns=returns,
    rolling_var=rolling_risk.var,
    indices=rolling_risk.indices,
)

print(f"VaR exceedance signal: max = {sig_var.max():.3f}")

In [None]:
# 2c. Kappa regime signal
roll_kappa = rolling_kappa(returns=returns, window=500, step=5)
sig_kappa = kappa_regime_signal(roll_kappa.values)

print(f"Kappa regime signal: max = {sig_kappa.max():.3f}")

In [None]:
# 2d. Hill thinning signal (declining tail index = fatter tails = rising risk)
roll_hill = rolling_tail_index(returns=returns, window=500, k_fraction=0.05, step=5)
sig_hill = hill_thinning_signal(roll_hill.values)

print(f"Hill thinning signal: max = {sig_hill.max():.3f}")
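
The Hill signal is driven by the tail index: a lower value means heavier tails. As a point of reference, here is a minimal sketch of the classical Hill estimator in plain NumPy. The function name `hill_alpha` and the synthetic Pareto data are illustrative assumptions; this is not necessarily how `rolling_tail_index` is implemented.

```python
import numpy as np

def hill_alpha(losses, k_fraction=0.05):
    """Classical Hill estimate of the tail index (hypothetical helper)."""
    x = np.sort(np.asarray(losses, dtype=float))
    x = x[x > 0]                          # Hill needs strictly positive values
    k = max(int(len(x) * k_fraction), 2)  # number of tail observations used
    tail = x[-k:]                         # k largest observations
    threshold = x[-k - 1]                 # (k+1)-th largest acts as the threshold
    return 1.0 / np.mean(np.log(tail / threshold))

rng = np.random.default_rng(0)
# numpy's pareto() draws from a Lomax law; adding 1 gives a classical Pareto
heavy = rng.pareto(2.0, size=20_000) + 1.0   # true tail index 2 (fatter tail)
light = rng.pareto(4.0, size=20_000) + 1.0   # true tail index 4 (thinner tail)
alpha_heavy = hill_alpha(heavy)
alpha_light = hill_alpha(light)
print(f"alpha (heavy) = {alpha_heavy:.2f}, alpha (light) = {alpha_light:.2f}")
```

A falling estimate over successive windows is the pattern the Hill thinning signal is meant to flag.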

## 3. Visualize individual signals

In [None]:
# Align all signals to the common daily index
# Each signal may have different lengths due to warmup periods
signal_names = [
    "LPPLS Confidence",
    "tc Proximity",
    "VaR Exceedance",
    "Kappa Regime",
    "Hill Thinning",
]

signals_list = [sig_lppls, sig_tc, sig_var, sig_kappa, sig_hill]

fig, axes = plt.subplots(len(signals_list) + 1, 1, figsize=(14, 3 * (len(signals_list) + 1)),
                          sharex=True)

# Price
axes[0].plot(dates, prices, color="steelblue", linewidth=0.8)
axes[0].set_yscale("log")
axes[0].set_ylabel("Price")
axes[0].set_title("BTC/USD with Individual Signal Components")

# Each signal component
signal_colors = ["red", "purple", "orange", "green", "blue"]
for i, (sig, name, color) in enumerate(zip(signals_list, signal_names, signal_colors)):
    ax = axes[i + 1]
    # Right-align the signal on the date axis. This assumes one value per
    # observation, ending on the last date, with only the warmup missing in front.
    padded = np.full(len(dates), np.nan)
    padded[len(dates) - len(sig):] = sig
    ax.fill_between(dates, 0, padded, color=color, alpha=0.4)
    ax.set_ylabel(name, fontsize=9)
    ax.set_ylim(0, 1)

plt.tight_layout()
plt.show()
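
Right-aligned padding is only correct when a signal has one value per observation. If a rolling indicator instead returns one value per `step` together with the positions at which it was evaluated (as the `indices` attribute of `rolling_risk` suggests, although that is an assumption about the return type), the values can be scattered onto the full axis and forward-filled. The helper `to_daily` below is hypothetical.

```python
import numpy as np
import pandas as pd

def to_daily(values, indices, n_obs):
    """Scatter stepped values onto a length-n_obs array and forward-fill gaps."""
    out = pd.Series(np.nan, index=np.arange(n_obs))
    out.iloc[np.asarray(indices)] = np.asarray(values, dtype=float)
    return out.ffill().to_numpy()   # the leading warmup stays NaN

# Toy example: first value available at position 3, then every 3rd observation
daily = to_daily(values=[0.2, 0.5, 0.9], indices=[3, 6, 9], n_obs=11)
print(daily)
```

Forward-filling carries each value only into later observations, so the aligned series never uses information from the future.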

## 4. Aggregate into composite crash signal

The `aggregate_signals` function combines individual components using a
weighted average. The `CrashSignal` dataclass holds both the composite
signal and the individual components.
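
For intuition, a weighted average of aligned components $s_{i,t}$ with weights $w_i$ is $c_t = \sum_i w_i s_{i,t} / \sum_i w_i$. The sketch below implements that formula with NumPy. The function `weighted_composite` is hypothetical, and renormalizing over the components that are available at each step is an assumed way to handle NaNs, not a documented behavior of `aggregate_signals`.

```python
import numpy as np

def weighted_composite(signals, weights):
    """NaN-aware weighted average of aligned signals (illustrative only)."""
    names = list(signals)
    S = np.vstack([np.asarray(signals[n], dtype=float) for n in names])  # (k, T)
    w = np.array([weights[n] for n in names], dtype=float)[:, None]      # (k, 1)
    valid = ~np.isnan(S)
    num = np.where(valid, S * w, 0.0).sum(axis=0)   # weighted sum of valid values
    den = (valid * w).sum(axis=0)                   # total weight of valid values
    # Where no component is available, the composite stays NaN
    return np.divide(num, den, out=np.full(S.shape[1], np.nan), where=den > 0)

demo = weighted_composite(
    {"a": [0.0, 1.0, np.nan], "b": [1.0, 1.0, 0.5]},
    {"a": 0.75, "b": 0.25},
)
print(demo)
```

Because every component lies in [0, 1] and the weights are non-negative, the composite also stays in [0, 1].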

In [None]:
# Default weights
weights = {
    "lppls_confidence": 0.30,
    "tc_proximity": 0.15,
    "var_exceedance": 0.20,
    "kappa_regime": 0.15,
    "hill_thinning": 0.20,
}

crash_signal = aggregate_signals(
    lppls_confidence=sig_lppls,
    tc_proximity=sig_tc,
    var_exceedance=sig_var,
    kappa_regime=sig_kappa,
    hill_thinning=sig_hill,
    weights=weights,
)

print(f"Composite signal shape: {crash_signal.composite.shape}")
print(f"Composite signal range: [{crash_signal.composite.min():.3f}, {crash_signal.composite.max():.3f}]")

In [None]:
# Plot composite crash signal
plot_crash_signal(
    dates=dates,
    prices=prices,
    signal=crash_signal.composite,
    title="BTC/USD Composite Crash Signal",
    threshold=0.5,
)

## 5. Tune weights using calibration

We can use historical crash dates to calibrate the signal weights,
maximizing the signal value before known crashes while minimizing
false positives.
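
One way to make this objective concrete is to score a weight vector by the mean composite inside the pre-crash windows minus the mean composite elsewhere, and to search over weights that sum to one. The sketch below does this with random Dirichlet draws on synthetic data. Both helpers are hypothetical, and the objective is an assumption chosen for illustration, not a description of what `calibrate_weights` does internally.

```python
import numpy as np

def precrash_mask(n_obs, crash_idx, lookahead):
    """True for the `lookahead` observations up to and including each crash."""
    mask = np.zeros(n_obs, dtype=bool)
    for c in crash_idx:
        mask[max(c - lookahead, 0):c + 1] = True
    return mask

def search_weights(S, mask, n_draws=2000, seed=0):
    """Random search on the simplex; S has shape (k, T) and contains no NaNs."""
    rng = np.random.default_rng(seed)
    candidates = rng.dirichlet(np.ones(S.shape[0]), size=n_draws)  # rows sum to 1
    composites = candidates @ S                                    # (n_draws, T)
    # Score = mean signal before crashes minus mean signal elsewhere
    scores = composites[:, mask].mean(axis=1) - composites[:, ~mask].mean(axis=1)
    return candidates[np.argmax(scores)]

# Synthetic check: component 0 rises before the "crash", component 1 is noise
rng = np.random.default_rng(1)
T = 400
mask = precrash_mask(T, crash_idx=[300], lookahead=30)
informative = np.where(mask, 0.9, 0.1)
noise = rng.uniform(0, 1, T)
best = search_weights(np.vstack([informative, noise]), mask)
print(best)
```

This score is linear in the weights, so the search pushes nearly all weight onto the single most discriminating component. With only a few crash dates, a penalty toward the default weights or an out-of-sample check is a sensible guard against overfitting.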

In [None]:
# Known BTC crash dates (approximate peaks before major drawdowns)
crash_dates = pd.to_datetime([
    "2017-12-17",  # 2017 bubble peak
    "2021-04-14",  # 2021 spring peak, before the May drawdown
    "2021-11-10",  # 2021 bubble peak
])

# Calibrate weights to maximize signal before crash dates
calibrated_weights = calibrate_weights(
    signals={
        "lppls_confidence": sig_lppls,
        "tc_proximity": sig_tc,
        "var_exceedance": sig_var,
        "kappa_regime": sig_kappa,
        "hill_thinning": sig_hill,
    },
    dates=dates,
    crash_dates=crash_dates,
    lookahead_days=30,  # signal should be elevated in the 30 days before each crash
)

print("Calibrated weights:")
for k, v in calibrated_weights.items():
    print(f"  {k}: {v:.3f}")

In [None]:
# Recompute with calibrated weights
calibrated_signal = aggregate_signals(
    lppls_confidence=sig_lppls,
    tc_proximity=sig_tc,
    var_exceedance=sig_var,
    kappa_regime=sig_kappa,
    hill_thinning=sig_hill,
    weights=calibrated_weights,
)

# Compare default vs calibrated
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

axes[0].plot(dates, prices, color="steelblue", linewidth=0.8)
axes[0].set_yscale("log")
axes[0].set_ylabel("Price")
axes[0].set_title("BTC/USD Price")
for cd in crash_dates:
    axes[0].axvline(cd, color="red", linestyle="--", alpha=0.5)

# Default signal
padded_default = np.full(len(dates), np.nan)
padded_default[len(dates) - len(crash_signal.composite):] = crash_signal.composite
axes[1].fill_between(dates, 0, padded_default, color="orange", alpha=0.5)
axes[1].set_ylabel("Signal")
axes[1].set_title("Default Weights")
axes[1].set_ylim(0, 1)
for cd in crash_dates:
    axes[1].axvline(cd, color="red", linestyle="--", alpha=0.5)

# Calibrated signal
padded_calib = np.full(len(dates), np.nan)
padded_calib[len(dates) - len(calibrated_signal.composite):] = calibrated_signal.composite
axes[2].fill_between(dates, 0, padded_calib, color="red", alpha=0.5)
axes[2].set_ylabel("Signal")
axes[2].set_title("Calibrated Weights")
axes[2].set_ylim(0, 1)
for cd in crash_dates:
    axes[2].axvline(cd, color="red", linestyle="--", alpha=0.5)

plt.tight_layout()
plt.show()

## 6. Signal statistics and thresholding

In [None]:
# Analyze the composite signal distribution
composite = calibrated_signal.composite
composite_clean = composite[~np.isnan(composite)]

print("Composite signal statistics:")
print(f"  Mean:   {np.mean(composite_clean):.4f}")
print(f"  Median: {np.median(composite_clean):.4f}")
print(f"  Std:    {np.std(composite_clean):.4f}")
print(f"  P90:    {np.percentile(composite_clean, 90):.4f}")
print(f"  P95:    {np.percentile(composite_clean, 95):.4f}")
print(f"  P99:    {np.percentile(composite_clean, 99):.4f}")

# Count days above threshold
for threshold in [0.3, 0.5, 0.7]:
    n_above = np.sum(composite_clean > threshold)
    pct = n_above / len(composite_clean) * 100
    print(f"  Days above {threshold}: {n_above} ({pct:.1f}%)")
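
Counting observations above a threshold shows how often the signal fires, but not whether it fires ahead of crashes. The sketch below computes, for a given threshold, the share of crashes preceded by at least one alert and the share of alerts that fall outside any pre-crash window. The helper `evaluate_threshold` and the toy signal are hypothetical.

```python
import numpy as np

def evaluate_threshold(signal, crash_idx, threshold, lookahead):
    """Return (hit rate over crashes, share of alerts outside pre-crash windows)."""
    signal = np.asarray(signal, dtype=float)
    alerts = signal > threshold        # NaN compares False, so warmup never alerts
    window = np.zeros(len(signal), dtype=bool)
    hits = 0
    for c in crash_idx:
        lo = max(c - lookahead, 0)
        window[lo:c + 1] = True
        hits += bool(alerts[lo:c + 1].any())
    n_alerts = alerts.sum()
    false_share = (alerts & ~window).sum() / n_alerts if n_alerts else 0.0
    return hits / len(crash_idx), false_share

sig = np.array([np.nan, 0.1, 0.6, 0.2, 0.1, 0.7, 0.8, 0.2, 0.1, 0.1])
hit, false_share = evaluate_threshold(sig, crash_idx=[7], threshold=0.5, lookahead=2)
print(f"hit rate = {hit:.2f}, false-alert share = {false_share:.2f}")
```

Running this over a grid of thresholds gives a simple trade-off curve for choosing the alert level.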

## Summary

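- Five complementary signal components capture different aspects of crash risk:
  LPPLS bubble detection, critical time proximity, EVT tail risk, kappa regime,
  and Hill tail thinning. They are not statistically independent: the two LPPLS
  components share the same fits, and the tail components use the same returns.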
- The composite signal is a **weighted average** that can be tuned using
  historical crash dates.
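- Calibrated weights are fitted to raise the signal ahead of the known crashes
  relative to the default weights. With only three crash dates the fit is
  in-sample, so any gain in discriminating power should be checked out of sample.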
- The composite signal is the main output of the fatcrash system, used in the
  historical backtest (notebook 08).