# 08 - Historical Backtest

The ultimate test of a crash detection system is whether it would have fired
meaningful warnings before **known crashes**. This notebook runs the full
fatcrash pipeline against historical episodes using **bundled sample data**
(no internet required).

| Episode               | Asset | Peak Date  | Drawdown (peak to trough) |
|-----------------------|-------|------------|---------------------------|
| 2017 BTC bubble       | BTC   | 2017-12-17 | ~-84%                     |
| 2021 BTC bubble       | BTC   | 2021-11-10 | ~-77%                     |
| 2008 Financial Crisis | SPY   | 2007-10-09 | ~-57%                     |
| 2020 COVID crash      | SPY   | 2020-02-19 | ~-34%                     |
| 2011 Gold peak        | Gold  | 2011-09-06 | ~-45%                     |

For each episode we:
1. Compute all indicators using only data available **before** the crash
2. Build the composite crash signal at each timestep
3. Evaluate whether the signal was elevated in the 30-60 days before the peak
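
Step 1 only holds if every indicator is causal: the value at timestep t may depend only on observations up to t. A cheap check for any rolling indicator is to compute it on the full series and on a truncated copy, then confirm that the overlapping values agree. The sketch below uses a plain rolling volatility of log returns as a stand-in indicator (an assumption for illustration; it is not one of the fatcrash indicators), and `is_causal` is a hypothetical helper. A deliberately leaky centred window shows what a failure looks like.

```python
import numpy as np
import pandas as pd


def rolling_vol(close: pd.Series, window: int = 20) -> pd.Series:
    """Stand-in indicator: rolling std of daily log returns (backward-looking)."""
    rets = np.log(close).diff()
    return rets.rolling(window).std()


def centered_vol(close: pd.Series, window: int = 20) -> pd.Series:
    """Deliberately leaky variant: a centred window peeks at future returns."""
    rets = np.log(close).diff()
    return rets.rolling(window, center=True).std()


def is_causal(indicator, close: pd.Series, cutoff: int) -> bool:
    """Hypothetical look-ahead check: values before `cutoff` must not change
    when every observation from `cutoff` onwards is removed."""
    full = indicator(close).iloc[:cutoff]
    truncated = indicator(close.iloc[:cutoff])
    return np.allclose(full.values, truncated.values, equal_nan=True)


# Synthetic random-walk prices on a business-day index
rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=300)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), index=idx)

print(is_causal(rolling_vol, close, cutoff=200))   # True
print(is_causal(centered_vol, close, cutoff=200))  # False
```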

> **DISCLAIMER:** This software is for academic research and educational purposes only.
> It does not constitute financial advice.

## Imports

In [None]:
import json
from pathlib import Path
from datetime import timedelta

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from IPython.display import display

from fatcrash.data.ingest import from_sample
from fatcrash.data.transforms import log_returns, log_prices, time_index
from fatcrash.indicators.lppls_indicator import compute_confidence
from fatcrash.indicators.tail_indicator import rolling_tail_index, rolling_kappa
from fatcrash.indicators.evt_indicator import rolling_var_es
from fatcrash.aggregator.signals import (
    aggregate_signals,
    lppls_confidence_signal,
    tc_proximity_signal,
    var_exceedance_signal,
    kappa_regime_signal,
    hill_thinning_signal,
)

plt.style.use("seaborn-v0_8-whitegrid")
plt.rcParams["figure.figsize"] = (14, 6)

## 1. Load sample data and define crash episodes

We use the bundled sample datasets (BTC, SPY, Gold) so the notebook runs
offline. Each episode slices the relevant date range.
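
Because `run_pipeline` (section 2) loops over `range(window + 1, n)`, the first `window + 1` observations of each slice (501 with the default `window=500`) never receive a composite signal. The `data_start` of an episode therefore decides how many pre-peak days are actually scored. The hypothetical helper `scored_days_before_peak` counts them; the business-day index in the usage line stands in for a trading calendar, which is an assumption (the bundled samples may handle holidays differently).

```python
import pandas as pd


def scored_days_before_peak(index: pd.DatetimeIndex, data_start: str,
                            peak_date: str, window: int = 500) -> int:
    """Hypothetical helper: number of observations up to and including the
    peak that get a composite signal, mirroring the
    `range(window + 1, n)` loop in run_pipeline."""
    in_slice = index[index >= pd.Timestamp(data_start)]
    n_to_peak = int((in_slice <= pd.Timestamp(peak_date)).sum())
    # Positions 0..window are warm-up; the first scored position is window + 1.
    return max(0, n_to_peak - (window + 1))


# Business days approximate a stock-market calendar (holidays ignored).
spy_like = pd.bdate_range("2003-01-01", "2009-06-01")
print(scored_days_before_peak(spy_like, "2003-01-01", "2007-10-09"))
```

If the count is small, move `data_start` earlier or shrink `window`.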

In [None]:
# Load all sample datasets
data_btc = from_sample("btc")
data_spy = from_sample("spy")
data_gold = from_sample("gold")

# Load known crash dates for reference
with open(Path("../data/sample/known_crashes.json")) as f:
    known = json.load(f)

print(f"BTC:  {len(data_btc)} days, {data_btc.index[0].date()} to {data_btc.index[-1].date()}")
print(f"SPY:  {len(data_spy)} days, {data_spy.index[0].date()} to {data_spy.index[-1].date()}")
print(f"Gold: {len(data_gold)} days, {data_gold.index[0].date()} to {data_gold.index[-1].date()}")

In [None]:
crash_episodes = [
    {
        "name": "2017 BTC Bubble",
        "data": data_btc,
        "data_start": "2015-01-01",
        "peak_date": "2017-12-17",
        "plot_end": "2018-06-01",
    },
    {
        "name": "2021 BTC Bubble",
        "data": data_btc,
        "data_start": "2019-01-01",
        "peak_date": "2021-11-10",
        "plot_end": "2022-06-01",
    },
    {
        "name": "2008 Financial Crisis (SPY)",
        "data": data_spy,
        "data_start": "2003-01-01",
        "peak_date": "2007-10-09",
        "plot_end": "2009-06-01",
    },
    {
        "name": "2020 COVID Crash (SPY)",
        "data": data_spy,
        "data_start": "2017-01-01",
        "peak_date": "2020-02-19",
        "plot_end": "2020-09-01",
    },
    {
        "name": "2011 Gold Peak",
        "data": data_gold,
        "data_start": "2006-01-01",
        "peak_date": "2011-09-06",
        "plot_end": "2012-06-01",
    },
]

for ep in crash_episodes:
    df_slice = ep["data"].loc[ep["data_start"]:ep["plot_end"]]
    print(f"{ep['name']:>35s}: {len(df_slice)} obs, peak {ep['peak_date']}")

## 2. Pipeline function

Runs the fatcrash pipeline at each timestep: LPPLS confidence, EVT VaR,
kappa regime, and Hill tail thinning. Converts each indicator to a [0,1]
signal and aggregates into a composite crash probability.
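
This notebook does not show the weighting inside `aggregate_signals`. As a rough mental model only, the sketch below combines the five component signals (keys copied from the pipeline) with a normalised weighted mean. The `weighted_composite` name and the equal default weights are assumptions, not fatcrash's actual implementation.

```python
from typing import Mapping, Optional


def weighted_composite(signals: Mapping[str, float],
                       weights: Optional[Mapping[str, float]] = None) -> float:
    """Hypothetical aggregator: weighted mean of component signals.

    Each signal is clipped to [0, 1]; missing weights default to 1.0
    (equal weighting), so the result also lies in [0, 1].
    """
    weights = weights or {}
    total_w = 0.0
    acc = 0.0
    for name, value in signals.items():
        w = weights.get(name, 1.0)
        acc += w * min(max(value, 0.0), 1.0)
        total_w += w
    return acc / total_w if total_w > 0 else 0.0


components = {
    "lppls_confidence": 0.8,
    "lppls_tc_proximity": 0.6,
    "gpd_var_exceedance": 0.0,
    "kappa_regime": 0.4,
    "hill_thinning": 0.2,
}
print(round(weighted_composite(components), 3))                            # 0.4
print(round(weighted_composite(components, {"lppls_confidence": 3.0}), 3))  # 0.514
```

Up-weighting one component moves the composite towards that component's value. This is the kind of trade-off that weight calibration has to resolve.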

In [None]:
def run_pipeline(df, window=500):
    """
    Run the full fatcrash pipeline on a DataFrame with 'close' column.
    Returns a dict with dates, prices, per-timestep composite signal,
    and individual component signal arrays.
    """
    dates = df.index
    prices = df["close"].values
    rets = log_returns(df)
    lp = log_prices(df)
    t = time_index(df)
    n = len(dates)

    # ---- Rolling indicators ----
    # LPPLS confidence (returns conf, tc_mean, tc_std arrays)
    conf_arr, tc_mean_arr, tc_std_arr = compute_confidence(
        t, lp, n_windows=30, n_candidates=20,
    )

    # EVT: rolling VaR and ES (returns var_arr, es_arr)
    var_arr, es_arr = rolling_var_es(rets, window=window)

    # Kappa: rolling kappa (returns kappa_arr, benchmark)
    kappa_arr, kappa_bench = rolling_kappa(rets, window=window)

    # Hill: rolling tail index
    hill_arr = rolling_tail_index(rets, window=window)

    # ---- Convert to per-timestep signals ----
    composite = np.full(n, np.nan)
    comp_lppls = np.full(n, np.nan)
    comp_tc = np.full(n, np.nan)
    comp_var = np.full(n, np.nan)
    comp_kappa = np.full(n, np.nan)
    comp_hill = np.full(n, np.nan)

    for i in range(window + 1, n):
        # LPPLS signals
        c = conf_arr[i] if i < len(conf_arr) else np.nan
        tc_m = tc_mean_arr[i] if i < len(tc_mean_arr) else np.nan
        sig_lppls = lppls_confidence_signal(c) if not np.isnan(c) else 0.0

        # tc proximity: days from current time to predicted tc
        days_to_tc = tc_m - t[i] if not np.isnan(tc_m) else float("inf")
        sig_tc = tc_proximity_signal(days_to_tc)

        # VaR exceedance (rets is 1 shorter than dates)
        ret_i = rets[i - 1] if i - 1 < len(rets) else 0.0
        v = var_arr[i - 1] if i - 1 < len(var_arr) else np.nan
        sig_var = var_exceedance_signal(ret_i, v) if not np.isnan(v) else 0.0

        # Kappa regime (kappa_arr aligned with rets)
        k = kappa_arr[i - 1] if i - 1 < len(kappa_arr) else np.nan
        sig_kappa = kappa_regime_signal(k, kappa_bench) if not np.isnan(k) else 0.0

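        # Hill thinning (compare to previous step; 0.0 if either estimate is NaN,
        # consistent with the other components)
        h = hill_arr[i - 1] if i - 1 < len(hill_arr) else np.nan
        h_prev = hill_arr[i - 2] if 0 <= i - 2 < len(hill_arr) else np.nan
        sig_hill = (hill_thinning_signal(h, h_prev)
                    if not (np.isnan(h) or np.isnan(h_prev)) else 0.0)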

        # Store component signals
        comp_lppls[i] = sig_lppls
        comp_tc[i] = sig_tc
        comp_var[i] = sig_var
        comp_kappa[i] = sig_kappa
        comp_hill[i] = sig_hill

        # Aggregate
        signal = aggregate_signals({
            "lppls_confidence": sig_lppls,
            "lppls_tc_proximity": sig_tc,
            "gpd_var_exceedance": sig_var,
            "kappa_regime": sig_kappa,
            "hill_thinning": sig_hill,
        })
        composite[i] = signal.probability

    return {
        "dates": dates,
        "prices": prices,
        "composite": composite,
        "components": {
            "LPPLS Confidence": comp_lppls,
            "TC Proximity": comp_tc,
            "VaR Exceedance": comp_var,
            "Kappa Regime": comp_kappa,
            "Hill Thinning": comp_hill,
        },
    }
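
The Hill thinning component is built on a rolling tail-index estimate. The sketch below implements the textbook Hill estimator on the k largest losses for readers who want to see what such an estimate measures. It is an illustration, not `rolling_tail_index` itself. The choice k = 10% of the window and the `rolling_hill` helper are assumptions, and the thinning rule fatcrash applies on top is not reproduced. A smaller alpha means a fatter tail.

```python
import numpy as np


def hill_alpha(losses: np.ndarray, k: int) -> float:
    """Textbook Hill estimator of the tail index alpha from the k largest
    positive losses. Smaller alpha means a fatter tail."""
    x = np.sort(losses[losses > 0])[::-1]  # descending order statistics
    if k < 1 or k >= len(x):
        return np.nan
    xi = np.mean(np.log(x[:k] / x[k]))  # mean log-excess over the (k+1)-th largest
    return 1.0 / xi


def rolling_hill(returns: np.ndarray, window: int = 500, k_frac: float = 0.1) -> np.ndarray:
    """Hypothetical rolling version: alpha on the left tail (losses = -returns)
    of each trailing window; NaN until a full window is available."""
    out = np.full(len(returns), np.nan)
    k = max(1, int(k_frac * window))
    for i in range(window - 1, len(returns)):
        out[i] = hill_alpha(-returns[i - window + 1 : i + 1], k)
    return out


# Classical Pareto draws with alpha = 3 (numpy's Lomax sample shifted by 1).
rng = np.random.default_rng(42)
pareto = rng.pareto(3.0, 20_000) + 1.0
print(round(hill_alpha(pareto, k=2_000), 2))  # should land close to 3
```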

## 3. Run backtest on each episode

In [None]:
results = {}

for ep in crash_episodes:
    name = ep["name"]
    print(f"\nProcessing: {name}...")
    try:
        df = ep["data"].loc[ep["data_start"]:ep["plot_end"]].copy()

        if len(df) < 600:
            print(f"  WARNING: Only {len(df)} observations. Results may be unreliable.")

        result = run_pipeline(df)
        result["episode"] = ep
        results[name] = result

        # Evaluate: signal over the 30 calendar days up to and including the peak
        peak = pd.Timestamp(ep["peak_date"])
        pre_mask = (
            (result["dates"] >= peak - timedelta(days=30))
            & (result["dates"] <= peak)
        )
        pre_signal = result["composite"][pre_mask]
        pre_clean = pre_signal[~np.isnan(pre_signal)]

        if len(pre_clean) > 0:
            print(f"  Signal 30d before peak: "
                  f"mean={np.mean(pre_clean):.3f}, max={np.max(pre_clean):.3f}")
        else:
            print("  No signal data for pre-crash window.")

    except Exception as e:
        print(f"  ERROR: {e}")
        import traceback
        traceback.print_exc()

## 4. Visualize all episodes

Side-by-side: price (left) and composite crash signal (right) for each
episode. The red shaded area marks the 30 days before the known peak.

In [None]:
n_episodes = len(results)
if n_episodes == 0:
    print("No episodes completed. Check errors above.")
else:
    fig, axes = plt.subplots(n_episodes, 2, figsize=(16, 4 * n_episodes))
    if n_episodes == 1:
        axes = axes.reshape(1, -1)

    for i, (name, res) in enumerate(results.items()):
        ep = res["episode"]
        peak = pd.Timestamp(ep["peak_date"])

        # Left: Price
        ax_price = axes[i, 0]
        ax_price.plot(res["dates"], res["prices"], color="steelblue", linewidth=0.8)
        ax_price.axvline(peak, color="red", linestyle="--", alpha=0.7,
                         label=f"Peak {ep['peak_date']}")
        ax_price.axvspan(peak - timedelta(days=30), peak, color="red", alpha=0.1)
        ax_price.set_ylabel("Price")
        ax_price.set_title(f"{name} - Price")
        ax_price.legend(fontsize=8)

        # Right: Composite signal
        ax_sig = axes[i, 1]
        ax_sig.fill_between(res["dates"], 0, res["composite"],
                            color="orange", alpha=0.5)
        ax_sig.axvline(peak, color="red", linestyle="--", alpha=0.7)
        ax_sig.axvspan(peak - timedelta(days=30), peak, color="red", alpha=0.1)
        ax_sig.axhline(0.5, color="gray", linestyle="--", linewidth=0.5)
        ax_sig.set_ylabel("Crash Signal")
        ax_sig.set_title(f"{name} - Composite Signal")
        ax_sig.set_ylim(0, 1)

    plt.tight_layout()
    plt.show()

## 5. Detailed component breakdown

For each crash episode, we show the price alongside each individual signal
component. This helps identify which indicators contributed to the composite signal.

In [None]:
comp_colors = {
    "LPPLS Confidence": "red",
    "TC Proximity": "purple",
    "VaR Exceedance": "orange",
    "Kappa Regime": "green",
    "Hill Thinning": "blue",
}

for name, res in results.items():
    ep = res["episode"]
    peak = pd.Timestamp(ep["peak_date"])
    components = res["components"]
    n_comp = len(components)

    fig, axes = plt.subplots(n_comp + 1, 1,
                             figsize=(14, 2.5 * (n_comp + 1)), sharex=True)
    fig.suptitle(f"{name}: Signal Component Breakdown", fontsize=14, y=1.01)

    # Price
    axes[0].plot(res["dates"], res["prices"], color="steelblue", linewidth=0.8)
    axes[0].axvline(peak, color="red", linestyle="--", alpha=0.7)
    axes[0].set_ylabel("Price")

    # Components
    for j, (comp_name, comp_values) in enumerate(components.items()):
        ax = axes[j + 1]
        color = comp_colors.get(comp_name, "gray")
        ax.fill_between(res["dates"], 0, comp_values, color=color, alpha=0.4)
        ax.axvline(peak, color="red", linestyle="--", alpha=0.7)
        ax.set_ylabel(comp_name, fontsize=9)
        ax.set_ylim(0, 1)

    plt.tight_layout()
    plt.show()

## 6. Summary scorecard

For each episode and look-back window (60, 30 and 14 calendar days before the peak),
compute the mean and maximum composite signal, plus the fraction of days above the
0.3 and 0.5 thresholds.
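
The look-back windows are measured in calendar days (`timedelta(days=...)`), so the same window holds more observations for an asset that trades every day than for one that trades on weekdays only. The check below counts the rows selected by the scorecard's mask on a daily index and on a business-day index. Treating BTC as 7-day data and SPY and Gold as weekday-only data is an assumption about the bundled samples.

```python
from datetime import timedelta

import pandas as pd

peak = pd.Timestamp("2020-02-19")
daily = pd.date_range("2019-01-01", "2020-06-01", freq="D")  # trades every day, e.g. BTC
weekdays = pd.bdate_range("2019-01-01", "2020-06-01")        # Mon-Fri, holidays ignored


def rows_in_window(index: pd.DatetimeIndex, peak: pd.Timestamp, days: int) -> int:
    """Rows selected by the same mask the scorecard uses."""
    mask = (index >= peak - timedelta(days=days)) & (index <= peak)
    return int(mask.sum())


for days in (60, 30, 14):
    print(days, rows_in_window(daily, peak, days), rows_in_window(weekdays, peak, days))
# 60 61 43
# 30 31 23
# 14 15 11
```

The mean and max are comparable across assets, but the 14-day row rests on only about 11 weekday observations.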

In [None]:
scorecard = []

for name, res in results.items():
    ep = res["episode"]
    peak = pd.Timestamp(ep["peak_date"])

    for window_days in [60, 30, 14]:
        pre_mask = (
            (res["dates"] >= peak - timedelta(days=window_days))
            & (res["dates"] <= peak)
        )
        sig = res["composite"][pre_mask]
        sig_clean = sig[~np.isnan(sig)]

        if len(sig_clean) > 0:
            scorecard.append({
                "episode": name,
                "window": f"{window_days}d before peak",
                "mean_signal": np.mean(sig_clean),
                "max_signal": np.max(sig_clean),
                "pct_above_0.3": (sig_clean > 0.3).mean(),
                "pct_above_0.5": (sig_clean > 0.5).mean(),
            })

scorecard_df = pd.DataFrame(scorecard)
scorecard_df

In [None]:
# Pivot: max signal by episode and window
if len(scorecard_df) > 0:
    pivot = scorecard_df.pivot_table(
        index="episode", columns="window", values="max_signal",
    )
    print("Maximum composite signal before each crash:")
    print()
    display(pivot.style.format("{:.3f}").background_gradient(
        cmap="RdYlGn_r", vmin=0, vmax=1))

## 7. False positive analysis

How often does the signal exceed thresholds during non-crash periods?
We use the full BTC series and compare crash zones vs. calm periods.
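
The percentages printed below count days, so one long alarm weighs as much as many short ones. A complementary view counts distinct alarm episodes, meaning contiguous runs of days above a threshold. The hypothetical helper `count_alarm_runs` does this on boolean masks. Once the next cell has run, it could be called as `count_alarm_runs(composite > 0.5, ~crash_zone)`. NaN compares as False, so NaN days never start a run.

```python
import numpy as np


def count_alarm_runs(alarm: np.ndarray, region: np.ndarray) -> int:
    """Hypothetical helper: number of contiguous runs where `alarm` is True,
    restricted to days inside `region` (e.g. outside crash zones)."""
    active = np.asarray(alarm, dtype=bool) & np.asarray(region, dtype=bool)
    # A run starts wherever `active` switches from False to True.
    padded = np.concatenate(([False], active))
    starts = padded[1:] & ~padded[:-1]
    return int(starts.sum())


signal = np.array([0.1, 0.6, 0.7, 0.2, np.nan, 0.55, 0.9, 0.1, 0.6])
calm = np.ones_like(signal, dtype=bool)
print(count_alarm_runs(signal > 0.5, calm))  # 3
```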

In [None]:
# Run pipeline on full BTC history for false positive analysis
print("Running full BTC pipeline for false positive analysis...")
full_btc = run_pipeline(data_btc)

composite = full_btc["composite"]
dates = full_btc["dates"]

# Define crash zones: from 60 days before to 30 days after each known BTC peak
btc_peaks = pd.to_datetime(["2017-12-17", "2021-04-14", "2021-11-10"])
crash_zone = np.zeros(len(dates), dtype=bool)
for peak in btc_peaks:
    mask = (dates >= peak - timedelta(days=60)) & (dates <= peak + timedelta(days=30))
    crash_zone |= mask

non_crash = composite[~crash_zone & ~np.isnan(composite)]
in_crash = composite[crash_zone & ~np.isnan(composite)]

print(f"\nSignal statistics:")
print(f"  Non-crash periods ({len(non_crash)} days):")
print(f"    Mean:     {np.mean(non_crash):.4f}")
print(f"    > 0.3:    {(non_crash > 0.3).mean()*100:.1f}% of days")
print(f"    > 0.5:    {(non_crash > 0.5).mean()*100:.1f}% of days")
if len(in_crash) > 0:
    print(f"  Crash periods ({len(in_crash)} days):")
    print(f"    Mean:     {np.mean(in_crash):.4f}")
    print(f"    > 0.3:    {(in_crash > 0.3).mean()*100:.1f}% of days")
    print(f"    > 0.5:    {(in_crash > 0.5).mean()*100:.1f}% of days")

## 8. Cross-asset comparison

Compare the signal trajectory across BTC, SPY, and Gold episodes,
aligned to days-relative-to-peak.
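
Aligning by days relative to the peak also makes it possible to average several episodes of the same asset on a common grid. The sketch below uses synthetic signals and a hypothetical `align_to_peak` helper to reindex each episode onto integer calendar-day offsets from its peak. Trading calendars differ, so missing days stay NaN, and `DataFrame.mean(axis=1)` skips NaN by default.

```python
import numpy as np
import pandas as pd


def align_to_peak(dates: pd.DatetimeIndex, values: np.ndarray, peak: str,
                  lo: int = -365, hi: int = 90) -> pd.Series:
    """Hypothetical helper: index a signal by calendar days relative to `peak`
    and reindex onto the full integer grid lo..hi (missing days -> NaN)."""
    offsets = (dates - pd.Timestamp(peak)).days
    s = pd.Series(values, index=offsets)
    return s.reindex(range(lo, hi + 1))


dates_a = pd.bdate_range("2019-06-01", "2020-06-01")               # weekday calendar
dates_b = pd.date_range("2017-03-01", "2018-03-01", freq="D")      # daily calendar
a = align_to_peak(dates_a, np.linspace(0, 1, len(dates_a)), "2020-02-19")
b = align_to_peak(dates_b, np.linspace(1, 0, len(dates_b)), "2017-12-17")

aligned = pd.concat({"episode_a": a, "episode_b": b}, axis=1)
mean_path = aligned.mean(axis=1)  # NaN-skipping average across episodes
print(aligned.loc[-3:0])
```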

In [None]:
asset_groups = {
    "BTC": [n for n in results if "BTC" in n],
    "SPY": [n for n in results if "SPY" in n or "COVID" in n],
    "Gold": [n for n in results if "Gold" in n],
}

n_groups = sum(1 for v in asset_groups.values() if v)
fig, axes = plt.subplots(1, n_groups, figsize=(6 * n_groups, 5))
if n_groups == 1:
    axes = [axes]

ax_idx = 0
for asset, episode_names in asset_groups.items():
    if not episode_names:
        continue
    ax = axes[ax_idx]
    ax_idx += 1

    for ep_name in episode_names:
        if ep_name not in results:
            continue
        res = results[ep_name]
        peak = pd.Timestamp(res["episode"]["peak_date"])
        days_to_peak = (res["dates"] - peak).days
        mask = (days_to_peak >= -365) & (days_to_peak <= 90)
        ax.plot(days_to_peak[mask], res["composite"][mask],
                linewidth=0.8, alpha=0.8, label=ep_name)

    ax.axvline(0, color="red", linestyle="--", alpha=0.5, label="Peak")
    ax.axhline(0.5, color="gray", linestyle="--", linewidth=0.5)
    ax.set_xlabel("Days Relative to Peak")
    ax.set_ylabel("Composite Signal")
    ax.set_title(asset)
    ax.set_ylim(0, 1)
    ax.legend(fontsize=7)

plt.tight_layout()
plt.show()

## Summary

### Key findings
- The composite crash signal typically **rises in the 30-60 days before major peaks**.
- Crypto bubbles tend to produce the strongest LPPLS signals due to clear
  super-exponential growth.
- The 2008 SPY and 2011 Gold episodes test the system on traditional assets
  where bubble dynamics may be less pronounced.
- The COVID crash (2020) was an exogenous shock with no preceding bubble, so
  LPPLS performs poorly here.
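
For reference, the "super-exponential growth" that LPPLS looks for is usually written in the Filimonov and Sornette form of the log-periodic power law singularity model. Here $t_c$ is the critical time whose estimate drives the TC Proximity component. The notation is the standard one from the literature and may not match fatcrash's internal parameterisation.

$$
\ln \mathbb{E}[p(t)] = A + B\,(t_c - t)^{m} + C_1\,(t_c - t)^{m}\cos\!\big(\omega \ln(t_c - t)\big) + C_2\,(t_c - t)^{m}\sin\!\big(\omega \ln(t_c - t)\big), \qquad t < t_c
$$

A bubble corresponds to $0 < m < 1$ and $B < 0$. Under those conditions the slope $-Bm\,(t_c - t)^{m-1}$ of the power-law term is positive and diverges as $t \to t_c$, while the cosine and sine terms add log-periodic oscillations. An exogenous shock such as COVID has no such run-up for the model to fit.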

### Limitations
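- This is an **in-sample** evaluation: each indicator only uses past data, but the
  episodes themselves were chosen with hindsight. True out-of-sample testing
  requires live forward deployment.
- The pipeline uses 500-observation rolling windows, so the first 501 observations
  of each slice (roughly two years of trading days) get no composite signal.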
- Different assets may benefit from different weight configurations.

### Next steps
- Use `calibrate_weights` (notebook 06) on a subset of episodes, then test on
  held-out episodes for genuine out-of-sample evaluation.
- Incorporate Deep LPPLS (notebook 07) and DTCAI (notebook 10) into the pipeline.
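
One way to organise the held-out evaluation is leave-one-episode-out: calibrate on four episodes, score the fifth, and rotate. The sketch below only generates the splits from this notebook's episode names. The call to `calibrate_weights` is left out because its signature is not shown in this notebook, and `leave_one_episode_out` is a hypothetical helper.

```python
from typing import Iterator, List, Tuple


def leave_one_episode_out(names: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (calibration episodes, held-out episode) pairs, one per episode."""
    for held_out in names:
        train = [n for n in names if n != held_out]
        yield train, held_out


episodes = [
    "2017 BTC Bubble",
    "2021 BTC Bubble",
    "2008 Financial Crisis (SPY)",
    "2020 COVID Crash (SPY)",
    "2011 Gold Peak",
]
for train, test in leave_one_episode_out(episodes):
    print(f"calibrate on {len(train)} episodes, evaluate on: {test}")
```

The two BTC episodes share an asset, so a stricter variant would hold out every episode of one asset at a time.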

*All numbers are in-sample on historical data. This is not financial advice.*