# CARIA — Real Data Validation (FMP)

This notebook (Google Colab) validates CARIA against **real data** downloaded from **Financial Modeling Prep (FMP)**, and is meant to be run **cell by cell**.

### What it validates
- **Shape entropy**: \(H_{Sh,z}\) = Shannon entropy of the **z-returns** \(z_t=(r_t-\mu_t)/\sigma_t\)
- **Synchronization**: \(S_{PLV}\) = multi-scale Phase Locking Value (wavelet → phase → PLV)
- **Surrogates** (minimum publishable set): `shuffle`, `time_shift`, `phase_randomize`, with significance criterion **p < 0.01**
- **Ground truth (your definition C′)**:

\[
Crisis_{t+5}=1 \iff \sum_{m\in M} \mathbb{1}\{m_{t+1:t+5}=1\} \ge 2,\quad M=\{\text{EVT},\ \text{Drawdown},\ \text{Jump(BNS)}\}
\]

### Output
- Crisis probabilities per quadrant (Q1–Q4) with **bootstrap CI**
- Plots: time series + phase space

> Note: FMP enforces rate limits. The notebook pauses between requests.


## 0) Setup (installation + imports)


In [None]:
# Colab: install dependencies
# (statsmodels/yfinance/pyarrow are used in the structural CARIA-SR section)
%pip install -q PyWavelets pandas numpy scipy scikit-learn requests matplotlib seaborn statsmodels yfinance pyarrow

import os
import time
import math
import numpy as np
import pandas as pd
import requests
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional

from scipy import stats
from scipy.signal import hilbert
import pywt

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")
np.random.seed(42)

print("OK")


## 1) FMP API key (enter it without exposing it)

Recommended: use `getpass()` so the key is not saved in the notebook.


In [None]:
from getpass import getpass

# Option A: environment variable (if you already set it)
FMP_API_KEY = os.environ.get("FMP_API_KEY", "").strip()

# Option B: hidden input
if not FMP_API_KEY:
    FMP_API_KEY = getpass("Paste your FMP_API_KEY (input is hidden): ").strip()

assert FMP_API_KEY, "FMP_API_KEY is empty"

FMP_BASE_URL = "https://financialmodelingprep.com/api/v3"

print("✅ API key loaded")


## 2) Data download (FMP)

Notes:
- For indices, FMP sometimes does not support `^GSPC`. If it fails, use `SPY` as a proxy for the S&P 500.
- If a symbol fails, the notebook continues with the rest.
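The fallback described in these notes can be sketched as a small helper that tries candidate symbols in order and keeps the first non-empty result. `first_available`, `fetch`, and `fake_fetch` are hypothetical names used only for illustration; in this notebook `fetch` would wrap `fmp_get_historical`. The stub stands in for the API so the sketch runs offline, and its price value is made up.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

import pandas as pd


def first_available(
    candidates: Iterable[str],
    fetch: Callable[[str], pd.DataFrame],
    pause: float = 0.0,
) -> Tuple[Optional[str], pd.DataFrame]:
    """Try each symbol in order; return (symbol, data) for the first non-empty frame."""
    for sym in candidates:
        try:
            df = fetch(sym)
        except Exception as exc:  # a failed symbol should not stop the loop
            print(f"{sym}: {exc}")
            df = pd.DataFrame()
        if not df.empty:
            return sym, df
        time.sleep(pause)  # stay under the provider's rate limit
    return None, pd.DataFrame()


def fake_fetch(symbol: str) -> pd.DataFrame:
    """Offline stand-in for the API (hypothetical): only SPY returns data."""
    if symbol == "SPY":
        return pd.DataFrame({"date": pd.to_datetime(["2024-01-02"]), "close": [100.0]})
    if symbol == "^GSPC":
        raise ValueError("symbol not supported")
    return pd.DataFrame()


used_symbol, prices = first_available(["^GSPC", "SPY"], fake_fetch)
print(used_symbol, len(prices))
```

Passing the fetch function as an argument keeps the fallback logic testable without network access or an API key.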


In [None]:
def fmp_get_historical(symbol: str, start: str, end: str, api_key: str) -> pd.DataFrame:
    url = f"{FMP_BASE_URL}/historical-price-full/{symbol}"
    params = {"from": start, "to": end, "apikey": api_key}
    r = requests.get(url, params=params, timeout=30)
    r.raise_for_status()
    j = r.json()
    hist = j.get("historical", [])
    if not hist:
        return pd.DataFrame()
    df = pd.DataFrame(hist)
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date")
    # standard columns
    keep = [c for c in ["date","open","high","low","close","volume"] if c in df.columns]
    df = df[keep].copy()
    df["symbol"] = symbol
    return df

# Adjust symbols here
SYMBOLS = {
    "SP500": "SPY",   # S&P 500 proxy
    "VIX": "^VIX",    # may fail on FMP; if it does, try "VIXY" or "VXX"
    "TLT": "TLT",
    "GLD": "GLD",
}

# Longer history was requested: start in 2000
START_DATE = "2000-01-01"
END_DATE = pd.Timestamp.today().strftime("%Y-%m-%d")

data = {}
for name, sym in SYMBOLS.items():
    try:
        df = fmp_get_historical(sym, START_DATE, END_DATE, FMP_API_KEY)
        if df.empty:
            print(f"⚠️ {name} ({sym}): no data")
        else:
            data[name] = df
            print(f"✅ {name} ({sym}): {len(df)} rows, {df['date'].min().date()} → {df['date'].max().date()}")
    except Exception as e:
        print(f"❌ {name} ({sym}): {e}")
    time.sleep(0.35)  # rate limit

assert len(data) >= 1, "No dataset was downloaded. Check symbols / API key."


## 3) Preprocessing (returns + volatility)

- Returns: \(r_t = \log(P_t/P_{t-1})\)
- Volatility: rolling standard deviation over 30 observations, annualized with \(\sqrt{252}\)


In [None]:
def add_returns_vol(df: pd.DataFrame, vol_window: int = 30) -> pd.DataFrame:
    df = df.copy()
    df = df.sort_values("date")
    df["ret"] = np.log(df["close"] / df["close"].shift(1))
    df["vol"] = df["ret"].rolling(vol_window).std() * np.sqrt(252)
    df = df.dropna().reset_index(drop=True)
    return df

proc = {k: add_returns_vol(v) for k, v in data.items()}
for k, df in proc.items():
    print(k, df.shape, df['date'].min().date(), df['date'].max().date())


## 4) Shape entropy: \(H_{Sh,z}\)

We compute the Shannon entropy of the **z-returns** so that the measure is not confounded with amplitude:
\[
z_t = \frac{r_t-\mu_t}{\sigma_t}\quad\Rightarrow\quad H_{Sh,z}=H(\{z_t\})
\]

We use a histogram estimator (Freedman–Diaconis bins, `'fd'`, by default).
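As a minimal illustration of why standardizing removes the amplitude effect, the sketch below compares two synthetic return series that differ only by a scale factor. The helper names `normalized_hist_entropy` and `zscore` are assumptions for this example, and the z-score here uses the whole sample rather than the rolling window used in the notebook.

```python
import numpy as np


def normalized_hist_entropy(x: np.ndarray, bins="fd") -> float:
    """Shannon entropy of a histogram, divided by log2(number of bins)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    h = -np.sum(p * np.log2(p))
    return float(h / np.log2(len(counts))) if len(counts) > 1 else 0.0


def zscore(x: np.ndarray) -> np.ndarray:
    """Full-sample standardization (the notebook uses a rolling version)."""
    return (x - x.mean()) / x.std(ddof=1)


rng = np.random.default_rng(0)
r_calm = rng.normal(0.0, 0.01, size=500)   # synthetic low-amplitude returns
r_wild = 4.0 * r_calm                      # same shape, 4x the amplitude

h_calm = normalized_hist_entropy(zscore(r_calm))
h_wild = normalized_hist_entropy(zscore(r_wild))
print(h_calm, h_wild)
```

Both values coincide because the z-scores of the two series are identical, so any change in \(H_{Sh,z}\) reflects a change in the shape of the distribution rather than its scale.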


In [None]:
def shannon_hist(x: np.ndarray, bins='fd', normalize: bool = True) -> float:
    x = np.asarray(x).flatten()
    x = x[~np.isnan(x)]
    if len(x) < 30:
        return np.nan
    counts, _ = np.histogram(x, bins=bins, density=False)
    p = counts / max(counts.sum(), 1)
    p = p[p > 0]
    h = -np.sum(p * np.log2(p))
    if normalize:
        hmax = np.log2(len(counts)) if len(counts) > 1 else 1.0
        h = h / hmax if hmax > 0 else h
    return float(h)

def rolling_entropy_z(ret: pd.Series, window: int = 30, bins='fd') -> pd.Series:
    ret = ret.reset_index(drop=True)
    mu = ret.rolling(window).mean()
    sig = ret.rolling(window).std().replace(0, np.nan)
    z = (ret - mu) / sig

    out = pd.Series(np.nan, index=ret.index)
    for i in range(window, len(ret)):
        out.iloc[i] = shannon_hist(z.iloc[i-window+1:i+1].values, bins=bins, normalize=True)
    return out

# quick demo
k = list(proc.keys())[0]
df0 = proc[k]
Hz = rolling_entropy_z(df0['ret'], window=30)
print(k, "Hz mean/std:", float(np.nanmean(Hz)), float(np.nanstd(Hz)))


## 5) Multi-scale synchronization: wavelet → phase → PLV + surrogates

- Decompose the series into bands (by period) with Morlet wavelets (CWT).
- Extract the phase of each band with the Hilbert transform.
- Compute the mean PLV across pairs of bands.
- Validate against surrogates (`shuffle`, `time_shift`, `phase_randomize`).

Criterion: **significant if the worst-case p < 0.01**.
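To make the PLV and the surrogate criterion concrete, here is a minimal sketch on two synthetic sinusoids with a fixed phase lag. It uses a rank-based (empirical) p-value, which is an alternative to the Gaussian z-score approximation used in this notebook's code; `plv` is redefined locally and `rank_p_value` is a hypothetical helper.

```python
import numpy as np
from scipy.signal import hilbert


def plv(x: np.ndarray, y: np.ndarray) -> float:
    """Phase Locking Value from the Hilbert phases of two mean-removed signals."""
    ph_x = np.angle(hilbert(x - np.mean(x)))
    ph_y = np.angle(hilbert(y - np.mean(y)))
    return float(np.abs(np.mean(np.exp(1j * (ph_x - ph_y)))))


def rank_p_value(observed: float, surrogate_values) -> float:
    """One-sided empirical p-value; its floor is 1 / (n_surrogates + 1)."""
    s = np.asarray(surrogate_values, dtype=float)
    return float((1 + np.sum(s >= observed)) / (len(s) + 1))


rng = np.random.default_rng(1)
t = np.arange(1024)
x = np.sin(2 * np.pi * t / 32) + 0.1 * rng.normal(size=t.size)
y = np.sin(2 * np.pi * t / 32 + 0.7) + 0.1 * rng.normal(size=t.size)  # fixed lag

observed = plv(x, y)
# 'shuffle' surrogate: permuting y destroys its phase structure
surrogates = [plv(x, rng.permutation(y)) for _ in range(99)]
p = rank_p_value(observed, surrogates)
print(round(observed, 3), p)
```

Under this rank-based estimator, 25 or 50 surrogates cannot produce a p-value below 1/26 or 1/51, so a p < 0.01 criterion would need at least 99 surrogates per method.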


In [None]:
# Note: this PLV section is expensive. To run the notebook quickly,
# leave RUN_PLV=False and use the structural CARIA-SR section further down.
RUN_PLV = False

BANDS = {
    "ultra_fast": (1, 5),
    "short": (5, 20),
    "medium": (20, 60),
    "long": (60, 252),
    "ultra_long": (252, 504),
}

# Use an explicit parameterized complex Morlet to avoid FutureWarning:
# format: "cmorB-C" (bandwidth B, center frequency C)
WAVELET = "cmor1.5-1.0"

def cwt_band_signal(
    x: np.ndarray,
    low_period: int,
    high_period: int,
    dt: float = 1.0,
    wavelet: str = WAVELET,
    n_scales: int = 24,
    min_scale: float = 1.0,
) -> np.ndarray:
    """Band signal = mean real(CWT) across scales matching periods [low_period, high_period].

    Fixes:
    - avoids PyWavelets "Selected scale too small" by enforcing min_scale
    - uses scale2frequency mapping (more correct than hand-made omega0 mapping)
    - uses parameterized cmor wavelet (no deprecation warning)
    """
    x = np.asarray(x).astype(float).flatten()
    x = np.nan_to_num(x, nan=0.0)

    f_low = 1.0 / float(high_period)  # lower freq edge
    f_high = 1.0 / float(low_period)  # higher freq edge
    fc = float(pywt.scale2frequency(wavelet, 1.0))

    # scale = fc / (f * dt)
    s_low = fc / (f_high * dt)
    s_high = fc / (f_low * dt)

    s_low = max(float(min_scale), float(s_low))
    s_high = max(s_low + 1e-9, float(s_high))

    scales = np.linspace(s_low, s_high, int(max(8, n_scales)))
    coef, _freqs = pywt.cwt(x, scales, wavelet, sampling_period=dt)
    return np.real(coef).mean(axis=0)

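def detrend_mean(x: np.ndarray) -> np.ndarray:
    """Remove the least-squares linear trend (the intercept already removes the mean)."""
    x = np.asarray(x).astype(float)
    t = np.arange(len(x))
    a, b = np.polyfit(t, x, 1)
    # Subtracting np.mean(x) again would re-introduce a constant offset
    return x - (a*t + b)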

def phase_series(x: np.ndarray) -> np.ndarray:
    x = detrend_mean(x)
    ph = np.unwrap(np.angle(hilbert(x)))
    return ph

def plv(ph1: np.ndarray, ph2: np.ndarray) -> float:
    d = ph1 - ph2
    return float(np.abs(np.mean(np.exp(1j*d))))

def plv_multiscale(price: np.ndarray, bands=BANDS) -> Tuple[float, Dict[str, np.ndarray]]:
    price = np.asarray(price).astype(float)
    # work with returns for stability
    r = np.diff(np.log(price))
    r = np.nan_to_num(r, nan=0.0)

    band_sig = {}
    for name, (lp, hp) in bands.items():
        band_sig[name] = cwt_band_signal(r, lp, hp)

    phases = {k: phase_series(v) for k, v in band_sig.items()}
    keys = list(phases.keys())
    vals = []
    for i in range(len(keys)):
        for j in range(i+1, len(keys)):
            vals.append(plv(phases[keys[i]], phases[keys[j]]))

    return float(np.mean(vals)), band_sig

def surrogate_series(x: np.ndarray, method: str) -> np.ndarray:
    x = np.asarray(x)
    n = len(x)
    if method == 'shuffle':
        return np.random.permutation(x)
    if method == 'time_shift':
        shift = np.random.randint(1, n)
        return np.roll(x, shift)
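    if method == 'phase_randomize':
        # One-sided FFT: the inverse is real by construction and the amplitude spectrum is preserved
        X = np.fft.rfft(x)
        rnd = np.random.uniform(0, 2*np.pi, len(X))
        rnd[0] = 0.0  # keep the DC (mean) component unchanged
        if n % 2 == 0:
            rnd[-1] = 0.0  # the Nyquist bin must stay real for even n
        return np.fft.irfft(X * np.exp(1j*rnd), n=n)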
    raise ValueError(method)

def plv_surrogate_test(price: np.ndarray, n_surrogates: int = 50, alpha: float = 0.01, methods=None) -> dict:
    if methods is None:
        methods = ['time_shift','phase_randomize','shuffle']

    obs, _ = plv_multiscale(price)
    per = {}
    p_list = []
    for m in methods:
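        s_vals = []
        # Build surrogates on log-returns and re-integrate them into a positive price path;
        # randomizing raw prices can yield negative values, whose log is NaN.
        p0 = np.asarray(price, dtype=float)
        r0 = np.nan_to_num(np.diff(np.log(p0)), nan=0.0)
        for _ in range(n_surrogates):
            s_ret = surrogate_series(r0, m)
            s_price = p0[0] * np.exp(np.concatenate([[0.0], np.cumsum(s_ret)]))
            s_plv, _ = plv_multiscale(s_price)
            s_vals.append(s_plv)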
        s_vals = np.asarray(s_vals)
        mu = float(np.mean(s_vals))
        sd = float(np.std(s_vals))
        z = (obs - mu) / sd if sd > 0 else (np.inf if obs > mu else -np.inf)
        p = float(1 - stats.norm.cdf(z))  # one-tailed
        per[m] = {'mean': mu, 'std': sd, 'p': p, 'z': float(z), 'sig': p < alpha}
        p_list.append(p)

    worst_p = float(np.max(p_list))
    return {
        'observed_plv': obs,
        'per_method': per,
        'worst_p': worst_p,
        'significant_all': all(per[m]['sig'] for m in per)
    }

# quick sanity check (optional)
if RUN_PLV:
    k = list(proc.keys())[0]
    df0 = proc[k]
    res = plv_surrogate_test(df0['close'].values, n_surrogates=25, alpha=0.01)
    print('Observed PLV:', res['observed_plv'])
    print('Worst p:', res['worst_p'])
    print('Significant ALL?:', res['significant_all'])
    print('Per-method:', {m: round(v['p'],4) for m,v in res['per_method'].items()})
else:
    print('RUN_PLV=False → skipping PLV sanity check')


## 6) Ground truth \(Crisis_{t+5}\) (definition C′)

We build 3 base detectors:
- **EVT (tail)**: extreme return event (return below the 1% quantile, i.e. VaR 1%, by default)
- **Structural drawdown**: drawdown below a threshold (e.g. -15%)
- **Jump (BNS)**: jump test (Barndorff-Nielsen & Shephard)
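For reference, a commonly used ratio form of the BNS statistic compares realized variance (RV) with bipower variation (BV) and studentizes the difference with tri-power quarticity. The sketch below is an illustration under that assumption, not the simplified detector used in this notebook; `bns_ratio_z` is a hypothetical name and the returns are synthetic.

```python
import numpy as np
from scipy.special import gamma


def bns_ratio_z(r: np.ndarray) -> float:
    """Ratio jump statistic (RV - BV) / RV, studentized with tri-power quarticity."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    a = np.abs(r)
    mu1 = np.sqrt(2.0 / np.pi)                                  # E|Z| for standard normal Z
    mu43 = 2.0 ** (2.0 / 3.0) * gamma(7.0 / 6.0) / gamma(0.5)   # E|Z|^(4/3)

    rv = np.sum(r ** 2)                                         # realized variance
    bv = mu1 ** -2 * (n / (n - 1)) * np.sum(a[1:] * a[:-1])     # bipower variation
    tp = n * mu43 ** -3 * (n / (n - 2)) * np.sum(
        a[2:] ** (4 / 3) * a[1:-1] ** (4 / 3) * a[:-2] ** (4 / 3)
    )                                                           # tri-power quarticity
    theta = (np.pi / 2) ** 2 + np.pi - 5
    return float(((rv - bv) / rv) / np.sqrt(theta / n * max(1.0, tp / bv ** 2)))


rng = np.random.default_rng(7)
clean = rng.normal(0.0, 0.01, size=500)   # synthetic returns without jumps
jumpy = clean.copy()
jumpy[250] += 0.30                        # one large synthetic jump

z_clean, z_jump = bns_ratio_z(clean), bns_ratio_z(jumpy)
print(round(z_clean, 2), round(z_jump, 2))
```

Because BV is built from products of adjacent absolute returns, a single jump inflates RV much more than BV, which is what pushes the statistic above the normal critical value.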

Definition:
- For each method \(m\): \(m_{t+1:t+5}=1\) if there is **at least 1** event in \(t+1..t+5\)
- \(Crisis_{t+5}=1\) if **≥ 2 of 3** methods fire within that horizon

\[
Crisis_{t+5}=1 \iff \sum_{m\in\{EVT,Drawdown,Jump\}} \mathbb{1}\{m_{t+1:t+5}=1\} \ge 2
\]
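The 2-of-3 vote can be traced on a toy example. The sketch below uses made-up detector flags and a shorter horizon of 2 so the result is easy to check by hand; the vectorized `forward_any` shown here is an illustrative alternative to a loop-based version.

```python
import pandas as pd


def forward_any(x: pd.Series, horizon: int) -> pd.Series:
    """1 at t if x has any 1 in t+1..t+horizon (reverse, roll, reverse back)."""
    fwd = x.shift(-1)[::-1].rolling(horizon, min_periods=1).max()[::-1]
    return fwd.fillna(0).astype(int)


# Toy detector flags (made up): one event each, on days 3, 4 and 7
evt = pd.Series([0, 0, 0, 1, 0, 0, 0, 0])
dd = pd.Series([0, 0, 0, 0, 1, 0, 0, 0])
jump = pd.Series([0, 0, 0, 0, 0, 0, 0, 1])

horizon = 2
votes = forward_any(evt, horizon) + forward_any(dd, horizon) + forward_any(jump, horizon)
crisis = (votes >= 2).astype(int)
print(votes.tolist())
print(crisis.tolist())
```

Only day 2 is labeled as a crisis: it is the single day whose forward window (days 3 and 4) contains events from two different detectors.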



In [None]:
def evt_tail_events(ret: pd.Series, q: float = 0.01) -> pd.Series:
    """Simple EVT event: ret < quantile(q), using the full-sample quantile."""
    thr = ret.quantile(q)
    return (ret < thr).astype(int)

def drawdown_events(close: pd.Series, dd_threshold: float = -0.15) -> pd.Series:
    """Drawdown event: drawdown <= dd_threshold."""
    peak = close.cummax()
    dd = (close / peak) - 1.0
    return (dd <= dd_threshold).astype(int)

def bipower_variation(ret: pd.Series, window: int = 30) -> pd.Series:
    r = ret.dropna()
    abs_r = r.abs()
    mu1 = np.sqrt(2/np.pi)
    scaling = 1/(mu1**2)
    contrib = abs_r * abs_r.shift(1)
    return scaling * contrib.rolling(window).sum()

def bns_jump_events(ret: pd.Series, window: int = 30, alpha: float = 0.01) -> pd.Series:
    """BNS jump test (simplified) → jump event if Z > z_alpha."""
    r = ret.dropna()
    rv = (r**2).rolling(window).sum()
    bv = bipower_variation(r, window)

    mu1 = np.sqrt(2/np.pi)
    theta = (np.pi**2/4 + np.pi - 5) * (mu1 ** -4)
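    abs_r = r.abs()
    # Crude quarticity proxy: (Σ|r|^{4/3})^3 over the window. The standard tri-power
    # quarticity sums the products |r_t|^{4/3} |r_{t-1}|^{4/3} |r_{t-2}|^{4/3} instead.
    qp = (abs_r ** (4/3)).rolling(window).sum() ** 3
    var_est = theta * np.maximum(qp - bv**2, 1e-12)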

    z = (rv - bv) / np.sqrt(var_est)
    z_crit = stats.norm.ppf(1 - alpha)
    return (z > z_crit).astype(int).reindex(ret.index).fillna(0).astype(int)

def forward_any(x: pd.Series, horizon: int = 5) -> pd.Series:
    """Returns 1 at t if there is any 1 in t+1..t+horizon."""
    arr = x.values
    out = np.zeros_like(arr)
    for i in range(len(arr)):
        j0 = i+1
        j1 = min(len(arr), i+1+horizon)
        out[i] = 1 if (j0 < j1 and arr[j0:j1].max() > 0) else 0
    return pd.Series(out, index=x.index)

def crisis_ground_truth(ret: pd.Series, close: pd.Series, horizon: int = 5) -> pd.Series:
    """Your definition C′: ≥2 of {EVT, Drawdown, Jump} fire in t+1..t+horizon."""
    evt = evt_tail_events(ret, q=0.01)
    dd = drawdown_events(close, dd_threshold=-0.15)
    jump = bns_jump_events(ret, window=30, alpha=0.01)

    evt_f = forward_any(evt, horizon)
    dd_f = forward_any(dd, horizon)
    jump_f = forward_any(jump, horizon)

    s = evt_f + dd_f + jump_f
    return (s >= 2).astype(int)

# demo on one asset
k = list(proc.keys())[0]
df0 = proc[k].copy()
ret = df0['ret']
close = df0['close']
cr = crisis_ground_truth(ret, close, horizon=5)
print(k, 'crisis positives:', int(cr.sum()), 'out of', len(cr))


## 7) Per-asset pipeline: \(H_{Sh,z}\), \(S_{PLV}\), crisis ground truth

To speed things up, synchronization is computed in rolling mode with a moderate `step` and `n_surrogates`.

You can increase rigor by:
- increasing `n_surrogates`
- reducing `step`
- widening the synchronization lookback


In [None]:
def rolling_plv(price: np.ndarray, lookback: int = 252, step: int = 10, n_surrogates: int = 25, alpha: float = 0.01) -> Tuple[pd.Series, pd.Series]:
    """Returns (S_plv, p_worst) aligned to the price index (length len(price))."""
    n = len(price)
    s = np.full(n, np.nan)
    p = np.full(n, np.nan)
    for i in range(lookback, n, step):
        w = price[i-lookback:i+1]
        out = plv_surrogate_test(w, n_surrogates=n_surrogates, alpha=alpha)
        s[i] = out['observed_plv']
        p[i] = out['worst_p']
        if i+step < n:
            s[i:i+step] = s[i]
            p[i:i+step] = p[i]
    # Forward-fill only: back-filling would leak the first estimate into the warm-up period
    s = pd.Series(s).ffill()
    p = pd.Series(p).ffill()
    return s, p

results = {}

for name, df in proc.items():
    print(f"\n=== {name} ===")
    Hz = rolling_entropy_z(df['ret'], window=30)
    crisis = crisis_ground_truth(df['ret'], df['close'], horizon=5)

    # rolling synchronization (VERY expensive). Skipped by default.
    if RUN_PLV:
        s_plv, p_worst = rolling_plv(df['close'].values, lookback=252, step=10, n_surrogates=20, alpha=0.01)
    else:
        s_plv = pd.Series(np.nan, index=np.arange(len(df)))
        p_worst = pd.Series(np.nan, index=np.arange(len(df)))

    out = df[['date','close','ret','vol']].copy().reset_index(drop=True)
    out['Hz'] = Hz.values
    out['S_plv'] = s_plv.values
    out['p_worst'] = p_worst.values
    out['crisis_t5'] = crisis.values

    # stats
    print('Hz mean/std:', float(np.nanmean(out['Hz'])), float(np.nanstd(out['Hz'])))
    print('S_plv mean/std:', float(np.nanmean(out['S_plv'])), float(np.nanstd(out['S_plv'])))
    print('Signif windows (%):', float((out['p_worst'] < 0.01).mean()*100))
    print('Crisis_t5 positives:', int(out['crisis_t5'].sum()))

    results[name] = out

list(results.keys())


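## 8) Phase Space + Super-Criticality (Q1–Q4) with bootstrap CI

Thresholds are defined by medians:
- High/Low entropy: relative to the median of \(H_{Sh,z}\)
- High/Low sync: relative to the median of \(S_{PLV}\)

The quadrants are Q1 (high H, low S), Q2 (high H, high S), Q3 (low H, high S) and Q4 (low H, low S). We then estimate: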
\[
P(Crisis_{t+5}=1\mid Q_k)
\]

We also add a **bootstrap CI (95%)**.
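Because \(Crisis_{t+5}\) is built from overlapping forward windows, neighboring labels are serially dependent, and an i.i.d. bootstrap can understate the uncertainty. One possible alternative is a moving-block bootstrap, sketched below on synthetic clustered labels. `block_bootstrap_ci` and the block length are assumptions for illustration, not part of this notebook's pipeline.

```python
import numpy as np


def block_bootstrap_ci(mask, y, block=10, B=500, alpha=0.05, seed=0):
    """Percentile CI for mean(y[mask]) using a moving-block bootstrap over time."""
    mask = np.asarray(mask, dtype=bool)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block))
    offsets = np.arange(block)
    stats = np.full(B, np.nan)
    for b in range(B):
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]).ravel()[:n]  # glue blocks, trim to n
        m = mask[idx]
        if m.any():
            stats[b] = y[idx][m].mean()
    return float(np.nanquantile(stats, alpha / 2)), float(np.nanquantile(stats, 1 - alpha / 2))


# Synthetic clustered labels: 20 crisis days followed by 180 calm days, repeated
y = np.tile(np.r_[np.ones(20), np.zeros(180)], 10)
in_quadrant = np.ones(len(y), dtype=bool)  # trivial mask, only for the comparison

lo_iid, hi_iid = block_bootstrap_ci(in_quadrant, y, block=1)   # block=1 is the i.i.d. bootstrap
lo_blk, hi_blk = block_bootstrap_ci(in_quadrant, y, block=20)
print((lo_iid, hi_iid), (lo_blk, hi_blk))
```

Resampling whole blocks of rows and recomputing the quadrant proportion inside each replicate keeps the clustering of crisis days intact, which is why the block interval comes out wider on this example.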


In [None]:
def quadrant_probs(df: pd.DataFrame, hz_col='Hz', s_col='S_plv', y_col='crisis_t5'):
    d = df[[hz_col, s_col, y_col]].dropna().copy()
    h_thr = d[hz_col].median()
    s_thr = d[s_col].median()

    q = pd.Series(index=d.index, dtype=str)
    q[(d[hz_col] >= h_thr) & (d[s_col] < s_thr)] = 'Q1'
    q[(d[hz_col] >= h_thr) & (d[s_col] >= s_thr)] = 'Q2'
    q[(d[hz_col] < h_thr) & (d[s_col] >= s_thr)] = 'Q3'
    q[(d[hz_col] < h_thr) & (d[s_col] < s_thr)] = 'Q4'

    out = {}
    for qq in ['Q1','Q2','Q3','Q4']:
        m = (q == qq)
        n = int(m.sum())
        k = int(d.loc[m, y_col].sum())
        p = k/n if n>0 else np.nan
        out[qq] = {'p': p, 'n': n, 'k': k}
    return out, q, (h_thr, s_thr)

def bootstrap_ci(df: pd.DataFrame, qmask: np.ndarray, y: np.ndarray, B: int = 500, alpha: float = 0.05):
    idx = np.where(qmask)[0]
    if len(idx) < 20:
        return (np.nan, np.nan)
    ps = []
    for _ in range(B):
        s = np.random.choice(idx, size=len(idx), replace=True)
        ps.append(float(y[s].mean()))
    lo = float(np.quantile(ps, alpha/2))
    hi = float(np.quantile(ps, 1-alpha/2))
    return lo, hi

for name, df in results.items():
    print(f"\n=== Quadrants: {name} ===")
    probs, q, (h_thr, s_thr) = quadrant_probs(df)
    print('thresholds:', 'Hz=', round(h_thr,4), 'S=', round(s_thr,4))

    # q is indexed by the NaN-free rows that quadrant_probs used, so reuse it for the masks
    y = df.loc[q.index, 'crisis_t5'].values
    for qq in ['Q1','Q2','Q3','Q4']:
        m = (q == qq).values
        lo, hi = bootstrap_ci(df, m, y, B=300)
        print(f"{qq}: p={probs[qq]['p']:.4f}  n={probs[qq]['n']}  CI95=[{lo:.4f},{hi:.4f}]")


## 9) Plots (example: SP500)

Change `asset` to plot a different one.


In [None]:
asset = 'SP500' if 'SP500' in results else list(results.keys())[0]
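# Drop only rows missing Hz: S_plv is all-NaN when RUN_PLV=False and a plain dropna() would empty the frame
df = results[asset].dropna(subset=['Hz']).copy()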

fig, ax = plt.subplots(4,1, figsize=(14,10), sharex=True)
ax[0].plot(df['date'], df['close'], color='black'); ax[0].set_title(f"{asset} price")
ax[1].plot(df['date'], df['Hz'], color='purple'); ax[1].set_title("H_{Sh,z}")
ax[2].plot(df['date'], df['S_plv'], color='green'); ax[2].set_title("S_PLV")
ax[3].plot(df['date'], df['crisis_t5'], color='red'); ax[3].set_title("Crisis_{t+5} (ground truth)")
plt.tight_layout(); plt.show()

# Phase space
plt.figure(figsize=(8,6))
plt.scatter(df['Hz'], df['S_plv'], c=df['crisis_t5'], cmap='coolwarm', s=10, alpha=0.6)
plt.xlabel('H_{Sh,z}'); plt.ylabel('S_PLV'); plt.title(f"Phase space: {asset}")
plt.show()


## 10) Hysteresis / Path-dependence tests (your claim)

We formally test three things:

### (A) Phase-space trajectory + loop area
We compute the signed (trapezoidal) area in the \((H_t,S_t)\) plane around events:

\[
A=\sum_{t=t_0}^{t_1-1}(H_{t+1}-H_t)\,\frac{S_{t+1}+S_t}{2}
\]

If hysteresis is consistent, the distribution of \(A\) around events should differ systematically from 0.
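The sign behavior of \(A\) can be checked on synthetic paths. The sketch below, which assumes nothing beyond the formula above, evaluates it on a unit circle traversed in both directions and on a path that goes out and back along the same line.

```python
import numpy as np


def loop_area(H: np.ndarray, S: np.ndarray) -> float:
    """Signed trapezoidal area: sum of (H[t+1] - H[t]) * (S[t+1] + S[t]) / 2."""
    return float(np.sum(np.diff(H) * (S[1:] + S[:-1]) / 2.0))


theta = np.linspace(0.0, 2.0 * np.pi, 1001)      # closed path (first point == last point)
H_ccw, S_ccw = np.cos(theta), np.sin(theta)      # unit circle, counter-clockwise
H_cw, S_cw = np.cos(-theta), np.sin(-theta)      # same circle, clockwise

# Out and back along the same line: no enclosed area, hence no hysteresis
H_line = np.r_[np.linspace(0, 1, 50), np.linspace(1, 0, 50)]
S_line = 2.0 * H_line

a_ccw = loop_area(H_ccw, S_ccw)
a_cw = loop_area(H_cw, S_cw)
a_line = loop_area(H_line, S_line)
print(round(a_ccw, 4), round(a_cw, 4), round(a_line, 4))
```

With this convention (\(\int S\,dH\)), a counter-clockwise loop gives approximately \(-\pi\), a clockwise loop approximately \(+\pi\), and a retraced path gives 0, so both the magnitude and the sign of \(A\) carry information about the trajectory.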

### (B) Event-aligned curves (pre vs post)
We average \(H\) and \(S\) in event time \(k\in[-K,K]\) and measure the pre/post asymmetry.

### (C) “Release-before-rupture”
This tests your specific hypothesis:
- a local peak of \(S\) while \(H\) is low
- then \(S\) falls (release)
- and afterwards \(Crisis_{t+5}=1\) occurs

**Controls:**
- Use only points with \(p_{worst}<0.01\) (windows with significant coupling)
- Repeat with \(S_\perp\) (residualizing \(S\) against volatility \(\sigma\))
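The residualization control can be illustrated with an ordinary least squares fit of \(S\) on a constant and \(\sigma\). The sketch below uses synthetic data in which \(S\) is driven mostly by volatility; `residualize` is a hypothetical helper and the coefficients are made up.

```python
import numpy as np


def residualize(s: np.ndarray, vol: np.ndarray) -> np.ndarray:
    """Residual of an OLS regression of s on [1, vol]."""
    X = np.column_stack([np.ones(len(vol)), vol])
    beta, *_ = np.linalg.lstsq(X, s, rcond=None)
    return s - X @ beta


rng = np.random.default_rng(3)
vol = np.abs(rng.normal(0.15, 0.05, size=1000))          # synthetic volatility
s_raw = 0.3 + 0.8 * vol + 0.05 * rng.normal(size=1000)   # S driven mostly by vol

s_perp = residualize(s_raw, vol)
corr_raw = np.corrcoef(s_raw, vol)[0, 1]
corr_perp = np.corrcoef(s_perp, vol)[0, 1]
print(round(corr_raw, 3), round(corr_perp, 3))
```

By construction the residual has zero sample correlation with volatility, so any predictive content left in \(S_\perp\) cannot be a linear volatility effect.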


In [None]:
from numpy.linalg import lstsq

def event_starts_from_groundtruth(crisis_t5: pd.Series, min_gap: int = 30) -> List[int]:
    """Indices where crisis_t5 flips 0→1, with a minimum gap between events."""
    x = crisis_t5.values.astype(int)
    starts = np.where((x[1:] == 1) & (x[:-1] == 0))[0] + 1
    if len(starts) == 0:
        return []
    filtered = [int(starts[0])]
    for s in starts[1:]:
        if int(s) - filtered[-1] >= min_gap:
            filtered.append(int(s))
    return filtered

def loop_area(H: np.ndarray, S: np.ndarray) -> float:
    """Signed area proxy via trapezoids in HS plane."""
    H = np.asarray(H); S = np.asarray(S)
    dH = np.diff(H)
    Sm = (S[1:] + S[:-1]) / 2.0
    return float(np.nansum(dH * Sm))

def residualize_S_against_vol(S: pd.Series, vol: pd.Series) -> pd.Series:
    """S_perp = S - beta*vol (simple control)."""
    d = pd.DataFrame({'S': S, 'vol': vol}).dropna()
    if len(d) < 50:
        return S * np.nan
    X = np.column_stack([np.ones(len(d)), d['vol'].values])
    y = d['S'].values
    beta, *_ = lstsq(X, y, rcond=None)
    yhat = X @ beta
    resid = y - yhat
    out = pd.Series(index=S.index, dtype=float)
    out.loc[d.index] = resid
    return out

def event_aligned_means(series: pd.Series, events: List[int], K: int = 60) -> Tuple[np.ndarray, np.ndarray]:
    """Return k-grid and mean curve for k in [-K,K]."""
    vals = []
    arr = series.values
    for e in events:
        if e-K < 0 or e+K >= len(arr):
            continue
        vals.append(arr[e-K:e+K+1])
    if len(vals) == 0:
        return np.arange(-K, K+1), np.full(2*K+1, np.nan)
    M = np.vstack(vals)
    return np.arange(-K, K+1), np.nanmean(M, axis=0)

def asymmetry(curve: np.ndarray, K: int) -> float:
    """Delta = sum_{k=1..K} post - sum_{k=1..K} pre."""
    pre = np.nansum(curve[:K])
    post = np.nansum(curve[K+1:])
    return float(post - pre)

def local_peaks(x: np.ndarray, m: int = 10) -> np.ndarray:
    """Boolean mask of local maxima in +/- m window."""
    x = np.asarray(x)
    peaks = np.zeros_like(x, dtype=bool)
    for i in range(m, len(x)-m):
        w = x[i-m:i+m+1]
        if np.isfinite(x[i]) and x[i] == np.nanmax(w):
            peaks[i] = True
    return peaks

def release_before_rupture(df: pd.DataFrame, m: int = 10, qS: float = 0.80, qH: float = 0.40, L: int = 20, delta: float = 0.02,
                           use_resid_S: bool = False, require_sig: bool = True) -> Dict:
    """Test P(Crisis|Peak(S)&H low&Release) vs baseline."""
    d = df[['Hz','S_plv','p_worst','vol','crisis_t5']].dropna().copy()
    if len(d) < 500:
        return {'n': 0}

    S = d['S_plv'].values
    if use_resid_S:
        S_perp = residualize_S_against_vol(d['S_plv'], d['vol']).loc[d.index].values
        S = S_perp

    H = d['Hz'].values
    Y = d['crisis_t5'].values.astype(int)

    if require_sig:
        sig_mask = (d['p_worst'].values < 0.01)
    else:
        sig_mask = np.ones(len(d), dtype=bool)

    S_thr = np.nanquantile(S[sig_mask], qS)
    H_thr = np.nanquantile(H[sig_mask], qH)

    pk = local_peaks(S, m=m)
    lowH = H < H_thr
    highS = S > S_thr

    # release: within L days, S drops by at least delta
    rel = np.zeros(len(S), dtype=bool)
    for i in range(len(S)):
        j1 = min(len(S), i+L+1)
        if i+1 < j1 and np.nanmin(S[i+1:j1]) < (S[i] - delta):
            rel[i] = True

    cond = pk & lowH & highS & rel & sig_mask

    baseline = float(np.mean(Y))
    p_cond = float(np.mean(Y[cond])) if cond.sum() > 0 else np.nan

    return {
        'baseline': baseline,
        'p_cond': p_cond,
        'n_cond': int(cond.sum()),
        'S_thr': float(S_thr),
        'H_thr': float(H_thr)
    }

# Run hysteresis suite for SP500
asset = 'SP500' if 'SP500' in results else list(results.keys())[0]
df = results[asset].dropna().reset_index(drop=True)

# Use only significant windows if you want (p_worst < 0.01)
mask_sig = (df['p_worst'] < 0.01)
print('Significant windows %:', float(mask_sig.mean()*100))

# Define events from Crisis_t5
events = event_starts_from_groundtruth(df['crisis_t5'], min_gap=30)
print('Event starts:', len(events))

# Loop areas around each event
K = 60
areas = []
for e in events:
    if e-K < 0 or e+K >= len(df):
        continue
    seg = df.iloc[e-K:e+K+1]
    areas.append(loop_area(seg['Hz'].values, seg['S_plv'].values))

areas = np.asarray(areas)
print('Loop area A: mean=', float(np.nanmean(areas)), 'median=', float(np.nanmedian(areas)), 'n=', int(np.isfinite(areas).sum()))

# Event-aligned curves
kgrid, Hbar = event_aligned_means(df['Hz'], events, K=K)
kgrid, Sbar = event_aligned_means(df['S_plv'], events, K=K)

print('Asymmetry ΔH:', asymmetry(Hbar, K))
print('Asymmetry ΔS:', asymmetry(Sbar, K))

plt.figure(figsize=(12,4))
plt.plot(kgrid, Hbar, label='H (mean)')
plt.axvline(0, color='k', linestyle='--')
plt.title(f'Event-aligned H around Crisis_{{t+5}} starts: {asset}')  # braces escaped; {t+5} raised NameError
plt.legend(); plt.show()

plt.figure(figsize=(12,4))
plt.plot(kgrid, Sbar, label='S_PLV (mean)', color='green')
plt.axvline(0, color='k', linestyle='--')
plt.title(f'Event-aligned S_PLV around Crisis_{{t+5}} starts: {asset}')  # braces escaped; {t+5} raised NameError
plt.legend(); plt.show()

# Release-before-rupture test
r1 = release_before_rupture(df, use_resid_S=False, require_sig=True)
r2 = release_before_rupture(df, use_resid_S=True, require_sig=True)
print('\nRelease-before-rupture (raw S):', r1)
print('Release-before-rupture (S residualized vs vol):', r2)


## 11) CARIA-SR (Structural): cross-sectional spectral synchronization + memory (hysteresis)

**Motivation:** Multi-scale PLV within a single series can be fragile on daily data. To capture the idea of *crowding / collapse of degrees of freedom* more robustly, we measure the structure of the **correlation matrix** across many assets.

### Metrics
- **Absorption Ratio (AR)**: fraction of variance explained by the leading components (largest eigenvalues)
- **Eigen-Entropy**: structural diversity of the eigenvalues (normalized Shannon entropy of the eigenvalue shares)
- **CARIA-SR**: standardized combination of **high AR** + **high (1 − entropy)**
- **Peak Memory**: rolling maximum of CARIA-SR over a horizon H (plasticity)
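To see how AR and eigen-entropy respond to crowding, the sketch below evaluates both on two hand-built correlation matrices: uncorrelated assets and assets that all share a pairwise correlation of 0.9. `ar_and_entropy` is an illustrative helper and the matrices are synthetic.

```python
import numpy as np


def ar_and_entropy(C: np.ndarray, k_frac: float = 0.2):
    """Absorption ratio of the top k eigenvalues and normalized eigenvalue entropy."""
    w = np.sort(np.linalg.eigvalsh(C))[::-1]
    n = len(w)
    k = max(1, int(np.ceil(k_frac * n)))
    p = w / w.sum()
    p = p[p > 0]
    ar = float(w[:k].sum() / w.sum())
    ent = float(-np.sum(p * np.log(p)) / np.log(n))
    return ar, ent


n = 10
C_indep = np.eye(n)                                 # uncorrelated assets
C_crowd = np.full((n, n), 0.9) + 0.1 * np.eye(n)    # every pair correlated at 0.9

ar_indep, ent_indep = ar_and_entropy(C_indep)
ar_crowd, ent_crowd = ar_and_entropy(C_crowd)
print(ar_indep, ent_indep)
print(ar_crowd, ent_crowd)
```

For the identity matrix the top 2 of 10 eigenvalues absorb exactly 20% of the variance and the entropy is 1; in the crowded case one eigenvalue of 9.1 dominates, so AR rises to 0.92 and the entropy falls to about 0.22.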

### Deep Calm
We use the **VIX** to define calm (`VIX < 15/18/20`) and evaluate whether Peak Memory explains the future left tail.
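The interaction between Peak Memory and Deep Calm can be shown on a toy series: a single structural spike stays "remembered" for H days, even after the VIX has dropped back below the calm threshold. All values below are made up for illustration.

```python
import pandas as pd

caria_sr = pd.Series([0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0])     # toy stress score with one spike
vix = pd.Series([25.0, 24.0, 30.0, 19.0, 14.0, 13.0, 12.0])   # toy VIX path (made-up values)

peak3 = caria_sr.rolling(3).max()   # Peak Memory with H = 3
deep_calm = vix < 15                # calm regime flag

# Days that look calm on the VIX while the spike is still inside the memory window
calm_with_memory = deep_calm & (peak3 >= 3.0)
print(peak3.tolist())
print(calm_with_memory.tolist())
```

Only day 4 is flagged: the VIX is already below 15, yet the spike from day 2 is still within the 3-day memory window. This is the kind of observation the Deep Calm analysis conditions on.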



In [None]:
# === 11A) Load cross-sectional SP500 panel (local parquet) + VIX ===
import os
import numpy as np
import pandas as pd
import yfinance as yf

DATA_PATH = "data/sp500_universe_fmp.parquet"  # repo path; in Colab upload/mount to match

if not os.path.exists(DATA_PATH):
    raise FileNotFoundError(
        f"Cannot find {DATA_PATH}.\n"
        "In Colab: upload the parquet to /content/data/ or mount Google Drive and adjust DATA_PATH."
    )

# Load wide panel: index=date, columns=tickers (prices)
px = pd.read_parquet(DATA_PATH)
if "date" in px.columns:
    px["date"] = pd.to_datetime(px["date"])
    px = px.set_index("date")
px.index = pd.to_datetime(px.index)
px = px.sort_index()

# Use 2000+
px = px.loc["2000-01-01":].copy()

# Log-returns
ret = np.log(px).diff()

# Stable universe selection (coverage)
coverage = 1.0 - ret.isna().mean()
COVERAGE_MIN = 0.90
keep = coverage[coverage >= COVERAGE_MIN].index.tolist()
ret = ret[keep]

print("Panel dates:", ret.index.min().date(), "→", ret.index.max().date())
print("Universe size (coverage>=", COVERAGE_MIN, "):", ret.shape[1])

# VIX from yfinance (robust)
START = str(ret.index.min().date())
END = str(ret.index.max().date() + pd.Timedelta(days=1))
vix_df = yf.download("^VIX", start=START, end=END, progress=False, auto_adjust=False)
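if vix_df.empty:
    raise RuntimeError("Could not download ^VIX with yfinance.")
vix = vix_df["Adj Close"]
if isinstance(vix, pd.DataFrame):  # some yfinance versions return (field, ticker) MultiIndex columns
    vix = vix.iloc[:, 0]
vix = vix.rename("VIX").dropna()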

# Align indices
common_idx = ret.index.intersection(vix.index)
ret_cs = ret.loc[common_idx].copy()
vix_cs = vix.loc[common_idx].copy()

print("Aligned:", ret_cs.shape, "VIX:", vix_cs.shape, "from", common_idx.min().date(), "to", common_idx.max().date())

# Target series for tail-risk (prefer SPY if available)
if "proc" in globals() and "SP500" in proc:
    spy = proc["SP500"].copy().set_index("date").sort_index()
    spy_close = spy["close"].reindex(common_idx).astype(float)
    spy_ret = np.log(spy_close).diff().rename("spy_ret")
    print("Using SPY from FMP download as target")
else:
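    spy_df = yf.download("SPY", start=START, end=END, progress=False, auto_adjust=False)
    spy_adj = spy_df["Adj Close"]
    if isinstance(spy_adj, pd.DataFrame):  # some yfinance versions return MultiIndex columns
        spy_adj = spy_adj.iloc[:, 0]
    spy_close = spy_adj.rename("close").reindex(common_idx).astype(float)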
    spy_ret = np.log(spy_close).diff().rename("spy_ret")
    print("Using SPY from yfinance as target")



In [None]:
# === 11B) CARIA-SR structural metrics + Peak Memory + Deep Calm robustness ===
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

def eig_metrics_from_corr(C: np.ndarray, k_frac: float = 0.2):
    w = np.linalg.eigvalsh(C)
    w = np.sort(w)[::-1]
    n = len(w)
    k = max(1, int(np.ceil(k_frac * n)))
    ar = float(np.sum(w[:k]) / np.sum(w))
    p = w / np.sum(w)
    p = p[p > 0]
    ent = -np.sum(p * np.log(p))
    ent_norm = float(ent / np.log(n)) if n > 1 else np.nan
    return ar, ent_norm

def rolling_structural_metrics(ret_mat: pd.DataFrame, window: int = 252, k_frac: float = 0.2, min_assets: int = 120, step: int = 5):
    idx = ret_mat.index
    out = pd.DataFrame(index=idx, columns=["AR", "E_eig", "N_assets"], dtype=float)

    for t in range(window, len(idx), step):
        W = ret_mat.iloc[t-window+1:t+1]
        good = W.notna().mean() >= 0.90
        W = W.loc[:, good]
        if W.shape[1] < min_assets:
            continue
        W = W.apply(lambda s: s.fillna(s.mean()), axis=0)
        C = np.corrcoef(W.values, rowvar=False)
        C = np.nan_to_num(C, nan=0.0, posinf=0.0, neginf=0.0)
        ar, ent = eig_metrics_from_corr(C, k_frac=k_frac)
        out.iloc[t] = [ar, ent, W.shape[1]]

    # forward-fill the stepped estimates to the daily index (no back-fill, to avoid look-ahead)
    out = out.ffill()
    return out

def zscore(s: pd.Series, w: int = 252):
    mu = s.rolling(w).mean()
    sd = s.rolling(w).std().replace(0, np.nan)
    return (s - mu) / sd

def forward_min(x: pd.Series, h: int = 22) -> pd.Series:
    a = x.to_numpy()
    out = np.full(len(a), np.nan)
    for i in range(len(a)):
        j0, j1 = i+1, min(len(a), i+1+h)
        if j0 < j1:
            out[i] = np.nanmin(a[j0:j1])
    return pd.Series(out, index=x.index)

# Compute structural metrics (fast via step)
struct = rolling_structural_metrics(ret_cs, window=252, k_frac=0.2, min_assets=120, step=5)
struct["AR_z"] = zscore(struct["AR"], 252)
struct["E_low_z"] = zscore(1.0 - struct["E_eig"], 252)
struct["CARIA_SR"] = struct["AR_z"] + struct["E_low_z"]

# Peak memory horizons
H_list = [20, 40, 60, 90, 120]
for H in H_list:
    struct[f"Peak{H}"] = struct["CARIA_SR"].rolling(H).max()

# Target outcomes on SPY
spy_ret_al = spy_ret.reindex(struct.index).astype(float)
spy_close_al = spy_close.reindex(struct.index).astype(float)

y = forward_min(spy_ret_al, h=22).rename("future_min_22d")
crisis_t5 = crisis_ground_truth(spy_ret_al, spy_close_al, horizon=5).rename("crisis_t5")

# Plot Peak60 vs VIX (quick sanity)
plt.figure(figsize=(14,5))
ax1 = plt.gca()
ax1.plot(struct.index, struct["Peak60"], color="crimson", lw=2, label="Peak60 (Caria-SR memory)")
ax1.set_ylabel("Peak60 (z units)")
ax1.legend(loc="upper left")
ax2 = ax1.twinx()
ax2.plot(vix_cs.index, vix_cs.values, color="grey", alpha=0.5, lw=1.5, label="VIX")
ax2.axhline(15, color="grey", ls="--", lw=1)
ax2.axhline(18, color="grey", ls=":", lw=1)
ax2.axhline(20, color="grey", ls="-.", lw=1)
ax2.set_ylabel("VIX")
plt.title("Structural Memory (Peak60) vs VIX")
plt.show()

# Δq05 heatmap: HIGH PeakMemory (top20%) − LOW
thr_list = [15, 18, 20, 22, 25]
heat_q = pd.DataFrame(index=H_list, columns=thr_list, dtype=float)
heat_n = pd.DataFrame(index=H_list, columns=thr_list, dtype=float)

for H in H_list:
    peakH = struct[f"Peak{H}"]
    for thr in thr_list:
        calm = (vix_cs < thr).reindex(struct.index).fillna(False)
        df = pd.concat([y, peakH.rename("peak")], axis=1).loc[calm].dropna()
        heat_n.loc[H, thr] = float(len(df))
        if len(df) < 400:
            heat_q.loc[H, thr] = np.nan
            continue
        cut = df["peak"].quantile(0.80)
        hi = df[df["peak"] >= cut]["future_min_22d"]
        lo = df[df["peak"] < cut]["future_min_22d"]
        heat_q.loc[H, thr] = float(hi.quantile(0.05) - lo.quantile(0.05))

plt.figure(figsize=(10,5))
sns.heatmap(heat_q.astype(float), annot=True, fmt=".4f", cmap="RdYlGn", center=0)
plt.title("Δq05(future_min_22d): HIGH PeakMemory − LOW (top20% vs rest)\nNegative = worse tail under structural memory")
plt.xlabel("Deep Calm Threshold (VIX < X)")
plt.ylabel("Memory Window H (days)")
plt.show()

print("n per cell (Deep Calm samples):")
display(heat_n)

# Optional: crisis probability difference in Deep Calm for Peak60 high vs low
for thr in [15, 18, 20]:
    calm = (vix_cs < thr).reindex(struct.index).fillna(False)
    df = pd.concat([crisis_t5, struct["Peak60"].rename("peak")], axis=1).loc[calm].dropna()
    if len(df) < 400:
        continue
    cut = df["peak"].quantile(0.80)
    p_hi = float(df[df["peak"] >= cut]["crisis_t5"].mean())
    p_lo = float(df[df["peak"] < cut]["crisis_t5"].mean())
    print(f"Deep Calm VIX<{thr}: P(crisis_t+5) highPeak={p_hi:.4f} lowPeak={p_lo:.4f} Δ={p_hi-p_lo:+.4f}")

