<a href="https://colab.research.google.com/github/tousifo/ml_notebooks/blob/main/ALS_QNN_PRO_ACT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Cell 1 — Imports & setup for the ALS QNN notebook

**Purpose:** bring in the sklearn utilities, pin/install Qiskit, and import the quantum pieces used by a variational regressor.

### What’s in this cell
- **Data utilities:** `train_test_split`, `MinMaxScaler`
- **Metrics:** `mean_squared_error`, `pearsonr`
- **Install (pinned):** `%pip install qiskit~=1.0 qiskit-machine-learning~=0.8.1 qiskit_algorithms`
- **Quantum stack:** `ZZFeatureMap`, `RealAmplitudes`, `COBYLA`, `VQR`, `Sampler`

> **Note:** If the install line complains about `qiskit_algorithms`, the PyPI package name is often `qiskit-algorithms` (hyphen). We’re not changing your code here—this is just a heads-up.

### Why scale features?
Angle encoders work better when inputs live in a tight range; min–max scaling avoids angle wraparound and makes training steadier.

<details>
<summary>Quick I/O expectations</summary>
After preprocessing later:
- Feature vectors will be scaled to a small range (often [0, 1]).
- The `ZZFeatureMap(feature_dimension=...)` should match your final number of features.
</details>

---


In [1]:
from sklearn.model_selection import train_test_split  # quick train/validation split
from sklearn.preprocessing import MinMaxScaler        # keep features in a compact range for angle encoding
from sklearn.metrics import mean_squared_error        # regression loss (lower is better)
from scipy.stats import pearsonr                      # correlation between predictions and targets (closer to 1 is better)
%pip install qiskit~=1.0 qiskit-machine-learning~=0.8.1 qiskit_algorithms  # pinned install; if it fails, try 'qiskit-algorithms' manually

# Qiskit Imports
from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes  # feature map + ansatz for the variational circuit
from qiskit_algorithms.optimizers import COBYLA                  # gradient-free optimizer suited to noisy objectives
from qiskit_machine_learning.algorithms.regressors import VQR    # variational quantum regressor wrapper
from qiskit.primitives import Sampler                            # primitive that evaluates circuits (shot-based)

Collecting qiskit~=1.0
  Downloading qiskit-1.4.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting qiskit-machine-learning~=0.8.1
  Downloading qiskit_machine_learning-0.8.4-py3-none-any.whl.metadata (13 kB)
Collecting qiskit_algorithms
  Downloading qiskit_algorithms-0.4.0-py3-none-any.whl.metadata (4.7 kB)
Collecting rustworkx>=0.15.0 (from qiskit~=1.0)
  Downloading rustworkx-0.17.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting stevedore>=3.0.0 (from qiskit~=1.0)
  Downloading stevedore-5.5.0-py3-none-any.whl.metadata (2.2 kB)
Collecting symengine<0.14,>=0.11 (from qiskit~=1.0)
  Downloading symengine-0.13.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.2 kB)
Collecting scipy>=1.5 (from qiskit~=1.0)
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m62.0/62.0 kB[0m [31

## Cell 2 — CV-safe PRO-ACT preprocessing (`ALSDataProcessor`)

**Purpose:** Turn the raw PRO-ACT CSVs into a clean, cross-validation-safe feature matrix `X` and target `y` (ALSFRS Total slope), then save a single file you can feed into any model (classical or quantum).

### What this cell does
- Anchors timelines to each subject’s **first ALSFRS visit** (t = 0).
- Builds features from the **first 0–90 days** after that anchor across longitudinal tables (ALSFRS, FVC, vitals, labs, grip, muscle).
- Harmonizes **ALSFRS-R** subitems when present (Q10 from 10a; merges Q5a/Q5b).
- Collapses **FVC trials** to the **max per test time** before summarizing.
- Creates **7 summaries** per signal: `min, max, median, std, first, last, slope`.
- Drops columns with **>30% missing**, but does **no scaling/encoding/imputation** here (to avoid leakage).
- Keeps only **eligible subjects**: have an ALSFRS measure **>3 months** and **>12 months** after anchor.
- Saves `final_processed_als_data.csv` and returns a dict with `X`, `y`, `subject_ids`, and the raw joined frame.

### Expected inputs (filenames)
- `PROACT_ALSFRS.csv` *(required)*, plus any of:  
  `PROACT_FVC.csv`, `PROACT_VITALSIGNS.csv`, `PROACT_RILUZOLE.csv`, `PROACT_DEMOGRAPHICS.csv`,  
  `PROACT_LABS.csv`, `PROACT_DEATHDATA.csv`, `PROACT_HANDGRIPSTRENGTH.csv`,  
  `PROACT_MUSCLESTRENGTH.csv`, `PROACT_ALSHISTORY.csv`.

> **Key columns the code looks for**
> - `subject_id` (auto-detected if a similar name exists)
> - Any time field containing **“delta”** or **“day”** (used as days since that table’s baseline)
> - In ALSFRS: `ALSFRS_Total` (numeric)

### Outputs
- **File:** `final_processed_als_data.csv`  
- **Return (dict):**  
  - `X` → features (after 30% missing filter)  
  - `y` → `alsfrs_slope` (per-subject target)  
  - `subject_ids` → subject mapping  
  - `raw_frame` → `[subject_id, alsfrs_slope, X...]` (what gets saved)

> **Windowing:** all longitudinal features are computed over **0–90 days** *from the ALSFRS anchor*.  
> **Slope target:** computed between the **first point after 3 months** and the **first point after 12 months**.

### Gotchas / tips
- If a table has **no time column**, it’s skipped with a warning.
- **Pivoting** of wide/long lab/grip/muscle tables is “best effort” by name patterns; if that fails, it continues safely.
- **No encodings** are done here. Do one-hot/ordinal and scaling **inside your CV folds**.
- If your time unit isn’t in days, ensure the column names include `day`/`delta` or rename before loading.

<details>
<summary>How features are built (quick recipe)</summary>

1) **Anchor** each subject’s rows to the first ALSFRS visit (t=0).  
2) **Align** other tables to ALSFRS time by subtracting each subject’s ALSFRS anchor day.  
3) **Restrict** to days **0–90**.  
4) For every numeric signal per table, compute:  
   `min, max, median, std (ddof=0), first, last, slope(first→last over time)`  
   *(slope is NaN if only one observation or zero time span).*  
5) **Merge** all per-table feature blocks on `subject_id`.
</details>

<details>
<summary>Eligibility logic for y</summary>
A subject is kept if they have **any** ALSFRS measure strictly **>3.0 months** *and* strictly **>12.0 months** after the anchor.  
The slope target uses the very first record after 3 months and the very first after 12 months.
</details>

---

In [2]:
import pandas as pd
import numpy as np
import warnings
from typing import Dict, Optional, List

warnings.filterwarnings("ignore")  # keep notebook output tidy; data has mixed types/old columns


class ALSDataProcessor:
    """
    CV-safe preprocessing for PRO-ACT to reproduce the paper's EDA:
      - Anchor to FIRST ALSFRS visit (t=0) per subject
      - Inputs: first 3 months (0–90 days from anchor) for all longitudinal tables
      - Outcome: ALSFRS Total slope between FIRST-after-3mo and FIRST-after-12mo
      - ALSFRS-R harmonization hooks (Q10 from 10a; merge Q5a/Q5b)
      - FVC reduced to max-of-trials per test before summarization
      - Seven summaries: min, max, median, std, first, last, slope (slope=NaN if only 1 obs)
      - Drop features with >30% missing (no other transforms here — avoid leakage)
    """

    def __init__(self):
        # identifiers/time-like columns we should NOT summarize as numeric features
        self.id_and_delta_cols = {
            "subject_id",
            "alsfrs_delta",
            "fvc_delta",
            "vitals_delta",
            "labs_delta",
            "grip_delta",
            "muscle_delta",
            "onset_delta",
            "death_delta",
            "history_delta",
            "anchor_days",
            "days_from_alsfrs_anchor",
        }

    # --------- Utilities ---------

    @staticmethod
    def _find_time_col(df: pd.DataFrame) -> Optional[str]:
        """Find a time column that represents days since baseline in a table."""
        # Prefer delta
        for c in df.columns:
            lc = c.lower()
            if "delta" in lc:
                return c
        # Fallback to 'days' if present
        for c in df.columns:
            lc = c.lower()
            if "day" in lc:
                return c
        return None

    # --------- ALSFRS-R harmonization ---------

    def _convert_alsfrs_r(self, alsfrs_df: pd.DataFrame) -> pd.DataFrame:
        """
        Prepare ALSFRS table. If ALSFRS-R subitems exist, map per paper:
          - Q10 <- 10a (dyspnea). Ignore 10b/10c.
          - Merge Q5a/Q5b into Q5 if present.
        If only totals exist, this is a no-op aside from coercions.
        """
        df = alsfrs_df.copy()

        if "ALSFRS_Total" in df.columns:
            df["ALSFRS_Total"] = pd.to_numeric(df["ALSFRS_Total"], errors="coerce")

        # Try to locate subitems by loose names
        cols = {c.lower(): c for c in df.columns}

        # Q10 from 10a (dyspnea)
        for candidate in ["alsfrs_r_q10a", "q10a", "dyspnea", "alsfrs_q10a"]:
            if candidate in cols:
                df["Q10"] = pd.to_numeric(df[cols[candidate]], errors="coerce")
                break

        # Merge Q5a/Q5b
        q5a = next(
            (cols[k] for k in ["alsfrs_r_q5a", "q5a", "cutting_wout_gastrostomy"] if k in cols),
            None,
        )
        q5b = next(
            (cols[k] for k in ["alsfrs_r_q5b", "q5b", "cutting_with_gastrostomy"] if k in cols),
            None,
        )
        if q5a and q5b:
            q5a_vals = pd.to_numeric(df[q5a], errors="coerce").values
            q5b_vals = pd.to_numeric(df[q5b], errors="coerce").values
            df["Q5"] = np.nanmax(np.vstack([q5a_vals, q5b_vals]), axis=0)

        return df

    # --------- Anchoring ---------

    def _alsfrs_anchor_days(self, alsfrs_df: pd.DataFrame) -> pd.Series:
        """
        Compute per-subject anchor day = first ALSFRS visit (min delta/days).
        """
        df = alsfrs_df.copy()
        tcol = self._find_time_col(df)
        if tcol is None:
            raise ValueError("ALSFRS table lacks a time delta/days column.")

        df.rename(columns={tcol: "alsfrs_delta"}, inplace=True)
        anchor_map = df.groupby("subject_id")["alsfrs_delta"].min()
        return anchor_map

    # --------- Data I/O ---------

    def load_and_inspect_data(self, file_path: str = "") -> Dict[str, pd.DataFrame]:
        datasets: Dict[str, pd.DataFrame] = {}
        file_list = [
            "PROACT_ALSFRS.csv",
            "PROACT_FVC.csv",
            "PROACT_VITALSIGNS.csv",
            "PROACT_RILUZOLE.csv",
            "PROACT_DEMOGRAPHICS.csv",
            "PROACT_LABS.csv",
            "PROACT_DEATHDATA.csv",
            "PROACT_HANDGRIPSTRENGTH.csv",
            "PROACT_MUSCLESTRENGTH.csv",
            "PROACT_ALSHISTORY.csv",
        ]
        print("--- Loading and Inspecting Data ---")
        for file_name in file_list:
            try:
                df = pd.read_csv(file_path + file_name, on_bad_lines="skip")
                # normalize subject_id
                if "subject_id" not in df.columns:
                    potential = [c for c in df.columns if "subject" in c.lower()]
                    if potential:
                        df = df.rename(columns={potential[0]: "subject_id"})
                # coerce delta-like numeric columns
                for c in df.columns:
                    if "delta" in c.lower() or "day" in c.lower():
                        df[c] = pd.to_numeric(df[c], errors="coerce")
                datasets[file_name] = df
                print(f"✓ {file_name}: {df.shape}")
            except FileNotFoundError:
                print(f"✗ {file_name}: File not found (skipped).")
        return datasets

    # --------- Outcome ---------

    def calculate_alsfrs_slope(self, alsfrs_df: pd.DataFrame) -> pd.DataFrame:
        """
        Outcome = slope between FIRST-after-3mo and FIRST-after-12mo ALSFRS totals,
        with time anchored to first ALSFRS visit.
        """
        df = alsfrs_df.copy()
        tcol = self._find_time_col(df)
        if tcol is None:
            raise ValueError("ALSFRS table lacks a time delta/days column.")
        if "ALSFRS_Total" not in df.columns:
            raise ValueError("ALSFRS_Total missing in ALSFRS table.")

        df.rename(columns={tcol: "alsfrs_delta"}, inplace=True)
        # Anchor
        anchor_map = df.groupby("subject_id")["alsfrs_delta"].min()
        df["days_from_anchor"] = df["alsfrs_delta"] - df["subject_id"].map(anchor_map)
        df["months"] = df["days_from_anchor"] / 30.44

        df = df.sort_values(["subject_id", "months"])
        slopes = {}

        for sid, g in df.groupby("subject_id", sort=False):
            g = g.dropna(subset=["months", "ALSFRS_Total"])
            t1 = g[g["months"] > 3.0].head(1)
            t2 = g[g["months"] > 12.0].head(1)
            if not t1.empty and not t2.empty:
                t1m = float(t1["months"].iloc[0])
                t2m = float(t2["months"].iloc[0])
                t1v = float(t1["ALSFRS_Total"].iloc[0])
                t2v = float(t2["ALSFRS_Total"].iloc[0])
                if t2m > t1m:
                    slopes[sid] = (t2v - t1v) / (t2m - t1m)

        return pd.DataFrame({"subject_id": list(slopes.keys()), "alsfrs_slope": list(slopes.values())})

    # --------- FVC collapse ---------

    @staticmethod
    def _fvc_collapse_trials(df: pd.DataFrame, time_col: str) -> pd.DataFrame:
        """
        Reduce FVC per row/time to the max across trials before summarization.
        Tries to detect typical trial columns; falls back gracefully.
        """
        d = df.copy()
        # Find obvious trial columns
        trial_cols = [c for c in d.columns if "trial" in c.lower()]
        # Some datasets have explicit liters columns per trial name
        if trial_cols:
            d["FVC_Liters"] = pd.to_numeric(d[trial_cols].max(axis=1), errors="coerce")
            keep = ["subject_id", time_col, "FVC_Liters"]
            return d[keep]
        # Fallbacks: look for liters column names
        liter_like = [c for c in d.columns if "liter" in c.lower() or "fvc" in c.lower()]
        if liter_like:
            # If multiple, take row-wise max
            d["FVC_Liters"] = pd.to_numeric(d[liter_like].max(axis=1), errors="coerce")
            keep = ["subject_id", time_col, "FVC_Liters"]
            return d[keep]
        # Last resort: return as-is
        return d

    # --------- Longitudinal summarization ---------

    def create_longitudinal_features(self, df: pd.DataFrame, time_col: str, prefix: str) -> pd.DataFrame:
        """
        Create 7 summaries over [0, 90] days from ALSFRS anchor:
          min, max, median, std, first, last, slope(first→last)
        Slope remains NaN if only one observation or zero time span.
        """
        if time_col not in df.columns:
            return pd.DataFrame()

        d = df.copy()
        # Coerce numerics (but keep subject_id/time cols)
        for c in d.columns:
            if c not in {"subject_id", time_col}:
                d[c] = pd.to_numeric(d[c], errors="coerce")

        # Ensure window is 0..90 days from ALSFRS anchor (already anchored)
        d = d[(d[time_col] >= 0) & (d[time_col] <= 90)].copy()
        if d.empty:
            return pd.DataFrame()

        # Value columns (exclude identifiers/derived delta/time)
        val_cols = [
            c
            for c in d.select_dtypes(include=[np.number]).columns
            if c not in self.id_and_delta_cols and c not in {"subject_id", time_col}
        ]
        if not val_cols:
            return pd.DataFrame()

        out = []
        g = d.groupby("subject_id", as_index=True)
        for col in val_cols:
            agg = g[col].agg(["min", "max", "median", "first", "last"])
            std_ = g[col].std(ddof=0).rename("std")
            slope = g.apply(
                lambda x: (x[col].iloc[-1] - x[col].iloc[0]) / max(1e-9, (x[time_col].iloc[-1] - x[time_col].iloc[0]))
                if len(x) > 1 and (x[time_col].iloc[-1] - x[time_col].iloc[0]) > 0
                else np.nan
            ).rename("slope")
            feat = pd.concat([agg, std_, slope], axis=1)
            feat.columns = [f"{prefix}{col}_{cname}" for cname in feat.columns]
            out.append(feat)

        return pd.concat(out, axis=1).reset_index()

    # --------- Static table processing (no encoding here to avoid leakage) ---------

    @staticmethod
    def process_static_data(df: pd.DataFrame) -> pd.DataFrame:
        """
        CV-safe: DO NOT encode here. Just keep one row per subject.
        (Do categorical encoding in your modeling pipeline.)
        """
        if "subject_id" not in df.columns:
            return pd.DataFrame()
        # Keep first non-duplicated row per subject_id
        return df.drop_duplicates(subset=["subject_id"]).copy()

    # --------- Merge features ---------

    def merge_all_features(self, datasets: Dict[str, pd.DataFrame]) -> pd.DataFrame:
        if "PROACT_DEMOGRAPHICS.csv" not in datasets:
            raise ValueError("Demographics file is missing.")

        # Build ALSFRS anchor map
        alsfrs = datasets["PROACT_ALSFRS.csv"]
        anchor_map = self._alsfrs_anchor_days(alsfrs)

        # Start with demographics (static)
        final_df = self.process_static_data(datasets["PROACT_DEMOGRAPHICS.csv"])

        # Add static-ish other tables (keep CV-safe; no encodings)
        for file in ["PROACT_RILUZOLE.csv", "PROACT_ALSHISTORY.csv"]:
            if file in datasets:
                static_df = self.process_static_data(datasets[file])
                final_df = pd.merge(final_df, static_df, on="subject_id", how="left")

        # Longitudinal configs
        longitudinal = {
            "PROACT_ALSFRS.csv": "alsfrs_",
            "PROACT_FVC.csv": "fvc_",
            "PROACT_VITALSIGNS.csv": "vitals_",
            "PROACT_LABS.csv": "labs_",
            "PROACT_HANDGRIPSTRENGTH.csv": "grip_",
            "PROACT_MUSCLESTRENGTH.csv": "muscle_",
        }

        print("\n--- Generating Longitudinal Features (anchored to first ALSFRS; window = 0–90 days) ---")
        for file, prefix in longitudinal.items():
            if file not in datasets:
                continue

            df = datasets[file].copy()
            tcol = self._find_time_col(df)
            if tcol is None:
                print(f"Warning: No time delta/days column in {file}. Skipping.")
                continue

            # Anchor this table to ALSFRS first visit
            df["anchor_days"] = df["subject_id"].map(anchor_map)
            df = df[~df["anchor_days"].isna()].copy()
            df["days_from_alsfrs_anchor"] = pd.to_numeric(df[tcol], errors="coerce") - df["anchor_days"]

            # FVC special handling: collapse to max-of-trials BEFORE summarization
            if file == "PROACT_FVC.csv":
                df = self._fvc_collapse_trials(df, time_col="days_from_alsfrs_anchor")

            # Attempt to pivot long-form measurement tables (best effort)
            if file in {"PROACT_LABS.csv", "PROACT_MUSCLESTRENGTH.csv", "PROACT_HANDGRIPSTRENGTH.csv"}:
                try:
                    test_cols = [
                        c
                        for c in df.columns
                        if c not in {"subject_id", "days_from_alsfrs_anchor", "anchor_days"}
                        and any(k in c.lower() for k in ["test", "exam", "muscle", "site", "name", "strength_test"])
                    ]
                    value_cols = [
                        c
                        for c in df.columns
                        if c not in {"subject_id", "days_from_alsfrs_anchor", "anchor_days"}
                        and any(k in c.lower() for k in ["result", "value", "strength", "score"])
                    ]
                    if test_cols and value_cols:
                        tcol_name = test_cols[0]
                        vcol_name = value_cols[0]
                        df[vcol_name] = pd.to_numeric(df[vcol_name], errors="coerce")
                        df = (
                            df.pivot_table(
                                index=["subject_id", "days_from_alsfrs_anchor"],
                                columns=tcol_name,
                                values=vcol_name,
                                aggfunc="mean",
                            )
                            .reset_index()
                        )
                except Exception as e:
                    print(f"Warning: Pivoting failed for {file}: {e}")

            feats = self.create_longitudinal_features(df, "days_from_alsfrs_anchor", prefix)
            if not feats.empty:
                final_df = pd.merge(final_df, feats, on="subject_id", how="left")

        return final_df

    # --------- Eligibility ---------

    def filter_eligible_patients(self, feature_df: pd.DataFrame, alsfrs_df: pd.DataFrame) -> pd.DataFrame:
        """
        Keep subjects who have ANY ALSFRS >3 months AND >12 months AFTER the ALSFRS anchor.
        """
        df = alsfrs_df.copy()
        tcol = self._find_time_col(df)
        if tcol is None:
            raise ValueError("ALSFRS table lacks a time delta/days column.")

        df.rename(columns={tcol: "alsfrs_delta"}, inplace=True)
        anchor_map = df.groupby("subject_id")["alsfrs_delta"].min()
        df["days_from_anchor"] = df["alsfrs_delta"] - df["subject_id"].map(anchor_map)
        df["months"] = df["days_from_anchor"] / 30.44

        g = df.groupby("subject_id")["months"]
        has_t1 = g.apply(lambda s: (s > 3.0).any())
        has_t2 = g.apply(lambda s: (s > 12.0).any())
        eligible_ids = has_t1[has_t1].index.intersection(has_t2[has_t2].index)

        print(f"\nEligible patients: {len(eligible_ids)} / {df['subject_id'].nunique()}")
        return feature_df[feature_df["subject_id"].isin(eligible_ids)].copy()

    # --------- Orchestration ---------

    def run_pipeline(self, file_path: str = "") -> Optional[Dict[str, pd.DataFrame]]:
        """
        End-to-end EDA (CV-safe) that writes 'final_processed_als_data.csv'.
        No imputation/scaling/feature selection here — do that inside your CV pipeline.
        """
        print("====== Starting ALS Data Preprocessing Pipeline ======")
        datasets = self.load_and_inspect_data(file_path)
        if "PROACT_ALSFRS.csv" not in datasets:
            print("CRITICAL ERROR: PROACT_ALSFRS.csv not found. Aborting.")
            return None

        # ALSFRS prep + anchor
        datasets["PROACT_ALSFRS.csv"] = self._convert_alsfrs_r(datasets["PROACT_ALSFRS.csv"])

        # Outcome
        target_df = self.calculate_alsfrs_slope(datasets["PROACT_ALSFRS.csv"])
        print(f"\nCalculated ALSFRS slope for {len(target_df)} patients.")

        # Features
        full_features = self.merge_all_features(datasets)

        # Eligibility
        eligible_features = self.filter_eligible_patients(full_features, datasets["PROACT_ALSFRS.csv"])

        # Join features + target
        final_df = pd.merge(eligible_features, target_df, on="subject_id", how="inner")

        # Drop features with >30% missing
        print("\n--- Handling Missing Values (Dropping cols with >30% missing) ---")
        initial_cols = len(final_df.columns)
        missing_thresh = 0.30
        min_non_na = int(np.ceil(len(final_df) * (1 - missing_thresh)))
        final_df = final_df.dropna(axis=1, thresh=min_non_na)
        dropped = initial_cols - len(final_df.columns)
        print(f"Dropped {dropped} columns for >{int(missing_thresh*100)}% missingness.")

        # Separate X/y (no transforms here to avoid leakage)
        if "alsfrs_slope" not in final_df.columns:
            print("No target available after merges. Aborting.")
            return None

        y = final_df["alsfrs_slope"]
        valid = y.notna()
        final_df = final_df.loc[valid].reset_index(drop=True)

        subject_ids = final_df["subject_id"]
        y = final_df["alsfrs_slope"]
        X = final_df.drop(columns=["subject_id", "alsfrs_slope"])

        # Save CV-safe engineered dataset (raw features)
        out = pd.concat([subject_ids, y, X], axis=1)
        out.to_csv("final_processed_als_data.csv", index=False)
        print("\n✅ Saved CV-safe engineered data to 'final_processed_als_data.csv'")
        print(f"Feature matrix shape: {X.shape} | Target length: {len(y)}")

        return {"X": X, "y": y, "subject_ids": subject_ids, "raw_frame": out}


if __name__ == "__main__":
    # If your CSVs live elsewhere, set file_path accordingly (e.g., "C:/data/PROACT/")
    file_path = ""
    processor = ALSDataProcessor()
    processed = processor.run_pipeline(file_path=file_path)
    if processed is not None:
        print("\nPreview of columns:", list(processed["X"].columns)[:10])
        print("Done.")


--- Loading and Inspecting Data ---
✓ PROACT_ALSFRS.csv: (73845, 20)
✓ PROACT_FVC.csv: (49110, 10)
✓ PROACT_VITALSIGNS.csv: (84721, 36)
✓ PROACT_RILUZOLE.csv: (10363, 3)
✓ PROACT_DEMOGRAPHICS.csv: (12504, 14)
✓ PROACT_LABS.csv: (2937162, 5)
✓ PROACT_DEATHDATA.csv: (5043, 3)
✓ PROACT_HANDGRIPSTRENGTH.csv: (19032, 11)
✓ PROACT_MUSCLESTRENGTH.csv: (204875, 10)
✓ PROACT_ALSHISTORY.csv: (13765, 16)

Calculated ALSFRS slope for 1897 patients.

--- Generating Longitudinal Features (anchored to first ALSFRS; window = 0–90 days) ---

Eligible patients: 3317 / 8538

--- Handling Missing Values (Dropping cols with >30% missing) ---
Dropped 1413 columns for >30% missingness.

✅ Saved CV-safe engineered data to 'final_processed_als_data.csv'
Feature matrix shape: (1897, 346) | Target length: 1897

Preview of columns: ['Demographics_Delta', 'Age', 'Race_Caucasian', 'Sex', 'Subject_used_Riluzole', 'Riluzole_use_Delta', 'Subject_ALS_History_Delta', 'Site_of_Onset', 'alsfrs_Q1_Speech_min', 'alsfrs_Q1_S

## Cell 5 — FAST classical baselines (successive halving + pipeline caching)

**Purpose:** Train strong-but-quick RF & SVR baselines using **successive halving** (prunes weak configs early) and **joblib cache** (reuses preprocessing across CV folds).

### What this cell does
- Loads `final_processed_als_data.csv`, quick **80/20** split, optional shuffle for fold homogeneity.
- Auto-detects **numeric vs categorical** columns.
- Builds **in-pipeline** preprocessing (impute; scaling only for SVR; OHE for categoricals).
- Runs **HalvingGridSearchCV (cv=3)** for:
  - **RandomForestRegressor** (no scaling needed).
  - **SVR (RBF)** (with scaling).
- Evaluates on the held-out test set and prints **RMSE, PCC** with **95% bootstrap CIs**.
- Prints a quick **RF+SVR 50–50 ensemble** as a sanity check.

### Why it’s fast
- **Successive halving** cuts off underperformers early → far fewer total fits.
- **joblib.Memory** caches transformers inside the pipeline → repeated CV splits don’t recompute imputation/encoding from scratch.

### Outputs (printed)
- Best params for RF & SVR (FAST grids).
- **Test Set Performance (FAST mode)** table with RMSE & PCC + CIs.
- Optional **RF+SVR Ensemble** metrics.

> Tip: If you’re only benchmarking speed, comment out the SVR block; RF alone is often a strong baseline here.

---


In [5]:
import numpy as np
import pandas as pd
from typing import Tuple, Dict
import warnings
warnings.filterwarnings("ignore")

# sklearn
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.utils import shuffle

# Halving search (successive halving)
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV

# caching
from joblib import Memory

np.random.seed(42)

# ---------- Metrics ----------
def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

def safe_pcc(y_true, y_pred) -> float:
    yt = np.asarray(y_true, dtype=float).ravel()
    yp = np.asarray(y_pred, dtype=float).ravel()
    if yt.std() < 1e-12 or yp.std() < 1e-12:
        return 0.0
    return float(np.corrcoef(yt, yp)[0, 1])

def bootstrap_ci(y_true, y_pred, metric_fn, n_boot=800, alpha=0.95, seed=42) -> Tuple[float, float]:
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    n = len(y_true)
    stats = []
    idx = np.arange(n)
    for _ in range(n_boot):
        b = rng.choice(idx, size=n, replace=True)
        stats.append(metric_fn(y_true[b], y_pred[b]))
    lo = float(np.percentile(stats, (1 - alpha) / 2 * 100))
    hi = float(np.percentile(stats, (1 + alpha) / 2 * 100))
    return lo, hi

# ---------- Main ----------
def run_classical_pipeline_fast() -> pd.DataFrame:
    print("====== FAST Classical Baselines (successive halving, cached) ======")

    # 1) Load engineered data
    df = pd.read_csv("final_processed_als_data.csv")
    print(f"✓ Loaded engineered dataset: {df.shape}")

    X = df.drop(columns=["subject_id", "alsfrs_slope"])
    y = df["alsfrs_slope"].astype(float)

    # Optional quick shuffle for better fold homogeneity
    X, y = shuffle(X, y, random_state=42)

    # 80/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, random_state=42
    )
    print(f"Split: train={X_train.shape[0]}, test={X_test.shape[0]}")

    # 2) Column typing
    num_cols = X_train.select_dtypes(include=[np.number]).columns.tolist()
    cat_cols = X_train.select_dtypes(exclude=[np.number]).columns.tolist()
    print(f"Detected numeric={len(num_cols)}, categorical={len(cat_cols)}")

    # Pipeline cache
    memory = Memory(location="sk_cache", verbose=0)

    # Preprocessors
    # Numeric: impute → (optional scaler in SVR branch)
    num_rf = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
    ], memory=memory)

    num_svr = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ], memory=memory)

    if len(cat_cols) > 0:
        cat_common = Pipeline(steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore", sparse_output=False)),
        ], memory=memory)
        preproc_rf = ColumnTransformer(
            transformers=[("num", num_rf, num_cols), ("cat", cat_common, cat_cols)],
            remainder="drop"
        )
        preproc_svr = ColumnTransformer(
            transformers=[("num", num_svr, num_cols), ("cat", cat_common, cat_cols)],
            remainder="drop"
        )
    else:
        # No categoricals → simpler (faster) preprocessors
        preproc_rf = num_rf
        preproc_svr = num_svr

    # 3) Pipelines (with small but effective grids)
    rf_pipe = Pipeline(steps=[
        ("preprocess", preproc_rf),
        ("select", SelectKBest(score_func=f_regression, k="all")),
        ("model", RandomForestRegressor(random_state=42, n_jobs=-1))
    ], memory=memory)

    rf_grid: Dict[str, list] = {
        # keep imputer fixed (median) to avoid recomputing transforms
        "select__k": ["all", 50],              # feature count toggle
        "model__n_estimators": [250],          # enough trees, faster than 500
        "model__max_depth": [None, 12],
        "model__min_samples_leaf": [1, 2],
        "model__max_features": ["sqrt"],       # stable setting
    }

    # Successive halving (aggressive elimination reduces fits)
    rf_search = HalvingGridSearchCV(
        rf_pipe,
        rf_grid,
        factor=3,
        resource="n_samples",
        min_resources="exhaust",
        cv=3,
        scoring="neg_root_mean_squared_error",   # optimize RMSE directly
        n_jobs=-1,
        verbose=0,
        refit=True
    )
    print("\n--- Fitting RandomForest (HalvingGridSearchCV, cv=3) ---")
    rf_search.fit(X_train, y_train)
    print(f"RF best params: {rf_search.best_params_}")

    # SVR (trimmed grid; if you need even faster, comment this whole block)
    svr_pipe = Pipeline(steps=[
        ("preprocess", preproc_svr),
        ("select", SelectKBest(score_func=f_regression, k="all")),
        ("model", SVR(kernel="rbf"))
    ], memory=memory)

    svr_grid: Dict[str, list] = {
        "select__k": ["all", 50],
        "model__C": [1.0, 3.0],
        "model__epsilon": [0.1],
        "model__gamma": ["scale"],
    }

    svr_search = HalvingGridSearchCV(
        svr_pipe,
        svr_grid,
        factor=3,
        resource="n_samples",
        min_resources="exhaust",
        cv=3,
        scoring="neg_root_mean_squared_error",
        n_jobs=-1,
        verbose=0,
        refit=True
    )
    print("\n--- Fitting SVR (HalvingGridSearchCV, cv=3) ---")
    svr_search.fit(X_train, y_train)
    print(f"SVR best params: {svr_search.best_params_}")

    # 4) Test-set evaluation + (faster) bootstrap CIs
    results = []

    for name, est in [("Random Forest", rf_search), ("SVR (RBF)", svr_search)]:
        y_pred = est.best_estimator_.predict(X_test)
        test_rmse = rmse(y_test, y_pred)
        test_pcc  = safe_pcc(y_test.values, y_pred)

        rmse_lo, rmse_hi = bootstrap_ci(y_test.values, y_pred, rmse, n_boot=800, alpha=0.95, seed=123)
        pcc_lo,  pcc_hi  = bootstrap_ci(y_test.values, y_pred, safe_pcc, n_boot=800, alpha=0.95, seed=456)

        results.append({
            "Model": name,
            "RMSE": test_rmse,
            "RMSE 95% CI Low": rmse_lo,
            "RMSE 95% CI High": rmse_hi,
            "PCC": test_pcc,
            "PCC 95% CI Low": pcc_lo,
            "PCC 95% CI High": pcc_hi,
        })

    results_df = pd.DataFrame(results).set_index("Model")
    print("\n====== Test Set Performance (FAST mode) ======")
    print(results_df.round(4))

    # Optional quick 50–50 blend (no extra CV)
    rf_pred = rf_search.best_estimator_.predict(X_test)
    svr_pred = svr_search.best_estimator_.predict(X_test)
    ens_pred = 0.5 * (rf_pred + svr_pred)

    ens_rmse = rmse(y_test, ens_pred)
    ens_pcc  = safe_pcc(y_test.values, ens_pred)
    ens_rmse_ci = bootstrap_ci(y_test.values, ens_pred, rmse, n_boot=800, alpha=0.95, seed=789)
    ens_pcc_ci  = bootstrap_ci(y_test.values, ens_pred, safe_pcc, n_boot=800, alpha=0.95, seed=101112)

    print("\n--- Simple RF+SVR Avg Ensemble (FAST) ---")
    print(pd.DataFrame({
        "RMSE": [ens_rmse],
        "RMSE 95% CI Low": [ens_rmse_ci[0]],
        "RMSE 95% CI High": [ens_rmse_ci[1]],
        "PCC": [ens_pcc],
        "PCC 95% CI Low": [ens_pcc_ci[0]],
        "PCC 95% CI High": [ens_pcc_ci[1]],
    }, index=["RF+SVR Ensemble"]).round(4))

    return results_df


if __name__ == "__main__":
    run_classical_pipeline_fast()


✓ Loaded engineered dataset: (1897, 348)
Split: train=1517, test=380
Detected numeric=343, categorical=3

--- Fitting RandomForest (HalvingGridSearchCV, cv=3) ---
RF best params: {'model__max_depth': None, 'model__max_features': 'sqrt', 'model__min_samples_leaf': 2, 'model__n_estimators': 250, 'select__k': 50}

--- Fitting SVR (HalvingGridSearchCV, cv=3) ---
SVR best params: {'model__C': 1.0, 'model__epsilon': 0.1, 'model__gamma': 'scale', 'select__k': 'all'}

                 RMSE  RMSE 95% CI Low  RMSE 95% CI High     PCC  \
Model                                                              
Random Forest  0.5905           0.5467            0.6386  0.1918   
SVR (RBF)      0.5907           0.5405            0.6404  0.2131   

               PCC 95% CI Low  PCC 95% CI High  
Model                                           
Random Forest          0.0909           0.2972  
SVR (RBF)              0.1179           0.3153  

--- Simple RF+SVR Avg Ensemble (FAST) ---
                   RMSE

## Cell 4 — Variational Quantum Regressor (VQC) trained with SPSA

**Purpose:** End-to-end quantum regressor that predicts ALSFRS slope from engineered features.  
It builds a circuit **ZZFeatureMap → EfficientSU2**, measures the **average Z** observable, and fits a tiny linear head `ŷ = α·⟨O⟩ + β`. Parameters **θ (circuit), α, β** are trained with **SPSA** on MSE using an Aer/Estimator backend.

### Pipeline (quick map)
1) **Load** `final_processed_als_data.csv` → keep **numeric** columns only.  
2) **Train/test split** (80/20).  
3) **Feature pick** inside train: **RF top-K** numeric features (default **K=16**).  
4) **Train/val split** within training set for **early stopping**.  
5) **Impute + Standardize** (train-only fit) → **PLSRegression** → reduce to `n_qubits`.  
6) **Min-max to angles** `[0, π]` per component (angle encoding).  
7) **Circuit**: `ZZFeatureMap(n_qubits, reps) ∘ EfficientSU2(n_qubits, reps, entanglement="linear")`.  
8) **Observable**: mean of Z on each qubit: \( O = \frac{1}{n}\sum_i Z_i \).  
9) **Train with SPSA** on MSE, early stop on `val_rmse + (1 - val_pcc)`.  
10) **Test**: RMSE, PCC + **95% bootstrap CIs**.

### Key knobs (you can pass via `run_vqc`)
- `n_qubits` (e.g., 4) & `pls_components` (**must match** `n_qubits`).
- `topk` features before PLS (default 16).
- Circuit depth: `fmap_reps`, `ansatz_reps`.
- SPSA schedule: `a, c, A, alpha, gamma`; steps & `batch_size`.
- Early stopping `patience`. Backend auto-selects **AerEstimator** if available.

> **Tip:** For faster smoke tests, try `spsa_steps=150` and `batch_size=64`.

<details>
<summary>SPSA in 30 seconds</summary>
At step *k*, build **one random ± perturbation** of the whole parameter vector, compute two losses `L+` and `L−`, estimate a stochastic gradient with a finite difference, and apply a scheduled step size. It needs **2 objective calls per step**, independent of parameter count.
</details>

<details>
<summary>Encoding & Observable</summary>
- **Encoding:** PLS projects scaled features into `n_qubits` components, then min-max maps them into **[0, π]** angles to feed the `ZZFeatureMap`.  
- **Observable:** ⟨O⟩ is between −1 and 1; the linear head `(α, β)` rescales it to the target range.
</details>

<details>
<summary>Gotchas</summary>
- **`pls_components == n_qubits`** is asserted.  
- If **qiskit-aer** isn’t present, it falls back to the built-in `Estimator` (slower).  
- `topk` must be ≤ number of numeric features in the training split.  
- Seeds are set for numpy and RF; quantum backends may still introduce sampling noise.
</details>

**Printed outputs:** split sizes, top-K list, live train/val metrics during SPSA, and final test **RMSE/PCC + 95% CIs**.

---


In [6]:
%pip install pennylane pennylane-lightning[gpu] torch torchvision torchaudio

Collecting pennylane
  Downloading pennylane-0.42.3-py3-none-any.whl.metadata (11 kB)
Collecting pennylane-lightning[gpu]
  Downloading pennylane_lightning-0.42.0-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (11 kB)
Collecting appdirs (from pennylane)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting autoray<0.8,>=0.6.11 (from pennylane)
  Downloading autoray-0.7.2-py3-none-any.whl.metadata (5.8 kB)
Collecting diastatic-malt (from pennylane)
  Downloading diastatic_malt-2.15.2-py3-none-any.whl.metadata (2.6 kB)
Collecting scipy-openblas32>=0.3.26 (from pennylane-lightning[gpu])
  Downloading scipy_openblas32-0.3.30.0.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (57 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m57.1/57.1 kB[0m [31m2.3 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting pennylane-lightning-gpu (from pennylane-lightning[gpu])
  Downloading pennylane_lightning_gpu-0.42.0-cp312-cp312-manylinux_2_28_x86_6

In [7]:
# arqnn_cf_meta_blend.py
import os, time, warnings, numpy as np, pandas as pd
from tqdm.auto import tqdm
warnings.filterwarnings("ignore")

# stable CPU
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")
os.environ.setdefault("NUMEXPR_NUM_THREADS", "1")

import torch
torch.set_num_threads(1)
torch.set_default_dtype(torch.float32)
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

import pennylane as qml

from sklearn.model_selection import train_test_split, KFold
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler, MinMaxScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import RidgeCV
from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from scipy.stats import pearsonr

np.random.seed(42); torch.manual_seed(42)

# ---------------- utils & metrics ----------------
def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

def safe_pcc(a, b):
    a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
    if a.std()==0 or b.std()==0: return 0.0
    v = pearsonr(a, b)[0]
    return float(v) if np.isfinite(v) else 0.0

def pearson_loss_torch(x, y, eps=1e-8):
    x = x - x.mean()
    y = y - y.mean()
    denom = (x.std(unbiased=False) * y.std(unbiased=False) + eps)
    corr = (x * y).mean() / denom
    return 1.0 - corr

def load_als_data(path="final_processed_als_data.csv"):
    df = pd.read_csv(path)
    X = df.drop(columns=["subject_id", "alsfrs_slope"], errors="ignore")
    y = df["alsfrs_slope"].astype(float).values
    m = ~np.isnan(y)
    X, y = X.loc[m].reset_index(drop=True), y[m]
    print(f"✓ Loaded data: X={X.shape}, y={y.shape}")
    print(f"Target stats: mean={y.mean():.3f}, std={y.std():.3f}, range=[{y.min():.3f},{y.max():.3f}]")
    return X, y

def sanitize_features(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for c in df.columns:
        if df[c].dtype == object:
            try: df[c] = pd.to_numeric(df[c])
            except Exception: df[c] = pd.factorize(df[c].astype(str))[0]
    return df

def select_features(X_df, y, k=12):
    imp = SimpleImputer(strategy="median")
    Xn = imp.fit_transform(X_df)
    rf = RandomForestRegressor(n_estimators=400, random_state=42, n_jobs=-1).fit(Xn, y)
    rf_s = rf.feature_importances_
    corr = np.array([abs(np.corrcoef(Xn[:, i], y)[0,1]) if Xn[:, i].std()>0 else 0.0 for i in range(Xn.shape[1])])
    def nz(v): m=v.max(); return v/(m+1e-8) if m>0 else v
    score = 0.7*nz(rf_s) + 0.3*nz(corr)
    idx = np.argsort(score)[::-1][:k]
    cols = [X_df.columns[i] for i in idx]
    print(f"✓ Selected top-{k}: {cols}")
    return idx.tolist(), cols

# ---------------- baseline (RF+HGB) with OOF ----------------
def oof_blend(X, y, n_splits=5, seed=42):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof_rf  = np.zeros(len(y), dtype=np.float32)
    oof_hgb = np.zeros(len(y), dtype=np.float32)

    def new_rf():
        return RandomForestRegressor(
            n_estimators=800, max_features="sqrt", min_samples_leaf=2,
            random_state=seed, n_jobs=-1
        )
    def new_hgb():
        return HistGradientBoostingRegressor(
            loss="squared_error", learning_rate=0.06, max_iter=600, max_bins=255,
            l2_regularization=0.0, random_state=seed
        )

    for tr, va in kf.split(X):
        rf  = new_rf().fit(X[tr], y[tr])
        hgb = new_hgb().fit(X[tr], y[tr])
        oof_rf[va]  = rf.predict(X[va]).astype(np.float32)
        oof_hgb[va] = hgb.predict(X[va]).astype(np.float32)

    rf_full  = new_rf().fit(X, y)
    hgb_full = new_hgb().fit(X, y)
    oof_blended = 0.5*oof_rf + 0.5*oof_hgb
    return oof_blended, rf_full, hgb_full

# ---------------- dataset ----------------
class ResidualDataset(Dataset):
    def __init__(self, Xq, base_pred, y_true, res_scaled):
        self.Xq   = torch.tensor(Xq, dtype=torch.float32)
        self.base = torch.tensor(base_pred, dtype=torch.float32).view(-1,1)
        self.y    = torch.tensor(y_true, dtype=torch.float32)
        self.rz   = torch.tensor(res_scaled, dtype=torch.float32)  # residual / s
    def __len__(self): return len(self.y)
    def __getitem__(self, i): return self.Xq[i], self.base[i], self.y[i], self.rz[i]

# ---------------- tiny gate ----------------
class SqueezeGate(nn.Module):
    def __init__(self, d=4, h=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, d), nn.Sigmoid())
    def forward(self, x): return x * self.net(x)

# ---------------- quantum block (40 obs) ----------------
class QuantumBlock(nn.Module):
    """
    4-qubit re-uploading, ring entanglement.
    Observables per edge i:(i+1):
      single: X_i, Y_i, Z_i
      pairs : XX, YY, ZZ, XZ, ZX, YZ, ZY
    => out_dim = (3 + 7) * n_qubits = 10 * n_qubits
    """
    def __init__(self, n_qubits=4, n_layers=2):
        super().__init__()
        self.n_qubits, self.n_layers = n_qubits, n_layers

        self.weights = nn.Parameter(torch.randn(n_layers, n_qubits, 2, dtype=torch.float32)*0.07)
        self.alpha   = nn.Parameter(torch.ones(n_qubits, dtype=torch.float32))
        self.beta    = nn.Parameter(torch.zeros(n_qubits, dtype=torch.float32))

        try:
            self.dev = qml.device("lightning.qubit", wires=n_qubits)
            prefer_adjoint = True
        except Exception:
            self.dev = qml.device("default.qubit", wires=n_qubits)
            prefer_adjoint = False

        X = [qml.PauliX(i) for i in range(n_qubits)]
        Y = [qml.PauliY(i) for i in range(n_qubits)]
        Z = [qml.PauliZ(i) for i in range(n_qubits)]
        pairs = []
        for i in range(n_qubits):
            j = (i+1) % n_qubits
            pairs += [
                qml.PauliX(i) @ qml.PauliX(j),
                qml.PauliY(i) @ qml.PauliY(j),
                qml.PauliZ(i) @ qml.PauliZ(j),
                qml.PauliX(i) @ qml.PauliZ(j),
                qml.PauliZ(i) @ qml.PauliX(j),
                qml.PauliY(i) @ qml.PauliZ(j),
                qml.PauliZ(i) @ qml.PauliY(j),
            ]
        self.obs = X + Y + Z + pairs  # 10*nq

        def make_qnode(diff_method):
            @qml.qnode(self.dev, interface="torch", diff_method=diff_method)
            def circuit(angles, weights, alpha, beta):
                for i in range(n_qubits):
                    qml.RY(np.pi * (alpha[i]*angles[i] + beta[i]), wires=i)
                for l in range(n_layers):
                    for i in range(n_qubits):
                        qml.RY(weights[l, i, 0], wires=i)
                        qml.RZ(weights[l, i, 1], wires=i)
                    for i in range(n_qubits):
                        qml.CNOT(wires=[i, (i+1)%n_qubits])
                    for i in range(n_qubits):
                        qml.RY(0.5*np.pi * (alpha[i]*angles[i] + beta[i]), wires=i)
                return [qml.expval(op) for op in self.obs]
            return circuit

        self.qnode = None
        # try adjoint
        try:
            if prefer_adjoint:
                self.qnode = make_qnode("adjoint")
                _ = self.qnode(torch.zeros(n_qubits), torch.zeros_like(self.weights),
                               torch.ones_like(self.alpha), torch.zeros_like(self.beta))
        except Exception:
            self.qnode = None
        # fallback backprop
        if self.qnode is None:
            try:
                self.dev = qml.device("default.qubit", wires=n_qubits)
                self.qnode = make_qnode("backprop")
                _ = self.qnode(torch.zeros(n_qubits), torch.zeros_like(self.weights),
                               torch.ones_like(self.alpha), torch.zeros_like(self.beta))
            except Exception:
                self.qnode = None
        # fallback param-shift
        if self.qnode is None:
            self.dev = qml.device("default.qubit", wires=n_qubits)
            self.qnode = make_qnode("parameter-shift")

        self.out_dim = 10 * n_qubits

    def forward(self, angles):
        if angles.dim() == 1:
            out = self.qnode(angles, self.weights, self.alpha, self.beta)
            return torch.stack(out).view(1, -1).to(angles.dtype)
        outs = []
        for a in angles:
            v = self.qnode(a, self.weights, self.alpha, self.beta)
            outs.append(torch.stack(v).to(a.dtype))
        return torch.stack(outs, dim=0)

# ---------------- ARQNN (adaptive residual) ----------------
class ARQNN(nn.Module):
    def __init__(self, in_dim=4, n_qubits=4, n_layers=2, hidden=32, dropout=0.10):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Linear(in_dim, 16), nn.ReLU(),
            nn.Linear(16, n_qubits)
        )
        self.gate = SqueezeGate(d=n_qubits, h=8)
        self.ang_norm = nn.Tanh()

        self.qblock = QuantumBlock(n_qubits=n_qubits, n_layers=n_layers)
        self.norm = nn.LayerNorm(self.qblock.out_dim)

        self.head = nn.Sequential(
            nn.Linear(self.qblock.out_dim + 1, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1)
        )
        self.blend = nn.Sequential(
            nn.Linear(self.qblock.out_dim + 1 + in_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )
        self.gamma = nn.Parameter(torch.tensor(1.0, dtype=torch.float32))

    def forward(self, xq, base, angle_jitter_std=0.0):
        xq   = xq.float()
        base = base.float()
        angles = self.ang_norm(self.gate(self.pre(xq)))
        if angle_jitter_std > 0:
            angles = torch.clamp(angles + angle_jitter_std * torch.randn_like(angles), -1.0, 1.0)
        qfeat  = self.norm(self.qblock(angles))
        rz_hat = self.head(torch.cat([qfeat, base], 1)).squeeze(1) * self.gamma
        w      = self.blend(torch.cat([qfeat, base, angles], 1)).squeeze(1)
        return rz_hat, w, angles, qfeat

# ---------------- train helpers ----------------
def train_epoch(model, loader, opt, res_scale, lamb, device="cpu", clip=1.0, angle_jitter=0.02):
    model.train(); tot=0.0
    (lam_y, lam_res, lam_corr, lam_w) = lamb
    huber = nn.SmoothL1Loss(beta=0.5)
    for xb, baseb, yb, rzb in tqdm(loader, desc="Train", leave=False):
        xb, baseb, yb, rzb = xb.to(device), baseb.to(device), yb.to(device), rzb.to(device)
        opt.zero_grad()
        rz_hat, w, angles, qfeat = model(xb, baseb, angle_jitter_std=angle_jitter)
        y_hat = baseb.squeeze(1) + w * (rz_hat * res_scale)
        loss_y   = huber(y_hat, yb)
        loss_res = huber(rz_hat, rzb)
        loss_corr= pearson_loss_torch(y_hat, yb)
        loss_w   = (w*w).mean() * 0.1
        loss = lam_y*loss_y + lam_res*loss_res + lam_corr*loss_corr + lam_w*loss_w
        loss.backward()
        if clip: nn.utils.clip_grad_norm_(model.parameters(), clip)
        opt.step()
        tot += loss.item() * xb.size(0)
    return tot/len(loader.dataset)

@torch.no_grad()
def eval_epoch(model, loader, res_scale, device="cpu"):
    model.eval(); tot_y=0.0; preds=[]; tgts=[]
    for xb, baseb, yb, rzb in loader:
        xb, baseb, yb, rzb = xb.to(device), baseb.to(device), yb.to(device), rzb.to(device)
        rz_hat, w, _, _ = model(xb, baseb, angle_jitter_std=0.0)
        y_hat = baseb.squeeze(1) + w * (rz_hat * res_scale)
        loss = nn.functional.mse_loss(y_hat, yb)
        tot_y += loss.item() * xb.size(0)
        preds.append(y_hat.cpu().numpy()); tgts.append(yb.cpu().numpy())
    preds = np.concatenate(preds); tgts = np.concatenate(tgts)
    return tot_y/len(loader.dataset), preds, tgts

# ---------------- angle prep ----------------
def random_orthogonal_matrix(d, rng):
    A = rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(A)
    return Q.astype(np.float32)

def fit_pls_angles(X_src, y_src, X_tr_raw, X_va_raw, X_te_raw, feat_idx, n_pls, seed):
    imp = SimpleImputer(strategy="median")
    rb  = RobustScaler()
    pls = PLSRegression(n_components=n_pls, scale=False)
    rng = np.random.default_rng(seed)

    def sel(df, cols): return pd.DataFrame(df, columns=X_src.columns).iloc[:, cols]
    Xtr_s = rb.fit_transform(imp.fit_transform(sel(X_tr_raw, feat_idx))).astype(np.float32)
    Xva_s = rb.transform(imp.transform(sel(X_va_raw, feat_idx))).astype(np.float32)
    Xte_s = rb.transform(imp.transform(sel(X_te_raw, feat_idx))).astype(np.float32)

    # bootstrap seed for mild diversity
    boot_idx = rng.choice(np.arange(len(y_src)), size=len(y_src), replace=True)
    pls.fit(Xtr_s[boot_idx], y_src[boot_idx].reshape(-1,1))
    Xtr_pls = pls.transform(Xtr_s).astype(np.float32)
    Xva_pls = pls.transform(Xva_s).astype(np.float32)
    Xte_pls = pls.transform(Xte_s).astype(np.float32)

    # random orthogonal rotation in angle space
    R = random_orthogonal_matrix(n_pls, rng)
    Xtr_pls = (Xtr_pls @ R).astype(np.float32)
    Xva_pls = (Xva_pls @ R).astype(np.float32)
    Xte_pls = (Xte_pls @ R).astype(np.float32)

    mm = MinMaxScaler(feature_range=(-1.0, 1.0))
    Xtr_q = mm.fit_transform(Xtr_pls).astype(np.float32)
    Xva_q = mm.transform(Xva_pls).astype(np.float32)
    Xte_q = mm.transform(Xte_pls).astype(np.float32)
    return Xtr_q, Xva_q, Xte_q

# ---------------- one ARQNN train/predict ----------------
def train_single_arqnn(
    X_src, y_src,
    X_tr_raw, X_va_raw, X_te_raw, y_tr, y_va, y_te,
    feat_idx, n_pls,
    base_tr, base_va, base_te,
    n_qubits=4, n_layers=2, epochs=32, warmup_epochs=6,
    batch_size=128, patience=8, lr_head=1.0e-2, lr_q=3.0e-3,
    weight_decay=1e-4, lamb=(1.0, 0.25, 0.15, 1e-3), seed=123,
    mc_val=2, mc_test=3
):
    Xtr_q, Xva_q, Xte_q = fit_pls_angles(X_src, y_tr, X_tr_raw, X_va_raw, X_te_raw, feat_idx, n_pls, seed)

    # residual targets
    res_tr = (y_tr - base_tr).astype(np.float32)
    res_va = (y_va - base_va).astype(np.float32)
    s_res  = float(res_tr.std() + 1e-8)
    rz_tr  = (res_tr / s_res).astype(np.float32)
    rz_va  = (res_va / s_res).astype(np.float32)

    tr_ld = DataLoader(ResidualDataset(Xtr_q, base_tr, y_tr, rz_tr), batch_size=batch_size, shuffle=True)
    va_ld = DataLoader(ResidualDataset(Xva_q, base_va, y_va, rz_va), batch_size=batch_size, shuffle=False)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ARQNN(in_dim=n_pls, n_qubits=n_qubits, n_layers=n_layers,
                  hidden=32, dropout=0.10).to(device).float()

    # two LR groups
    params_head = list(model.head.parameters()) + list(model.pre.parameters()) + list(model.gate.parameters()) + list(model.blend.parameters()) + [model.gamma]
    params_q    = list(model.qblock.parameters())
    opt = optim.Adam([
        {"params": params_head, "lr": lr_head, "weight_decay": weight_decay},
        {"params": params_q,    "lr": lr_q,    "weight_decay": 0.0}
    ])
    sch = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=max(epochs, 50))

    # warmup freeze q
    for p in params_q: p.requires_grad = False

    best, bad, best_state = np.inf, 0, None
    for ep in range(1, epochs+1):
        if ep == warmup_epochs+1:
            for p in params_q: p.requires_grad = True

        tr_loss = train_epoch(model, tr_ld, opt, res_scale=s_res, lamb=lamb, device=device, clip=1.0, angle_jitter=0.02)
        va_mse, yva_pred, yva_true = eval_epoch(model, va_ld, res_scale=s_res, device=device)
        va_rmse = float(np.sqrt(va_mse)); va_pcc = safe_pcc(yva_true, yva_pred)
        tqdm.write(f"[seed {seed:4d} | ep {ep:02d}] train={tr_loss:.5f}  val_RMSE={va_rmse:.5f}  val_PCC={va_pcc:.3f}  lr={sch.get_last_lr()[0]:.4g}")
        if va_mse + 1e-8 < best:
            best, bad = va_mse, 0
            best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
        else:
            bad += 1
            if bad >= patience:
                tqdm.write(f"  early stop (patience={patience})")
                break
        sch.step()
    if best_state is not None:
        model.load_state_dict(best_state)

    @torch.no_grad()
    def predict_with_unc(Xq, base, batch=256, mc=2):
        ds = ResidualDataset(Xq, base, np.zeros_like(base, dtype=np.float32), np.zeros_like(base, dtype=np.float32))
        ld = DataLoader(ds, batch_size=batch, shuffle=False)
        samples = []
        for _ in range(max(1, mc)):
            model.train()
            for m in model.modules():
                if isinstance(m, nn.Dropout): m.train()
            cur = []
            for xb, baseb, _, _ in ld:
                xb, baseb = xb.to(device), baseb.to(device)
                rz_hat, w, _, _ = model(xb, baseb, angle_jitter_std=0.0)
                y_hat = baseb.squeeze(1) + w * (rz_hat * s_res)
                cur.append(y_hat.cpu().numpy())
            samples.append(np.concatenate(cur))
        samples = np.stack(samples, axis=0)  # (mc, N)
        return samples.mean(axis=0).astype(np.float32), samples.std(axis=0).astype(np.float32)

    # return test/val predictors
    y_va_mean, y_va_std = predict_with_unc(Xva_q, base_va, mc=mc_val)
    y_te_mean, y_te_std = predict_with_unc(Xte_q, base_te, mc=mc_test)
    return dict(
        y_va_mean=y_va_mean, y_va_std=y_va_std,
        y_te_mean=y_te_mean, y_te_std=y_te_std
    )

# ---------------- MAIN: cross-fitted ARQNN + meta blender ----------------
def run_arqnn_cf_meta(
    data_path="final_processed_als_data.csv",
    n_features=12, n_pls=4, n_qubits=4,
    folds=3,                 # 3 is a good CPU compromise
    oof_seeds=(101,),        # seeds per fold model (OOF stage)
    final_seeds=(202, 303),  # seeds for final full-train bag on test
    n_layers=2,
    epochs=32,
    warmup_epochs=6,
    batch_size=128,
    patience=8,
    lr_head=1.0e-2,
    lr_q=3.0e-3,
    weight_decay=1e-4,
    lamb=(1.0, 0.25, 0.10, 1e-3)  # a touch less corr weight for stability
):
    t0 = time.time()
    # load & split
    X_raw, y = load_als_data(data_path)
    X_raw = sanitize_features(X_raw)
    X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(X_raw, y, test_size=0.20, random_state=42)
    print(f"Splits: train={len(y_tr)}, test={len(y_te)}")

    # --- baseline on FULL features with OOF for training part
    imp_b = SimpleImputer(strategy="median")
    Xtr_b = imp_b.fit_transform(X_tr_raw).astype(np.float32)
    Xte_b = imp_b.transform(X_te_raw).astype(np.float32)

    oof_bl, rf_full, hgb_full = oof_blend(Xtr_b, y_tr, n_splits=5, seed=42)
    base_tr = oof_bl.astype(np.float32)
    base_te = (0.5*rf_full.predict(Xte_b) + 0.5*hgb_full.predict(Xte_b)).astype(np.float32)

    # --- feature selection for PLS angles (fit on ALL training to pick columns)
    feat_idx, feat_cols = select_features(pd.DataFrame(X_tr_raw, columns=X_raw.columns), y_tr, k=n_features)
    print(f"✓ PLS→{n_pls} angles from top-{n_features} features: {feat_cols}")

    # --- cross-fitted ARQNN (OOF predictions)
    kf = KFold(n_splits=folds, shuffle=True, random_state=123)
    oof_q_mean = np.zeros_like(base_tr, dtype=np.float32)
    oof_q_std  = np.zeros_like(base_tr, dtype=np.float32)

    X_tr_raw_np = np.asarray(X_tr_raw)
    for k, (tr_idx, va_idx) in enumerate(kf.split(X_tr_raw_np), 1):
        X_tr_fold, X_va_fold = X_tr_raw_np[tr_idx], X_tr_raw_np[va_idx]
        y_tr_fold, y_va_fold = y_tr[tr_idx], y_tr[va_idx]
        base_tr_fold, base_va_fold = base_tr[tr_idx], base_tr[va_idx]
        print(f"\n-- Fold {k}/{folds}: train={len(tr_idx)}, val={len(va_idx)} --")

        # train 1 (or more) ARQNNs on this fold and average their val predictions
        fold_val_means, fold_val_stds = [], []
        for s in oof_seeds:
            art = train_single_arqnn(
                X_src=pd.DataFrame(X_tr_raw, columns=X_raw.columns),
                y_src=y_tr_fold,
                X_tr_raw=X_tr_fold, X_va_raw=X_va_fold, X_te_raw=X_va_fold,  # test arg unused here
                y_tr=y_tr_fold, y_va=y_va_fold, y_te=y_va_fold,
                feat_idx=feat_idx, n_pls=n_pls,
                base_tr=base_tr_fold, base_va=base_va_fold, base_te=base_va_fold,
                n_qubits=n_qubits, n_layers=n_layers, epochs=epochs, warmup_epochs=warmup_epochs,
                batch_size=batch_size, patience=patience, lr_head=lr_head, lr_q=lr_q,
                weight_decay=weight_decay, lamb=lamb, seed=int(s+1000*k),
                mc_val=3, mc_test=3
            )
            fold_val_means.append(art["y_va_mean"])
            fold_val_stds.append(art["y_va_std"])
        oof_q_mean[va_idx] = np.mean(np.stack(fold_val_means, axis=0), axis=0)
        oof_q_std[va_idx]  = np.mean(np.stack(fold_val_stds,  axis=0), axis=0)

    # --- meta blender on OOF
    X_meta_tr = np.column_stack([
        base_tr,
        oof_q_mean,
        np.abs(oof_q_mean - base_tr),
        oof_q_std
    ])
    meta = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X_meta_tr, y_tr)
    print(f"\nMeta coefficients: {dict(zip(['base','qmean','|diff|','qstd'], meta.coef_.ravel()))}  | alpha={meta.alpha_:.3g}")

    # --- final ARQNNs on FULL training, predict test (+uncertainty), then meta-blend
    # reuse same seeds as final_seeds for diversity
    test_means, test_stds = [], []
    for s in final_seeds:
        art_te = train_single_arqnn(
            X_src=pd.DataFrame(X_tr_raw, columns=X_raw.columns),
            y_src=y_tr,
            X_tr_raw=np.asarray(X_tr_raw), X_va_raw=np.asarray(X_tr_raw), X_te_raw=np.asarray(X_te_raw),
            y_tr=y_tr, y_va=y_tr, y_te=y_te,   # val args unused (we won't read them)
            feat_idx=feat_idx, n_pls=n_pls,
            base_tr=base_tr, base_va=base_tr, base_te=base_te,
            n_qubits=n_qubits, n_layers=n_layers, epochs=epochs, warmup_epochs=warmup_epochs,
            batch_size=batch_size, patience=patience, lr_head=lr_head, lr_q=lr_q,
            weight_decay=weight_decay, lamb=lamb, seed=int(s),
            mc_val=2, mc_test=5
        )
        test_means.append(art_te["y_te_mean"])
        test_stds.append(art_te["y_te_std"])
    y_te_qmean = np.mean(np.stack(test_means, axis=0), axis=0)
    y_te_qstd  = np.mean(np.stack(test_stds,  axis=0), axis=0)

    X_meta_te = np.column_stack([
        base_te,
        y_te_qmean,
        np.abs(y_te_qmean - base_te),
        y_te_qstd
    ])
    y_te_meta = meta.predict(X_meta_te).astype(np.float32)

    # --- metrics
    base_rmse = rmse(y_te, base_te); base_pcc = safe_pcc(y_te, base_te); base_r2 = r2_score(y_te, base_te)
    q_rmse    = rmse(y_te, y_te_qmean); q_pcc  = safe_pcc(y_te, y_te_qmean); q_r2  = r2_score(y_te, y_te_qmean)
    m_rmse    = rmse(y_te, y_te_meta);  m_pcc  = safe_pcc(y_te, y_te_meta);  m_r2  = r2_score(y_te, y_te_meta)

    print("\n===== TEST METRICS =====")
    print(f"Baseline (RF+HGB)      → RMSE={base_rmse:.4f}  PCC={base_pcc:.4f}  R²={base_r2:.4f}")
    print(f"ARQNN mean (bag)       → RMSE={q_rmse:.4f}  PCC={q_pcc:.4f}  R²={q_r2:.4f}")
    print(f"Meta-blend (OOF-trained)→ RMSE={m_rmse:.4f}  PCC={m_pcc:.4f}  R²={m_r2:.4f}")
    print(f"\nTotal time: {time.time()-t0:.1f}s")
    return dict(base=dict(rmse=base_rmse,pcc=base_pcc,r2=base_r2),
                qnn =dict(rmse=q_rmse,pcc=q_pcc,r2=q_r2),
                meta=dict(rmse=m_rmse,pcc=m_pcc,r2=m_r2))

if __name__ == "__main__":
    _ = run_arqnn_cf_meta(
        data_path="final_processed_als_data.csv",
        n_features=12,   # 12–14 usually best here
        n_pls=4,         # = n_qubits (keep 4 for CPU)
        n_qubits=4,
        folds=3,
        oof_seeds=(101,),      # 1 model per fold for OOF (keeps time down)
        final_seeds=(202,303), # 2-model bag on full train for test
        n_layers=2,            # 3 if you can afford more time
        epochs=32,
        warmup_epochs=6,
        batch_size=128,
        patience=8,
        lr_head=1.0e-2,
        lr_q=3.0e-3,
        weight_decay=1e-4,
        lamb=(1.0, 0.25, 0.10, 1e-3)
    )


✓ Loaded data: X=(1897, 346), y=(1897,)
Target stats: mean=-0.667, std=0.572, range=[-3.628,1.208]
Splits: train=1517, test=380
✓ Selected top-12: ['alsfrs_ALSFRS_Total_slope', 'fvc_FVC_Liters_slope', 'fvc_FVC_Liters_std', 'alsfrs_ALSFRS_Total_std', 'alsfrs_Q1_Speech_min', 'alsfrs_Q1_Speech_last', 'alsfrs_Q1_Speech_max', 'alsfrs_Q1_Speech_median', 'alsfrs_Q3_Swallowing_min', 'alsfrs_Q1_Speech_first', 'labs_Phosphorus_median', 'alsfrs_Q3_Swallowing_last']
✓ PLS→4 angles from top-12 features: ['alsfrs_ALSFRS_Total_slope', 'fvc_FVC_Liters_slope', 'fvc_FVC_Liters_std', 'alsfrs_ALSFRS_Total_std', 'alsfrs_Q1_Speech_min', 'alsfrs_Q1_Speech_last', 'alsfrs_Q1_Speech_max', 'alsfrs_Q1_Speech_median', 'alsfrs_Q3_Swallowing_min', 'alsfrs_Q1_Speech_first', 'labs_Phosphorus_median', 'alsfrs_Q3_Swallowing_last']

-- Fold 1/3: train=1011, val=506 --


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 01] train=0.44173  val_RMSE=0.54039  val_PCC=0.275  lr=0.01


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 02] train=0.44103  val_RMSE=0.53483  val_PCC=0.274  lr=0.00999


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 03] train=0.43897  val_RMSE=0.53834  val_PCC=0.274  lr=0.009961


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 04] train=0.43828  val_RMSE=0.53504  val_PCC=0.274  lr=0.009911


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 05] train=0.43940  val_RMSE=0.53628  val_PCC=0.275  lr=0.009843


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 06] train=0.43696  val_RMSE=0.53695  val_PCC=0.275  lr=0.009755


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 07] train=0.43659  val_RMSE=0.53745  val_PCC=0.276  lr=0.009649


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 08] train=0.43756  val_RMSE=0.53608  val_PCC=0.276  lr=0.009524


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 09] train=0.43827  val_RMSE=0.53806  val_PCC=0.278  lr=0.009382


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 1101 | ep 10] train=0.43416  val_RMSE=0.53728  val_PCC=0.279  lr=0.009222
  early stop (patience=8)

-- Fold 2/3: train=1011, val=506 --


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 01] train=0.44879  val_RMSE=0.52463  val_PCC=0.305  lr=0.01


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 02] train=0.44181  val_RMSE=0.52479  val_PCC=0.306  lr=0.00999


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 03] train=0.44132  val_RMSE=0.52521  val_PCC=0.306  lr=0.009961


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 04] train=0.44066  val_RMSE=0.52620  val_PCC=0.307  lr=0.009911


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 05] train=0.43785  val_RMSE=0.52643  val_PCC=0.308  lr=0.009843


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 06] train=0.44184  val_RMSE=0.52464  val_PCC=0.308  lr=0.009755


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 07] train=0.43945  val_RMSE=0.52715  val_PCC=0.310  lr=0.009649


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 08] train=0.44112  val_RMSE=0.52418  val_PCC=0.310  lr=0.009524


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 09] train=0.44243  val_RMSE=0.52417  val_PCC=0.312  lr=0.009382


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 10] train=0.43989  val_RMSE=0.52621  val_PCC=0.318  lr=0.009222


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 11] train=0.43916  val_RMSE=0.52248  val_PCC=0.320  lr=0.009045


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 12] train=0.43842  val_RMSE=0.52356  val_PCC=0.321  lr=0.008853


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 13] train=0.43744  val_RMSE=0.52542  val_PCC=0.320  lr=0.008645


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 14] train=0.43749  val_RMSE=0.52471  val_PCC=0.317  lr=0.008423


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 15] train=0.43758  val_RMSE=0.52494  val_PCC=0.318  lr=0.008187


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 16] train=0.43522  val_RMSE=0.52484  val_PCC=0.318  lr=0.007939


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 17] train=0.43384  val_RMSE=0.52456  val_PCC=0.318  lr=0.007679


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 18] train=0.43272  val_RMSE=0.52705  val_PCC=0.314  lr=0.007409


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 2101 | ep 19] train=0.43171  val_RMSE=0.52973  val_PCC=0.316  lr=0.007129
  early stop (patience=8)

-- Fold 3/3: train=1012, val=505 --


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 01] train=0.44160  val_RMSE=0.54621  val_PCC=0.296  lr=0.01


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 02] train=0.43890  val_RMSE=0.54525  val_PCC=0.296  lr=0.00999


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 03] train=0.43794  val_RMSE=0.54526  val_PCC=0.296  lr=0.009961


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 04] train=0.43899  val_RMSE=0.54573  val_PCC=0.296  lr=0.009911


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 05] train=0.43676  val_RMSE=0.54519  val_PCC=0.296  lr=0.009843


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 06] train=0.43633  val_RMSE=0.54554  val_PCC=0.296  lr=0.009755


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 07] train=0.43598  val_RMSE=0.54527  val_PCC=0.297  lr=0.009649


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 08] train=0.43529  val_RMSE=0.54509  val_PCC=0.297  lr=0.009524


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 09] train=0.43485  val_RMSE=0.54497  val_PCC=0.298  lr=0.009382


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 10] train=0.43359  val_RMSE=0.54496  val_PCC=0.299  lr=0.009222


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 11] train=0.43518  val_RMSE=0.54658  val_PCC=0.298  lr=0.009045


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 12] train=0.43387  val_RMSE=0.54558  val_PCC=0.296  lr=0.008853


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 13] train=0.43174  val_RMSE=0.54535  val_PCC=0.296  lr=0.008645


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 14] train=0.43146  val_RMSE=0.54545  val_PCC=0.295  lr=0.008423


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 15] train=0.43194  val_RMSE=0.54628  val_PCC=0.293  lr=0.008187


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 16] train=0.43125  val_RMSE=0.54618  val_PCC=0.294  lr=0.007939


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 17] train=0.43120  val_RMSE=0.54701  val_PCC=0.292  lr=0.007679


Train:   0%|          | 0/8 [00:00<?, ?it/s]

[seed 3101 | ep 18] train=0.43259  val_RMSE=0.54615  val_PCC=0.294  lr=0.007409
  early stop (patience=8)

Meta coefficients: {'base': np.float32(0.2644455), 'qmean': np.float32(0.6120928), '|diff|': np.float32(-0.005869236), 'qstd': np.float32(-0.031511776)}  | alpha=3.16


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 01] train=0.45330  val_RMSE=0.53539  val_PCC=0.291  lr=0.01


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 02] train=0.44206  val_RMSE=0.53514  val_PCC=0.291  lr=0.00999


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 03] train=0.44104  val_RMSE=0.53513  val_PCC=0.291  lr=0.009961


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 04] train=0.43893  val_RMSE=0.53688  val_PCC=0.291  lr=0.009911


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 05] train=0.44061  val_RMSE=0.53513  val_PCC=0.291  lr=0.009843


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 06] train=0.43921  val_RMSE=0.53702  val_PCC=0.292  lr=0.009755


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 07] train=0.43821  val_RMSE=0.53541  val_PCC=0.292  lr=0.009649


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 08] train=0.43764  val_RMSE=0.53674  val_PCC=0.292  lr=0.009524


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 09] train=0.43718  val_RMSE=0.53642  val_PCC=0.292  lr=0.009382


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 10] train=0.43804  val_RMSE=0.53980  val_PCC=0.293  lr=0.009222


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 11] train=0.43858  val_RMSE=0.53571  val_PCC=0.293  lr=0.009045


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 12] train=0.43627  val_RMSE=0.53724  val_PCC=0.295  lr=0.008853


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  202 | ep 13] train=0.43649  val_RMSE=0.53845  val_PCC=0.296  lr=0.008645
  early stop (patience=8)


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 01] train=0.44248  val_RMSE=0.53527  val_PCC=0.291  lr=0.01


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 02] train=0.44072  val_RMSE=0.53556  val_PCC=0.292  lr=0.00999


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 03] train=0.43962  val_RMSE=0.53667  val_PCC=0.292  lr=0.009961


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 04] train=0.43836  val_RMSE=0.53566  val_PCC=0.292  lr=0.009911


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 05] train=0.44006  val_RMSE=0.53547  val_PCC=0.292  lr=0.009843


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 06] train=0.43874  val_RMSE=0.53538  val_PCC=0.292  lr=0.009755


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 07] train=0.43741  val_RMSE=0.53624  val_PCC=0.295  lr=0.009649


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 08] train=0.43899  val_RMSE=0.53680  val_PCC=0.297  lr=0.009524


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 09] train=0.43809  val_RMSE=0.53396  val_PCC=0.299  lr=0.009382


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 10] train=0.43553  val_RMSE=0.53531  val_PCC=0.300  lr=0.009222


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 11] train=0.43536  val_RMSE=0.53379  val_PCC=0.305  lr=0.009045


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 12] train=0.43535  val_RMSE=0.53466  val_PCC=0.302  lr=0.008853


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 13] train=0.43529  val_RMSE=0.53640  val_PCC=0.302  lr=0.008645


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 14] train=0.43381  val_RMSE=0.53286  val_PCC=0.308  lr=0.008423


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 15] train=0.43823  val_RMSE=0.53282  val_PCC=0.312  lr=0.008187


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 16] train=0.43577  val_RMSE=0.53467  val_PCC=0.299  lr=0.007939


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 17] train=0.43514  val_RMSE=0.53540  val_PCC=0.302  lr=0.007679


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 18] train=0.43428  val_RMSE=0.53509  val_PCC=0.305  lr=0.007409


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 19] train=0.43487  val_RMSE=0.53394  val_PCC=0.308  lr=0.007129


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 20] train=0.43392  val_RMSE=0.53269  val_PCC=0.311  lr=0.006841


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 21] train=0.43418  val_RMSE=0.53450  val_PCC=0.307  lr=0.006545


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 22] train=0.43420  val_RMSE=0.53456  val_PCC=0.309  lr=0.006243


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 23] train=0.43408  val_RMSE=0.53318  val_PCC=0.310  lr=0.005937


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 24] train=0.43399  val_RMSE=0.53303  val_PCC=0.310  lr=0.005627


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 25] train=0.43275  val_RMSE=0.53396  val_PCC=0.312  lr=0.005314


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 26] train=0.43298  val_RMSE=0.53399  val_PCC=0.306  lr=0.005


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 27] train=0.43456  val_RMSE=0.53339  val_PCC=0.312  lr=0.004686


Train:   0%|          | 0/12 [00:00<?, ?it/s]

[seed  303 | ep 28] train=0.43298  val_RMSE=0.53309  val_PCC=0.312  lr=0.004373
  early stop (patience=8)

===== TEST METRICS =====
Baseline (RF+HGB)      → RMSE=0.5914  PCC=0.2935  R²=0.0855
ARQNN mean (bag)       → RMSE=0.5920  PCC=0.2900  R²=0.0837
Meta-blend (OOF-trained)→ RMSE=0.5920  PCC=0.2913  R²=0.0839

Total time: 2762.6s
