# Exoplanet Pipeline (Colab) — Real Data → ~900 TSFresh Features → Baseline ML → Siamese → Outputs

This notebook installs dependencies, loads your light curves, extracts roughly 900 TSFresh features, trains and evaluates a baseline RandomForest model and a Siamese deep model, and lets you download the outputs.

Instructions:
- Run each cell top-to-bottom.
- When prompted to upload files, select your light curves (NPZ/CSV/FITS).
- If you have labels, upload a CSV with columns: `source,label`.


## 0) Runtime
Optional: Runtime → Change runtime type → GPU. CPU also works.
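
After changing the runtime type, a quick check like the one below shows which device PyTorch can see. This is a sketch: `pick_device` is a hypothetical helper, not part of the repository, and whether the CLI's `--device auto` resolves the same way is an assumption.

```python
import importlib.util


def pick_device() -> str:
    """Return 'cuda' if PyTorch is installed and sees a GPU, else 'cpu'.

    Hypothetical helper for illustration; not part of the repository.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed (e.g. the install cell has not run yet).
        return "cpu"
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print("Device:", pick_device())
```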

## 1) Install dependencies

In [None]:
# Install the packages used by the pipeline
!pip -q install numpy pandas scipy pyyaml matplotlib seaborn tsfresh statsmodels scikit-learn optuna streamlit
# CPU-only PyTorch wheel (works on any runtime). On a GPU runtime where torch is already installed, skip this line or adjust the index URL.
!pip -q install torch --index-url https://download.pytorch.org/whl/cpu

## 2) Get the code
Set your repository URL below.

In [None]:
REPO_URL = 'YOUR_REPO_URL'  # e.g., https://github.com/you/exocode.git

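import os, shutil
# If this cell is re-run from inside the clone, step back out first
# so the new clone is not nested inside the old one.
if os.path.basename(os.getcwd()) == 'exocode':
    os.chdir('..')
# Remove any previous clone (this also deletes files uploaded to exocode/data)
if os.path.exists('exocode'):
    shutil.rmtree('exocode')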
!git clone $REPO_URL exocode
%cd exocode
!python simple_test.py || true

## 3) Add your real data
Upload your NPZ, CSV, or FITS light-curve files; the cell below moves them into the `data/` folder.
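
Once the upload cell has run, it can help to check what actually landed in `data/` before starting extraction. The sketch below counts files per extension and lists anything that is not NPZ, CSV, or FITS. `summarize_data_dir` is a hypothetical helper, and the extension set is an assumption taken from this notebook's text rather than from what the CLI accepts.

```python
from collections import Counter
from pathlib import Path

# Assumed from the notebook text; the CLI may accept other extensions.
EXPECTED_SUFFIXES = {".npz", ".csv", ".fits"}


def summarize_data_dir(data_dir="data"):
    """Count files per extension and list files with unexpected extensions."""
    files = [p for p in Path(data_dir).glob("*") if p.is_file()]
    counts = Counter(p.suffix.lower() for p in files)
    unexpected = sorted(
        p.name for p in files if p.suffix.lower() not in EXPECTED_SUFFIXES
    )
    return counts, unexpected


if Path("data").is_dir():
    counts, unexpected = summarize_data_dir("data")
    print("Files per extension:", dict(counts))
    print("Unexpected files:", unexpected)
```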

In [None]:
from google.colab import files
from pathlib import Path
import shutil
Path('data').mkdir(exist_ok=True)
print('Upload your .npz/.csv/.fits files now:')
uploaded = files.upload()
for name in uploaded:
    shutil.move(name, f'data/{name}')
print('Files in data/:', list(Path('data').glob('*'))[:5], '...')

## 4) Extract TSFresh comprehensive features (~900+)
This cell runs `exodet.cli extract` with the comprehensive TSFresh preset, then loads the resulting CSV to check its shape.
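
For readers who want to see what a comprehensive preset corresponds to in TSFresh itself, the sketch below reshapes light curves into the long format that `tsfresh.extract_features` expects and calls it with `ComprehensiveFCParameters`. How `exodet.cli extract` invokes TSFresh internally is not shown in this notebook, so treat the mapping as an assumption; `to_long_frame`, `extract_comprehensive`, and the column names `time` and `flux` are illustrative.

```python
import numpy as np
import pandas as pd


def to_long_frame(curves):
    """Stack {source: (time, flux)} arrays into one long-format DataFrame."""
    frames = [
        pd.DataFrame({"source": name, "time": np.asarray(t), "flux": np.asarray(f)})
        for name, (t, f) in curves.items()
    ]
    return pd.concat(frames, ignore_index=True)


def extract_comprehensive(long_df, n_jobs=4):
    """Run TSFresh with its comprehensive feature set (one row per source)."""
    # Imported lazily so the reshaping helper works without tsfresh installed.
    from tsfresh import extract_features
    from tsfresh.feature_extraction import ComprehensiveFCParameters

    return extract_features(
        long_df,
        column_id="source",
        column_sort="time",
        column_value="flux",
        default_fc_parameters=ComprehensiveFCParameters(),
        n_jobs=n_jobs,
    )


# Two synthetic light curves, for illustration only.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
curves = {
    "star_a": (t, 1.0 + 0.001 * rng.standard_normal(t.size)),
    "star_b": (t, 1.0 + 0.001 * rng.standard_normal(t.size)),
}
long_df = to_long_frame(curves)
print(long_df.shape)  # (400, 3): 200 samples for each of 2 sources
```

Calling `extract_comprehensive(long_df)` would then return one row of features per source. Some columns may contain NaN for short or constant series, and `tsfresh.utilities.dataframe_functions.impute` is one way to fill them before training.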

In [None]:
!python -m exodet.cli extract \
--input "data/*" \
--output outputs/features_tsfresh_comprehensive.csv \
--tier tsfresh \
--tsfresh-params comprehensive \
--workers 4

import pandas as pd
df = pd.read_csv('outputs/features_tsfresh_comprehensive.csv')
print('Features shape:', df.shape)
df.head(3)

## 5) (Optional) Merge labels for supervised training
Upload a CSV with columns: `source,label`. `source` must match the `source` column in features.
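
The merge in the cell below is an inner join, so any `source` value present in only one of the two files is dropped without a warning. An outer merge with `indicator=True` is a quick way to see what would be lost. The sketch uses small in-memory frames in place of the real CSVs, and `report_unmatched` is a hypothetical helper.

```python
import pandas as pd


def report_unmatched(features, labels, key="source"):
    """Return keys found only in features and keys found only in labels."""
    check = features[[key]].drop_duplicates().merge(
        labels[[key]].drop_duplicates(), on=key, how="outer", indicator=True
    )
    only_features = check.loc[check["_merge"] == "left_only", key].tolist()
    only_labels = check.loc[check["_merge"] == "right_only", key].tolist()
    return only_features, only_labels


# Toy frames standing in for the features CSV and the labels CSV.
features = pd.DataFrame({"source": ["a.npz", "b.npz", "c.npz"], "f1": [0.1, 0.2, 0.3]})
labels = pd.DataFrame({"source": ["a.npz", "b.npz", "d.npz"], "label": [1, 0, 1]})

only_features, only_labels = report_unmatched(features, labels)
print("Sources without a label:", only_features)   # ['c.npz']
print("Labels without features:", only_labels)     # ['d.npz']
```

If either list is long, compare a few values from each file side by side; for example, one file may store a full path while the other stores a bare file name.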

In [None]:
from google.colab import files
from pathlib import Path
import shutil  # used below to move the uploaded file
import pandas as pd
print('Upload labels.csv with columns: source,label (optional)')
labels_up = files.upload()
if labels_up:
    Path('labels').mkdir(exist_ok=True)
    for name in labels_up:
        shutil.move(name, f'labels/{name}')
    features = pd.read_csv('outputs/features_tsfresh_comprehensive.csv')
    # Read the first uploaded file by its actual name (it may not be labels.csv)
    labels = pd.read_csv(f'labels/{next(iter(labels_up))}')
    merged = features.merge(labels, on='source', how='inner')
    Path('outputs').mkdir(exist_ok=True)
    merged.to_csv('outputs/features_labeled.csv', index=False)
    print('Labeled features shape:', merged.shape)
else:
    print('No labels uploaded; you can still inspect features or skip supervised steps.')


## 6) Baseline ML training/evaluation (RandomForest)
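
The evaluation step in this section scores the model on the same rows it was trained on, which tends to overstate performance. One way to get a more honest estimate is to split the labeled features into train and test files first and pass each to `--features` in turn. The sketch below does a stratified split with pandas; `stratified_split`, the 20% test fraction, and the toy table are assumptions for illustration.

```python
import pandas as pd


def stratified_split(df, target="label", test_frac=0.2, seed=0):
    """Split rows into train/test, sampling test_frac of each class for test."""
    test = df.groupby(target).sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


# Toy table standing in for outputs/features_labeled.csv.
df = pd.DataFrame({
    "source": [f"s{i}" for i in range(20)],
    "f1": [i / 10 for i in range(20)],
    "label": [0] * 10 + [1] * 10,
})
train, test = stratified_split(df)
print(len(train), len(test))  # 16 4
```

With real data, writing `train` and `test` out with `to_csv(..., index=False)` gives two files that can be passed to the `train` and `evaluate` commands respectively.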

In [None]:
# Train (requires outputs/features_labeled.csv with a label column)
!test -f outputs/features_labeled.csv && \
python -m exodet.cli train --features outputs/features_labeled.csv --target label --model rf --output runs/rf.joblib || echo 'Skipped or failed: training needs outputs/features_labeled.csv'

# Evaluate on the training file (quick sanity check; scores will be optimistic)
!test -f runs/rf.joblib && \
python -m exodet.cli evaluate --model runs/rf.joblib --features outputs/features_labeled.csv --target label || echo 'Skipped or failed: evaluation needs runs/rf.joblib and labeled features'

## 7) Siamese model training/evaluation (deep)
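
Siamese models are typically trained on pairs of examples marked as same-class or different-class. How `train-siamese` builds its pairs is not shown in this notebook; the following is a minimal sketch of one common approach, random pair sampling, with hypothetical names throughout.

```python
import random


def sample_pairs(labels, n_pairs, seed=0):
    """Sample index pairs (i, j, same), where same is 1 if the labels match."""
    rng = random.Random(seed)
    indices = range(len(labels))
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.sample(indices, 2)  # two distinct row indices
        pairs.append((i, j, int(labels[i] == labels[j])))
    return pairs


# Toy labels, for illustration only.
labels = [1, 0, 0, 1, 0, 1]
for i, j, same in sample_pairs(labels, n_pairs=5):
    print(i, j, same)
```

With imbalanced labels, purely random sampling gives a skewed mix of same-class and different-class pairs, so balancing the two kinds is a common refinement.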

In [None]:
# Train Siamese (uses labeled features)
!test -f outputs/features_labeled.csv && \
python -m exodet.cli train-siamese --features outputs/features_labeled.csv --target label --epochs 10 --embedding 32 --device auto --output runs/siamese.pt || echo 'Skipped or failed: training needs outputs/features_labeled.csv'

# Evaluate Siamese on the same labeled features (sanity check, not a held-out score)
!test -f runs/siamese.pt && \
python -m exodet.cli evaluate-siamese --model runs/siamese.pt --features outputs/features_labeled.csv --target label --device auto || echo 'Skipped or failed: evaluation needs runs/siamese.pt and labeled features'

## 8) Download outputs

In [None]:
from google.colab import files
from pathlib import Path
for p in ['outputs/features_tsfresh_comprehensive.csv', 'outputs/features_labeled.csv', 'runs/rf.joblib', 'runs/siamese.pt']:
    path = Path(p)
    if path.exists():
        files.download(str(path))
    else:
        print('Not found (skipped):', p)
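
When several artifacts exist, bundling them into one archive means a single download instead of one per file. A sketch using only the standard library; `bundle_outputs` and the archive name `exodet_outputs.zip` are hypothetical.

```python
import zipfile
from pathlib import Path


def bundle_outputs(paths, archive="exodet_outputs.zip"):
    """Write every existing path into a zip archive; return the names added."""
    added = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in map(Path, paths):
            if p.is_file():  # silently skip artifacts that were never created
                zf.write(p, arcname=p.as_posix())
                added.append(p.as_posix())
    return added


added = bundle_outputs([
    "outputs/features_tsfresh_comprehensive.csv",
    "outputs/features_labeled.csv",
    "runs/rf.joblib",
    "runs/siamese.pt",
])
print("Bundled:", added)
```

In Colab, `files.download('exodet_outputs.zip')` then fetches the archive.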