# 01 — Data Cleaning + Feature Engineering

Implements Part 1 of the task and Phase I of the Plan:
- Ingest OHLCV
- Ghost filter (>10% missing values)
- Forward-fill only (no back-fill, which would leak future values into earlier rows)
- Fat-finger check (clip daily returns to ±50%)
- Tradeable-universe filters (price >= 5, dollar volume >= 1,000,000)
- Feature engineering (momentum/vol/volume/interaction)
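
The cleaning steps listed above can be sketched in plain pandas. This is only an illustration, not the code in `at.data.cleaning`: it assumes a long-format frame with hypothetical columns `ticker`, `date`, `close`, and `volume`, and it assumes the ghost filter drops whole tickers based on their share of missing closes.

```python
import numpy as np
import pandas as pd


def clean_sketch(df, max_missing=0.10, max_abs_ret=0.50,
                 min_price=5.0, min_dollar_vol=1_000_000):
    """Illustrative only; column names and semantics are assumptions."""
    df = df.sort_values(["ticker", "date"]).copy()

    # Ghost filter: drop tickers whose share of missing closes exceeds the limit.
    missing = df["close"].isna().groupby(df["ticker"]).mean()
    keep = missing[missing <= max_missing].index
    df = df[df["ticker"].isin(keep)].copy()

    # Forward-fill within each ticker only; leading gaps stay NaN (no bfill).
    df[["close", "volume"]] = df.groupby("ticker")[["close", "volume"]].ffill()

    # Fat-finger check: clip daily returns to +/- max_abs_ret.
    prev_close = df.groupby("ticker")["close"].shift(1)
    df["ret"] = (df["close"] / prev_close - 1.0).clip(-max_abs_ret, max_abs_ret)

    # Tradeable universe: minimum price and minimum dollar volume.
    df["dollar_volume"] = df["close"] * df["volume"]
    tradeable = (df["close"] >= min_price) & (df["dollar_volume"] >= min_dollar_vol)
    return df[tradeable].reset_index(drop=True)
```

Filling per ticker keeps one symbol's last price from spilling into the next symbol's first rows, and skipping the back-fill means no row ever depends on a later observation.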

Outputs: `data/processed/features.parquet`
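
The notebook does not define the four feature families (momentum, volatility, volume, interaction), so the sketch below shows just one plausible shape for them. The window length, the column names, and every formula here are assumptions for illustration; `build_feature_frame` may compute something different.

```python
import pandas as pd


def feature_sketch(df, window=20):
    """Illustrative only; not the at.features.build implementation."""
    df = df.sort_values(["ticker", "date"]).copy()

    # Daily return per ticker, using only the previous row.
    df["ret_1d"] = df["close"] / df.groupby("ticker")["close"].shift(1) - 1.0

    # Momentum: trailing return over `window` rows.
    df["mom"] = df["close"] / df.groupby("ticker")["close"].shift(window) - 1.0

    # Volatility: trailing standard deviation of daily returns.
    df["vol"] = df.groupby("ticker")["ret_1d"].transform(
        lambda s: s.rolling(window).std()
    )

    # Volume: current volume relative to its trailing mean.
    mean_volume = df.groupby("ticker")["volume"].transform(
        lambda s: s.rolling(window).mean()
    )
    df["volume_ratio"] = df["volume"] / mean_volume

    # Interaction: momentum scaled by relative volume.
    df["mom_x_volume"] = df["mom"] * df["volume_ratio"]
    return df
```

Every column uses trailing windows and positive shifts, so a row's features depend only on data available at or before that date.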

In [None]:
import pandas as pd

from at.data.ingest import load_ohlcv_csv
from at.data.cleaning import clean_ohlcv, CleaningConfig
from at.features.build import build_feature_frame
from at.utils.paths import get_paths

In [None]:
paths = get_paths()
raw_path = paths.data_raw / 'daily_prices.csv'
raw_path

In [None]:
# Returns the price frame plus a schema object that the cleaning and feature steps reuse.
df, schema = load_ohlcv_csv(raw_path)
df.head()

In [None]:
# Default config; the thresholds listed at the top are assumed to be its defaults.
cfg = CleaningConfig()
df_clean = clean_ohlcv(df, schema, cfg)
df_clean.shape

In [None]:
features = build_feature_frame(df_clean, schema)
features.head()

In [None]:
out_path = paths.data_processed / 'features.parquet'
out_path.parent.mkdir(parents=True, exist_ok=True)
# index=False leaves the DataFrame index out of the written file.
features.to_parquet(out_path, index=False)
out_path