diff --git a/PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md b/PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md new file mode 100644 index 00000000..23309f57 --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md @@ -0,0 +1,124 @@ +# INITIAL-MLZOO-A-foundation-feature-frames.md - Feature-Aware Forecasting Foundation + +## FEATURE: + +Create the foundation for feature-aware forecasting in ForecastLabAI. + +This is the first MLZOO PRP input and should become PRP-29. It must not implement LightGBM, XGBoost, Prophet-like models, frontend UI, explainability UI, hyperparameter search, or portfolio/global orchestration. Its job is to make the existing forecasting layer capable of supporting future advanced ML models without breaking current baseline forecasters. + +Goals: + +- Define a feature-aware forecasting contract that supports `fit(y, X=None)` and `predict(horizon, X=None)`. +- Preserve existing target-only baseline models: `naive`, `seasonal_naive`, and `moving_average`. +- Define historical training feature-frame requirements. +- Define future prediction feature-frame requirements. +- Add or document leakage-safe feature-frame generation rules. +- Add load-bearing leakage tests that prove future rows do not use future target values. +- Make future advanced models possible without adding their dependencies yet. + +Expected user value: + +- ForecastLabAI gains a safe foundation for serious ML forecasting. +- Future LightGBM/XGBoost/Prophet-like work can build on a tested frame contract. +- Scenario simulation and explainability can later depend on a consistent feature-frame interface. + +Recommended user story: + +As a forecasting engineer, +I want a leakage-safe feature-frame contract for training and prediction, +So that advanced ML models can be added without breaking baseline models or leaking future data. + +Out of scope: + +- LightGBM implementation. +- XGBoost implementation. +- Prophet-like implementation. 
+- New database migrations unless absolutely required. +- Frontend pages. +- Agent tools. +- Hyperparameter search. + +## EXAMPLES: + +Read these before PRP creation: + +- `docs/optional-features/05-advanced-ml-model-zoo.md` + - Full feature vision and risks. + +- `PRPs/INITIAL/INITIAL-5.md` + - Existing forecasting model brief. + +- `docs/PHASE/4-FORECASTING.md` + - Current forecasting layer documentation. + +- `app/features/forecasting/models.py` + - Existing `BaseForecaster` and baseline model implementations. + +- `app/features/forecasting/schemas.py` + - Existing model config schemas and discriminated union pattern. + +- `app/features/forecasting/service.py` + - Existing train/predict orchestration. + +- `app/features/forecasting/persistence.py` + - Existing `ModelBundle` persistence. + +- `app/features/featuresets/service.py` + - Existing time-safe feature computation. + +- `app/features/featuresets/schemas.py` + - Feature configuration schemas. + +- `app/features/featuresets/tests/test_leakage.py` + - Existing leakage tests to mirror and extend. + +- `app/features/backtesting/service.py` + - Current backtesting integration points. + +Potential example artifacts: + +- `examples/models/feature_frame_contract.md` + - Describes historical and future frame shape, required columns, safe/unsafe feature classes. 
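The Goals above can be sketched as a minimal contract. The sketch assumes a plain list-of-rows `X` instead of the repo's real array types. `ForecasterSketch`, `NaiveSketch`, and `ScaledFeatureSketch` are hypothetical illustrations, not `BaseForecaster` or its subclasses, and the `requires_features` flag is assumed here as a capability flag. A target-only baseline accepts `X` and ignores it. A model that sets the flag rejects a missing or mis-sized `X` loudly instead of guessing.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Optional, Sequence

# Hypothetical feature-matrix type for this sketch: one list of floats per row.
Matrix = list[list[float]]


class ForecasterSketch(ABC):
    """Illustrative stand-in for a feature-aware base interface."""

    # Capability flag: True means X is mandatory for both fit() and predict().
    requires_features: bool = False

    def _check_x(self, X: Optional[Matrix], n_rows: int, where: str) -> None:
        """Target-only models skip the check; feature-aware models fail loudly."""
        if not self.requires_features:
            return
        if X is None:
            raise ValueError(f"{type(self).__name__}.{where}() requires exogenous features X")
        if len(X) != n_rows:
            raise ValueError(f"{where}(): X has {len(X)} rows, expected {n_rows}")

    @abstractmethod
    def fit(self, y: Sequence[float], X: Optional[Matrix] = None) -> ForecasterSketch: ...

    @abstractmethod
    def predict(self, horizon: int, X: Optional[Matrix] = None) -> list[float]: ...


class NaiveSketch(ForecasterSketch):
    """Target-only baseline: X is accepted for interface parity and ignored."""

    def fit(self, y: Sequence[float], X: Optional[Matrix] = None) -> NaiveSketch:
        self._check_x(X, len(y), "fit")
        self.last_ = float(y[-1])
        return self

    def predict(self, horizon: int, X: Optional[Matrix] = None) -> list[float]:
        self._check_x(X, horizon, "predict")
        return [self.last_] * horizon


class ScaledFeatureSketch(ForecasterSketch):
    """Toy feature-aware model: least-squares fit of y = scale * X[:, 0], no intercept."""

    requires_features = True

    def fit(self, y: Sequence[float], X: Optional[Matrix] = None) -> ScaledFeatureSketch:
        self._check_x(X, len(y), "fit")
        assert X is not None  # guaranteed by _check_x for feature-aware models
        num = sum(v * row[0] for v, row in zip(y, X))
        den = sum(row[0] ** 2 for row in X) or 1.0
        self.scale_ = num / den
        return self

    def predict(self, horizon: int, X: Optional[Matrix] = None) -> list[float]:
        self._check_x(X, horizon, "predict")
        assert X is not None
        return [self.scale_ * row[0] for row in X]
```

The baseline never touches `X`, so existing call sites that pass only `y` keep working unchanged.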
+ +## DOCUMENTATION: + +- scikit-learn estimator interface conventions: https://scikit-learn.org/stable/developers/develop.html +- scikit-learn Pipeline composition: https://scikit-learn.org/stable/modules/compose.html +- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html +- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html +- Pydantic documentation: https://docs.pydantic.dev/latest/ +- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html + +## OTHER CONSIDERATIONS: + +This PRP is primarily about contracts and leakage safety. + +Required decisions: + +- How to represent feature-aware models without forcing every baseline model to require `X`. +- Whether to introduce a `FeatureAwareForecaster` protocol/base class or extend the existing base interface only. +- Where historical training frames are built. +- Where future prediction frames are built. +- Which feature classes are safe for future frames: + - Safe: calendar features known in advance. + - Conditionally safe: lag/rolling features generated from historical tail and prior predictions. + - Unsafe unless explicitly supplied: future price, promotion, inventory, markdown, exogenous signals. +- How to reject missing future features instead of silently filling misleading defaults. + +Validation expectations: + +- Existing baseline forecasting tests still pass. +- New feature-frame contract tests exist. +- New leakage tests prove future target values are not used. +- Backtesting remains time-safe. +- `uv run pytest -q -m "not integration"` should pass. +- `uv run ruff check app tests` should pass for touched Python code. + +Important gotchas: + +- Do not break current target-only baseline forecasters. +- Do not add LightGBM or other heavy ML dependencies in this PRP. +- Do not silently convert unknown future exogenous values into zeros. 
+- Do not let training frames include rows after the cutoff date. +- Do not let future prediction frames read true future targets. + diff --git a/PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md b/PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md new file mode 100644 index 00000000..0131295d --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md @@ -0,0 +1,101 @@ +# INITIAL-MLZOO-B-lightgbm-first-model.md - LightGBM First Advanced Model + +## FEATURE: + +Add the first advanced feature-aware model to ForecastLabAI after the MLZOO foundation is merged. + +Preferred model: LightGBM. + +Fallback model: sklearn `HistGradientBoostingRegressor` or another sklearn-native gradient boosting model if LightGBM creates unacceptable dependency or CI risk. + +This PRP must depend on `INITIAL-MLZOO-A-foundation-feature-frames.md` being implemented first. + +Goals: + +- Add one advanced model config schema. +- Add one feature-aware model implementation. +- Support deterministic training. +- Integrate with forecasting train/predict. +- Integrate with backtesting. +- Persist model metadata needed for reproducibility. +- Preserve all existing baseline model behavior. + +Out of scope: + +- XGBoost. +- Prophet-like models. +- Hyperparameter search. +- Portfolio/global models. +- Frontend model administration. +- Explainability UI. + +## EXAMPLES: + +Read these before PRP creation: + +- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md` + - Required prerequisite. + +- `docs/optional-features/05-advanced-ml-model-zoo.md` + - Full advanced model vision. + +- `app/features/forecasting/models.py` + - Model factory and baseline model patterns. + +- `app/features/forecasting/schemas.py` + - Model config schema patterns. + +- `app/features/forecasting/service.py` + - Training/prediction service integration. + +- `app/features/forecasting/persistence.py` + - Model bundle save/load behavior. 
+ +- `app/features/backtesting/service.py` + - Backtesting orchestration. + +- `app/features/registry/service.py` + - Registry run metadata patterns. + +Potential example artifacts: + +- `examples/models/advanced_lightgbm.py` + - Minimal training/prediction example. + +## DOCUMENTATION: + +- LightGBM documentation: https://lightgbm.readthedocs.io/ +- LightGBM Python API: https://lightgbm.readthedocs.io/en/stable/Python-API.html +- LightGBM parameters: https://lightgbm.readthedocs.io/en/stable/Parameters.html +- scikit-learn HistGradientBoostingRegressor: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html +- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html +- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html +- Pydantic documentation: https://docs.pydantic.dev/latest/ + +## OTHER CONSIDERATIONS: + +Dependency strategy is the main open risk. + +Required decisions: + +- Whether to add LightGBM as a hard dependency, optional dependency group, or defer to sklearn fallback. +- Exact advanced model config fields. +- How model dependency versions are captured in registry/runtime metadata. +- How prediction rejects missing future feature frames. + +Recommended defaults: + +- Use fixed `random_state` from settings. +- Start with single store/product training. +- Keep the first config conservative. +- Avoid hyperparameter search. +- Persist feature column order. + +Validation expectations: + +- Config schema tests. +- Deterministic training tests. +- Save/load persistence tests. +- Backtesting integration test comparing baseline and advanced model path. +- Tests proving baselines still work unchanged. 
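Two items above can be sketched independently of the estimator: the "persist feature column order" default and the "reject missing future feature frames" decision. The helper below is hypothetical. Frames are plain `dict[str, list[float]]` mappings, `lag_1` is an illustrative column name, and the real PRP would wrap LightGBM or `HistGradientBoostingRegressor` around the projected matrix. The helper does three things: it pins the column order at fit time, projects any future frame onto that order, and raises rather than filling a missing column with a default.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical frame type for this sketch: column name -> per-row values.
Frame = dict[str, list[float]]


@dataclass
class FeatureColumnContract:
    """Pins the training column order and enforces it at prediction time."""

    feature_columns: list[str] = field(default_factory=list)

    def fit_matrix(self, X_train: Frame, columns: list[str]) -> list[list[float]]:
        # Record the order once; this list would be persisted with the model bundle.
        self.feature_columns = list(columns)
        return self._project(X_train)

    def predict_matrix(self, X_future: Optional[Frame]) -> list[list[float]]:
        if not self.feature_columns:
            raise RuntimeError("fit_matrix() must run before predict_matrix()")
        if X_future is None:
            raise ValueError("feature-aware model requires a future feature frame")
        return self._project(X_future)

    def to_metadata(self, random_state: int) -> dict[str, object]:
        # Reproducibility metadata: the fixed seed plus the exact column order.
        return {"random_state": random_state, "feature_columns": list(self.feature_columns)}

    def _project(self, frame: Frame) -> list[list[float]]:
        missing = [c for c in self.feature_columns if c not in frame]
        extra = [c for c in frame if c not in self.feature_columns]
        if missing or extra:
            # Never zero-fill: a silent default would skew the model's inputs.
            raise ValueError(f"feature frame mismatch: missing={missing}, unexpected={extra}")
        lengths = {len(frame[c]) for c in self.feature_columns}
        if len(lengths) != 1:
            raise ValueError("feature columns have unequal lengths")
        n_rows = lengths.pop()
        return [[frame[c][i] for c in self.feature_columns] for i in range(n_rows)]
```

This sketch rejects unexpected columns as well as missing ones. That is a design choice: it turns any train/serve column drift into an error.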
+ diff --git a/PRPs/INITIAL/INITIAL-MLZOO-B.2-feature-aware-backtesting.md b/PRPs/INITIAL/INITIAL-MLZOO-B.2-feature-aware-backtesting.md new file mode 100644 index 00000000..e160a0f7 --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-B.2-feature-aware-backtesting.md @@ -0,0 +1,215 @@ +# INITIAL-MLZOO-B.2-feature-aware-backtesting.md - Feature-Aware Backtesting Wiring + +## FEATURE: + +Wire **feature-aware forecasting models** (`RegressionForecaster`, `LightGBMForecaster` — +every model with `requires_features=True`) into the backtesting fold loop so they can be +evaluated by `POST /backtesting/run` and `train`/`backtest` jobs — **without** introducing +target leakage and **without** train/serve skew. + +This is the explicit follow-up deferred by **PRP-30 DECISIONS LOCKED #6** and named there as +`PRP-MLZOO-B.2`. It depends on MLZOO-A (PRP-29, merged `b116489`) and MLZOO-B (PRP-30, merged +`2f1b8a5` / PR #243) being in place. It is the third MLZOO unit, after A (foundation) and +B (first advanced model), and before C (XGBoost/Prophet) and D (frontend/registry). + +### Why current backtesting is incompatible with feature-aware models + +The backtesting slice was built for target-only baseline models and has three structural +gaps that block feature-aware models: + +1. **No exogenous data is loaded.** `BacktestingService._load_series_data` + (`app/features/backtesting/service.py`) selects only `(date, quantity)` from `sales_daily`. + `SeriesData` carries only `dates` + `values`. A feature-aware model needs the canonical + 14-column frame (`canonical_feature_columns()`): target lags, calendar columns, and the + exogenous columns `price_factor` / `promo_active` / `is_holiday` / `days_since_launch`. + +2. **The fold loop is target-only.** `_run_model_backtest` calls `model.fit(y_train)` and + `model.predict(horizon)` with **no `X`**. 
`RegressionForecaster` / `LightGBMForecaster` + (`requires_features=True`) raise `ValueError("… requires exogenous features X …")` at + `fit()`. That loud failure is the *current interim contract*, pinned by + `app/features/backtesting/tests/test_service.py::test_feature_aware_model_fails_loud_in_backtest` + (cites PRP-29 DECISIONS LOCKED #7). This PRP **supersedes** that interim contract. + +3. **`JobService._execute_backtest` hard-rejects** any `model_type` other than + `naive` / `seasonal_naive` / `moving_average` with `ValueError("Unsupported model_type…")`. + +A backtest of a feature-aware model is the only honest way to compare an advanced model +against the naive/seasonal baselines — without it, PRP-30's LightGBM model can be trained +and scenario-re-forecast but never *evaluated*. + +### Per-fold X_train and X_future construction + +The backtest must build, **per fold**, two leakage-safe feature matrices: + +- **`X_train`** — the historical feature matrix restricted to the fold's train rows. Every + column is observed-and-knowable: target lags read strictly-earlier observed targets, + calendar columns are pure functions of the date, exogenous columns read same-day observed + price/promotion/holiday/launch attributes. This is exactly the matrix + `ForecastingService._build_regression_features` / `_assemble_regression_rows` already + builds for training — it must be **reused**, not re-derived. + +- **`X_future`** — the feature matrix for the fold's test window (`test_size` days, after the + `gap`). The test window has **no observed target**, so its target-lag columns must be built + with the leakage-safe future-lag rule (`build_long_lag_columns`): a lag cell whose source + day lies in the test window is `NaN`, never the observed test value. + +### Leakage-safe fold contracts + +Each fold defines a forecast origin `T` = the fold's `train_end`. 
The invariant, mirroring +`app/shared/feature_frames/tests/test_leakage.py` and +`app/features/featuresets/tests/test_leakage.py`: + +> A feature value for a test-window day `D` may use ONLY information knowable at the fold +> origin `T`: the observed history at or before `T`, or the calendar (a pure function of the +> date). It may **NEVER** read an observed target at a test-window day. + +The `gap` between `train_end` and `test_start` simulates operational data latency — the +fold's `history_tail` ends at `T` and does **not** include the gap days; lag columns whose +source falls in the gap are therefore `NaN` too. + +The PRP must classify every canonical feature column with the **existing** +`app/shared/feature_frames.FeatureSafety` taxonomy (`SAFE` / `CONDITIONALLY_SAFE` / +`UNSAFE_UNLESS_SUPPLIED`) and decide, per class, how `X_future` is populated. It must also +draw the line — explicitly — between **target leakage** (reading `y` at a horizon day: +forbidden, must be structurally impossible) and **exogenous foresight** (assuming the future +price/promotion calendar is known: a disclosed modelling choice, not target leakage). The +result must record which exogenous policy it used so the metric is interpreted honestly. + +### Async / DB-backed loading requirements + +The fold loop (`_run_model_backtest`) is currently **synchronous and DB-free** — a property +worth keeping (it is unit-testable without a database). Feature-aware folds need exogenous +data (`unit_price`, `promotion` windows, `calendar` holidays, `product.launch_date`) that +only the DB has. The PRP must resolve all of that **async, once, up front** in `run_backtest` +(already `async`) into pure in-memory arrays, then keep the per-fold builders pure and sync — +mirroring how `ForecastingService._build_regression_features` resolves async then delegates +to the pure `_assemble_regression_rows`. 
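The future-lag rule above can be sketched as a pure function. This is a hypothetical re-implementation of the rule as described, not the signature of `build_long_lag_columns`. Here `history_tail[-1]` is the observed target at the fold origin `T`, and row `m - 1` describes day `T + m`. A lag `k` cell reads day `T + m - k` only when that day is at or before `T`. The function never receives test-window targets, so reading one is structurally impossible.

```python
import math


def future_lag_rows(history_tail: list[float], lags: list[int], n_rows: int) -> list[list[float]]:
    """Leakage-safe lag features for the n_rows days after origin T.

    history_tail: observed targets ending at T (history_tail[-1] is day T).
    Returns one row per future day T+1 .. T+n_rows, one column per lag.
    """
    if any(k < 1 for k in lags):
        raise ValueError("lags must be >= 1")
    last = len(history_tail) - 1  # index of day T
    rows: list[list[float]] = []
    for m in range(1, n_rows + 1):
        row: list[float] = []
        for k in lags:
            offset = m - k  # source day relative to T; 0 means T itself
            if offset > 0:
                # Source day is after T: unobserved at the origin, so NaN-as-unknown.
                row.append(math.nan)
            else:
                idx = last + offset
                # Insufficient history also yields NaN rather than a made-up value.
                row.append(history_tail[idx] if idx >= 0 else math.nan)
        rows.append(row)
    return rows
```

With lags `[1, 2]` and the tail `[10.0, 11.0, 12.0]`, the rows come out as follows:

- Day `T + 1` gets `[12.0, 11.0]`.
- Day `T + 2` gets `[nan, 12.0]`.
- Day `T + 3` is all `NaN`.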
+ +### How feature-aware models should fail loudly until supported + +Even after this PRP, some paths stay unsupported and must fail **loud**, never silent: + +- A feature-aware model whose required feature frame cannot be sourced (e.g. an + `UNSAFE_UNLESS_SUPPLIED` column with no observed data-platform record and no supplied + assumptions) → raise `ValueError`, never a silent `NaN`/`0.0` fill. +- An unclassifiable feature column (`FeatureSafety` / `feature_safety()` raises `KeyError`) + → propagate loudly. +- The interim `test_feature_aware_model_fails_loud_in_backtest` is **repurposed**, not + deleted: feature-aware backtesting now *succeeds* on the supported path, so the test + becomes (a) a positive "regression model backtests and yields metrics" test plus (b) a new + loud-fail test for the genuinely-unsupported path. + +### Explicit out-of-scope items + +- **No new model families.** No XGBoost, no Prophet-like models — that is MLZOO-C. This PRP + wires the *existing* `regression` and `lightgbm` forecasters into backtesting and adds none. +- **No frontend work.** No changes to `frontend/`, no backtest-page model selector, no new UI. + The `/visualize/backtest` job-result contract (`_shape_backtest_result`) stays byte-stable. +- **No explainability work.** No driver attribution, no `ForecastExplanation`, no + `/explain/*` change — that is MLZOO-D. +- **No scenario persistence changes.** No `scenario_plan` schema change, no `/scenarios/*` + contract change. The scenarios slice's `feature_frame.py` is read as a reference pattern + only; it is not modified. +- **No hyperparameter search, no portfolio/global models, no recursive multi-step + forecasting.** `NaN`-as-unknown for future-sourced lags is kept (PRP-29 DECISIONS #6). +- **No new Alembic migration** — this PRP adds no table or column to the database. + +## EXAMPLES: + +Read these before PRP creation: + +- `PRPs/PRP-29-feature-aware-forecasting-foundation.md` + - The merged foundation. 
DECISIONS LOCKED #2 (`requires_features` capability flag), + #6 (NaN-as-unknown), #7 (the interim backtest loud-fail this PRP supersedes). + +- `PRPs/PRP-30-lightgbm-first-advanced-model.md` + - The merged first advanced model. DECISIONS LOCKED #6 explicitly defers feature-aware + backtesting to *this* PRP and states why (`_run_model_backtest` is sync, DB-free, + target-only). + +- `app/features/backtesting/service.py` + - `_load_series_data` (target-only load), `SeriesData` (the container to extend), + `_run_model_backtest` (the sync fold loop to branch), `_run_baseline_comparisons`. + +- `app/features/backtesting/splitter.py` + - `TimeSeriesSplitter` — index-based, needs **no change**; each `TimeSeriesSplit` already + carries `train_indices` / `test_indices` / `train_dates` / `test_dates`. + +- `app/features/forecasting/service.py` + - `_build_regression_features` (async exogenous resolution pattern) and + `_assemble_regression_rows` (the pure, leakage-safe historical row builder to promote). + +- `app/shared/feature_frames/contract.py` + - `canonical_feature_columns()`, `build_long_lag_columns`, `build_calendar_columns`, + the `FeatureSafety` enum and `feature_safety()` classifier — the contract to reuse. + +- `app/features/scenarios/feature_frame.py` + - `assemble_future_frame` / `build_exogenous_columns` — the future-frame *pattern* for a + feature-aware model. Reference only; do not import (cross-slice import is forbidden). + +- `app/shared/feature_frames/tests/test_leakage.py` and + `app/features/forecasting/tests/test_regression_features_leakage.py` + - The load-bearing leakage-test patterns the new builders must follow. + +- `app/features/jobs/service.py` + - `_execute_backtest` (the `model_type` allow-list to widen) and `_shape_backtest_result` + (the frontend contract that must NOT drift). 
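Here is a minimal sketch of how the sync fold loop could branch on the capability flag. It assumes the per-fold builders are pure callables over arrays that `run_backtest` has already loaded. `FoldSketch`, `LastValue`, `PromoMean`, and `run_fold` are hypothetical names. In the usage, the builders read the observed promo flag for test days. That is the "exogenous foresight" policy the brief asks to disclose, not target leakage.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Protocol, Sequence

Matrix = list[list[float]]


class ModelLike(Protocol):
    requires_features: bool

    def fit(self, y: Sequence[float], X: Optional[Matrix] = None) -> object: ...

    def predict(self, horizon: int, X: Optional[Matrix] = None) -> list[float]: ...


@dataclass(frozen=True)
class FoldSketch:
    train_indices: list[int]
    test_indices: list[int]


class LastValue:
    requires_features = False

    def fit(self, y, X=None):
        self.last = float(y[-1])
        return self

    def predict(self, horizon, X=None):
        return [self.last] * horizon


class PromoMean:
    """Toy feature-aware model: mean target per promo flag (column 0 of X)."""

    requires_features = True

    def fit(self, y, X=None):
        if X is None:
            raise ValueError("PromoMean requires exogenous features X")
        groups: dict[float, list[float]] = {}
        for value, row in zip(y, X):
            groups.setdefault(row[0], []).append(value)
        self.means = {flag: sum(v) / len(v) for flag, v in groups.items()}
        return self

    def predict(self, horizon, X=None):
        if X is None or len(X) != horizon:
            raise ValueError("PromoMean needs one future feature row per horizon day")
        return [self.means[row[0]] for row in X]


def run_fold(
    model: ModelLike,
    y: Sequence[float],
    fold: FoldSketch,
    build_x_train: Callable[[FoldSketch], Matrix],
    build_x_future: Callable[[FoldSketch], Matrix],
) -> list[float]:
    """Fit on the fold's train rows and forecast its test window (pure, sync)."""
    y_train = [y[i] for i in fold.train_indices]
    horizon = len(fold.test_indices)
    if model.requires_features:  # branch on the capability flag, not a model_type string
        model.fit(y_train, build_x_train(fold))
        return model.predict(horizon, build_x_future(fold))
    model.fit(y_train)
    return model.predict(horizon)
```

Baselines never see `X`, so their fold path is unchanged. Only a model that declares `requires_features=True` invokes the builders.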
+ +## DOCUMENTATION: + +- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html +- scikit-learn HistGradientBoostingRegressor (NaN-tolerant): https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html +- LightGBM (NaN handling / missing values): https://lightgbm.readthedocs.io/en/stable/Advanced-Topics.html +- Forecasting cross-validation / leakage: https://otexts.com/fpp3/tscv.html +- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html +- Pydantic documentation: https://docs.pydantic.dev/latest/ +- `PRPs/ai_docs/exogenous-regressor-forecasting.md` — repo-local notes on exogenous-regressor + forecasting and the leakage-safe future-frame rule. + +## OTHER CONSIDERATIONS: + +The main architectural risk is the **train/serve-skew and leakage boundary** — get the +`X_future` exogenous-column policy wrong and the backtest is either leaky (optimistic and +worthless) or skewed (the model sees different feature distributions at train vs evaluate). + +Required decisions for the PRP: + +- **Where the per-fold builders live.** `backtesting` may not import `forecasting` or + `scenarios` (vertical-slice rule). The pure row builders must be promoted into + `app/shared/feature_frames` (the sanctioned cross-cutting home) so both slices consume one + definition. The promotion must be **additive** — `forecasting`'s `_assemble_regression_rows` + and its leakage test must keep working unchanged (a thin delegating shim is acceptable). + +- **The `X_future` exogenous-column policy.** `price_factor` / `promo_active` are + `UNSAFE_UNLESS_SUPPLIED`. In a *scenario* a planner supplies them; a *backtest* has no + assumptions. Decide v1 policy explicitly (recorded as a field on the result), and document + it as target-leakage-free but optimistic (it assumes the future promo/price calendar was + known at `T`). 
+ +- **`gap` handling for future-lag columns.** With `gap > 0` the first test day is + `T + gap + 1`; `build_long_lag_columns` indexes test day `m` as `T + m`. The PRP must state + the offset correction (build `gap + test_size` rows, drop the first `gap`). + +- **The job-result contract.** `_shape_backtest_result` feeds `/visualize/backtest`. Any new + field must be additive; the existing keys must not move. + +Recommended defaults: + +- Reuse `canonical_feature_columns()` for both `X_train` and `X_future` — identical column + set and order on both sides is the train/serve-skew guard. +- Keep the per-fold builders pure and sync; do all DB I/O once in `run_backtest`. +- Branch on `model.requires_features` (a capability flag), never on a `model_type` string. +- Keep `NaN`-as-unknown for future-sourced lag cells — the estimators tolerate it natively. +- Prefer a loud `ValueError` on an unsupported path over any implicit guess or silent fill. + +Validation expectations: + +- Shared-builder leakage tests proving `X_future` lag cells are `NaN` where their source is a + test-window day (the new load-bearing spec). +- Unit tests for the per-fold `X_train` / `X_future` builders (pure, no DB). +- An integration test running a real DB-backed feature-aware (`regression`) backtest and + comparing it against the naive/seasonal baselines in the same response. +- A route test: `POST /backtesting/run` with a `regression` model config → `200`. +- A jobs test: a `backtest` job with `model_type="regression"` succeeds. +- A repurposed loud-fail test for the genuinely-unsupported feature-aware path. +- Regression proof: every existing baseline backtest test stays green with no behaviour change. 
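The `gap` offset correction can be sketched on top of the same hypothetical lag rule, repeated here so the block stands alone. `future_lag_rows` and `fold_future_lag_rows` are illustrative names, not repo functions. The sketch builds rows for all `gap + test_size` days after `T` and then drops the first `gap` rows. Each remaining row is therefore labelled with the right calendar day, and any lag whose source is a gap day stays `NaN`.

```python
import math


def future_lag_rows(history_tail: list[float], lags: list[int], n_rows: int) -> list[list[float]]:
    """Row m-1 is day T+m; lag k reads day T+m-k only if it is at or before T."""
    last = len(history_tail) - 1  # index of day T
    rows = []
    for m in range(1, n_rows + 1):
        row = []
        for k in lags:
            offset = m - k
            idx = last + offset
            row.append(history_tail[idx] if offset <= 0 and idx >= 0 else math.nan)
        rows.append(row)
    return rows


def fold_future_lag_rows(
    history_tail: list[float], lags: list[int], gap: int, test_size: int
) -> list[list[float]]:
    """Lag rows for test days T+gap+1 .. T+gap+test_size (history ends at T)."""
    if gap < 0 or test_size < 1:
        raise ValueError("gap must be >= 0 and test_size >= 1")
    # Index day m as T+m for every day after the origin, then discard the gap days.
    return future_lag_rows(history_tail, lags, gap + test_size)[gap:]
```

With `gap=1`, `test_size=2`, and lags `[1, 3]` over the tail `[10.0, 11.0, 12.0]`, the first test day is `T + 2`. Its lag-1 source is the gap day `T + 1`, so that cell is `NaN`. Building only `test_size` rows would instead attach the features of `T + 1` (`[12.0, 10.0]`) to it without any error.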
diff --git a/PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md b/PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md new file mode 100644 index 00000000..1ad89ad1 --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md @@ -0,0 +1,74 @@ +# INITIAL-MLZOO-C-xgboost-prophet-extensions.md - XGBoost and Prophet-like Extensions + +> **This brief is split into TWO PRPs — two branches, two review units. Never one.** +> This INITIAL is the shared brief for both, but the two models are delivered separately: +> +> - **`PRPs/PRP-MLZOO-C1-xgboost-model.md`** — the XGBoost half. A low-risk follow-up that +> mirrors the merged `LightGBMForecaster` design (optional `ml-xgboost` extra, feature +> flag, lazy import, deterministic training, registry metadata). +> - **`PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md`** — the Prophet-like half. A +> distinct model-family design task — a pure-scikit-learn additive linear model with +> trend / seasonality / holiday-regressor decomposition; **not** a clone of the tree +> models and **not** the real `prophet` dependency. +> +> Do not combine the two models into a single PRP or a single branch. The "Out of scope" +> lists below still apply to *each* PRP individually (e.g. C1 does not touch Prophet-like +> work; C2 does not touch XGBoost). See `INITIAL-MLZOO-index.md` for the updated roadmap. + +## FEATURE: + +Extend the Advanced ML Model Zoo after the feature-frame foundation and first advanced model path are stable. + +This INITIAL is for later work, not PRP-29. + +Goals: + +- Add XGBoost as a second tree-based feature-aware model. +- Add a Prophet-like additive model path or choose the real Prophet dependency if justified. +- Support holiday/regressor-style features where appropriate. +- Add model-family-specific validation and metadata. + +Out of scope: + +- Foundation feature-frame work. +- First advanced model architecture. +- Frontend/explainability polish unless explicitly needed. 
+- Hyperparameter search unless separately scoped. + +## EXAMPLES: + +Read these before PRP creation: + +- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md` + - Foundation dependency. + +- `PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md` + - First advanced model pattern to follow. + +- `app/features/forecasting/models.py` + - Model factory and advanced model pattern. + +- `app/features/forecasting/schemas.py` + - Config schema pattern. + +- `app/features/featuresets/service.py` + - Regressor and calendar feature source. + +## DOCUMENTATION: + +- XGBoost documentation: https://xgboost.readthedocs.io/en/stable/ +- XGBoost Python package documentation: https://xgboost.readthedocs.io/en/stable/python/ +- XGBoost parameters: https://xgboost.readthedocs.io/en/stable/parameter.html +- Prophet documentation: https://facebook.github.io/prophet/docs/quick_start.html +- Prophet seasonality, holidays, and regressors: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html +- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html +- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html + +## OTHER CONSIDERATIONS: + +- XGBoost should mirror the first advanced model path where possible. +- Prophet-like work should be carefully evaluated because dependency weight and API shape differ from sklearn-style regressors. +- Real Prophet support should be chosen only if install/runtime constraints are acceptable. +- A lightweight additive sklearn model may be safer than the real Prophet dependency. +- Holiday/regressor support must use known-in-advance or explicitly supplied future values. 
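The last consideration says future regressors must be known in advance or explicitly supplied. It can be sketched with a local mirror of the `SAFE` / `CONDITIONALLY_SAFE` / `UNSAFE_UNLESS_SUPPLIED` taxonomy. The enum, the `_SAFETY` table, and its per-column assignments are hypothetical; the repo's classifier is `feature_safety()` in `app/shared/feature_frames`. `day_of_week` and `lag_7` are illustrative column names. The point of the sketch is the failure behaviour:

- An unclassified column raises `KeyError`.
- A supplied-only column with no values raises `ValueError` rather than defaulting to zero.

```python
import math
from enum import Enum


class Safety(Enum):
    SAFE = "safe"  # pure function of the date, e.g. calendar columns
    CONDITIONALLY_SAFE = "conditionally_safe"  # e.g. lags built from history at or before T
    UNSAFE_UNLESS_SUPPLIED = "unsafe_unless_supplied"  # e.g. future price / promotion


# Hypothetical classification table for this sketch only.
_SAFETY = {
    "day_of_week": Safety.SAFE,
    "is_holiday": Safety.SAFE,
    "lag_7": Safety.CONDITIONALLY_SAFE,
    "price_factor": Safety.UNSAFE_UNLESS_SUPPLIED,
    "promo_active": Safety.UNSAFE_UNLESS_SUPPLIED,
}


def feature_safety(column: str) -> Safety:
    try:
        return _SAFETY[column]
    except KeyError:
        raise KeyError(f"unclassified feature column: {column!r}") from None


def require_supplied(
    columns: list[str], supplied: dict[str, list[float]], horizon: int
) -> dict[str, list[float]]:
    """Return the supplied future values for every UNSAFE_UNLESS_SUPPLIED column."""
    resolved: dict[str, list[float]] = {}
    for column in columns:
        if feature_safety(column) is not Safety.UNSAFE_UNLESS_SUPPLIED:
            continue  # SAFE / CONDITIONALLY_SAFE columns are built elsewhere
        values = supplied.get(column)
        if values is None:
            raise ValueError(f"{column!r} must be supplied for the forecast window")
        if len(values) != horizon or any(math.isnan(v) for v in values):
            raise ValueError(f"{column!r} needs {horizon} known, non-NaN values")
        resolved[column] = list(values)
    return resolved
```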
+ diff --git a/PRPs/INITIAL/INITIAL-MLZOO-D-frontend-registry-explainability.md b/PRPs/INITIAL/INITIAL-MLZOO-D-frontend-registry-explainability.md new file mode 100644 index 00000000..e65591b7 --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-D-frontend-registry-explainability.md @@ -0,0 +1,69 @@ +# INITIAL-MLZOO-D-frontend-registry-explainability.md - Frontend, Registry, and Explainability Polish + +## FEATURE: + +Expose Advanced ML Model Zoo capabilities in the product after backend model contracts and at least one advanced model are stable. + +This INITIAL is for later work, not PRP-29. + +Goals: + +- Add model selection UI where useful. +- Surface advanced model metadata in run detail and comparison pages. +- Show feature config, feature columns, dependency versions, and model family metadata. +- Add basic feature importance or explanation hooks where available. +- Update docs/admin surfaces so operators understand advanced model constraints. + +Out of scope: + +- Core feature-frame foundation. +- First advanced model backend implementation. +- XGBoost/Prophet backend implementation. +- Full SHAP explainability unless separately scoped. + +## EXAMPLES: + +Read these before PRP creation: + +- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md` + - Foundation dependency. + +- `PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md` + - First advanced model dependency. + +- `frontend/src/pages/explorer/runs.tsx` + - Existing run table. + +- `frontend/src/pages/explorer/run-detail.tsx` + - Existing run detail surface. + +- `frontend/src/pages/explorer/run-compare.tsx` + - Existing comparison surface. + +- `frontend/src/pages/visualize/forecast.tsx` + - Forecast visualization page. + +- `frontend/src/pages/visualize/backtest.tsx` + - Backtest visualization page. + +- `app/features/registry/schemas.py` + - Backend response contracts for run metadata. 
+ +## DOCUMENTATION: + +- React Router documentation: https://reactrouter.com/home +- TanStack Query documentation: https://tanstack.com/query/latest/docs/framework/react/overview +- TanStack Table documentation: https://tanstack.com/table/latest/docs/overview +- shadcn/ui documentation: https://ui.shadcn.com/docs +- Recharts documentation: https://recharts.org/en-US/ +- SHAP documentation: https://shap.readthedocs.io/en/stable/ +- scikit-learn permutation importance: https://scikit-learn.org/stable/modules/permutation_importance.html + +## OTHER CONSIDERATIONS: + +- Do not create frontend controls before backend contracts are stable. +- Avoid adding a large admin panel if run detail and comparison pages are enough. +- Keep advanced model metadata readable and compact. +- Feature importance must be clearly labeled as model-derived, not causal truth. +- Browser QA is required for all frontend additions. + diff --git a/PRPs/INITIAL/INITIAL-MLZOO-index.md b/PRPs/INITIAL/INITIAL-MLZOO-index.md new file mode 100644 index 00000000..7d2634a2 --- /dev/null +++ b/PRPs/INITIAL/INITIAL-MLZOO-index.md @@ -0,0 +1,88 @@ +# INITIAL-MLZOO-index.md - Advanced ML Model Zoo Roadmap + +## FEATURE: + +Split the Advanced ML Model Zoo into multiple INITIAL briefs so each future PRP can remain small, reviewable, and implementation-safe. + +This index is the roadmap for the MLZOO sequence. Do not create one PRP that implements the full model zoo. The correct flow is: + +1. Use this index to understand the full architecture. +2. Use `INITIAL-MLZOO-A-foundation-feature-frames.md` as the first PRP input. +3. Implement and merge the foundation before creating PRPs for later parts. +4. Promote B, C, and D into PRPs only after their prerequisites are stable. 
+ +Recommended PRP sequence: + +| Order | INITIAL | Intended PRP | Purpose | +| --- | --- | --- | --- | +| 1 | `INITIAL-MLZOO-A-foundation-feature-frames.md` | PRP-29 | Feature-aware forecasting foundation and leakage-safe frame contracts | +| 2 | `INITIAL-MLZOO-B-lightgbm-first-model.md` | PRP-30 | First advanced model path with LightGBM (optional `ml-lightgbm` extra) | +| 2.5 | `INITIAL-MLZOO-B.2-feature-aware-backtesting.md` | PRP-MLZOO-B.2 | Wire feature-aware models into the backtesting fold loop (per-fold leakage-safe `X_train` / `X_future`) | +| 3a | `INITIAL-MLZOO-C-xgboost-prophet-extensions.md` (XGBoost half) | PRP-MLZOO-C1 | XGBoost feature-aware model — a low-risk follow-up mirroring the merged LightGBM design (optional `ml-xgboost` extra) | +| 3b | `INITIAL-MLZOO-C-xgboost-prophet-extensions.md` (Prophet-like half) | PRP-MLZOO-C2 | Prophet-like additive model — a distinct model-family design (pure scikit-learn; trend / seasonality / regressor decomposition) | +| 4 | `INITIAL-MLZOO-D-frontend-registry-explainability.md` | Future PRP | UI, registry surfacing, and explanation polish | + +**C is two PRPs, not one.** `INITIAL-MLZOO-C` briefs both XGBoost and a Prophet-like model, +but they are deliberately split into **two separate PRPs, branches, and review units** — +`PRP-MLZOO-C1` (XGBoost) and `PRP-MLZOO-C2` (Prophet-like). They are additive and +order-independent; whichever merges second rebases cleanly. Do **not** combine them into a +single branch or a single review unit (this honours the "one reviewable unit" rule below). + +Dependency graph: + +```text +A. Foundation feature frames + -> B. LightGBM first model + -> B.2 Feature-aware backtesting + -> C1. XGBoost model (separate review unit) + -> C2. Prophet-like model (separate review unit; parallel to C1) + -> D. Frontend / registry / explainability +``` + +The full vision is documented in `docs/optional-features/05-advanced-ml-model-zoo.md`. 
+ +## EXAMPLES: + +Read these before creating any MLZOO PRP: + +- `docs/optional-features/05-advanced-ml-model-zoo.md` + - Full optional-feature concept and documentation links. + +- `PRPs/INITIAL/INITIAL-5.md` + - Earlier forecasting model brief, including baseline model zoo and global ML hooks. + +- `docs/PHASE/4-FORECASTING.md` + - Completed forecasting phase, model interface, configs, persistence, service, and API behavior. + +- `app/features/forecasting/models.py` + - Current baseline model interface. + +- `app/features/featuresets/service.py` + - Existing time-safe feature engineering. + +- `app/features/featuresets/tests/test_leakage.py` + - Existing leakage-safety testing pattern. + +## DOCUMENTATION: + +- LightGBM documentation: https://lightgbm.readthedocs.io/ +- XGBoost documentation: https://xgboost.readthedocs.io/en/stable/ +- Prophet documentation: https://facebook.github.io/prophet/docs/quick_start.html +- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html +- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html +- scikit-learn Pipeline composition: https://scikit-learn.org/stable/modules/compose.html +- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html +- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html +- Pydantic documentation: https://docs.pydantic.dev/latest/ +- FastAPI documentation: https://fastapi.tiangolo.com/ + +## OTHER CONSIDERATIONS: + +- The first PRP should be generated from `INITIAL-MLZOO-A-foundation-feature-frames.md`. +- Do not implement LightGBM before the feature-frame contracts and leakage tests are stable. +- Do not implement XGBoost or Prophet-like models before the first advanced model path proves the architecture. +- Do not add frontend/explainability scope before backend metadata and persistence contracts are stable. 
+- Keep each PRP to one branch and one reviewable unit. In particular, `INITIAL-MLZOO-C`'s + two models (XGBoost, Prophet-like) are **two PRPs** — `PRP-MLZOO-C1` and `PRP-MLZOO-C2` — + never one combined branch. + diff --git a/PRPs/PRP-29-feature-aware-forecasting-foundation.md b/PRPs/PRP-29-feature-aware-forecasting-foundation.md new file mode 100644 index 00000000..f60937df --- /dev/null +++ b/PRPs/PRP-29-feature-aware-forecasting-foundation.md @@ -0,0 +1,838 @@ +name: "PRP-29 — Feature-Aware Forecasting Foundation (MLZOO-A)" +description: | + +## Purpose + +The first PRP of the **Advanced ML Model Zoo** sequence (`PRPs/INITIAL/INITIAL-MLZOO-index.md`). +It builds the *foundation* a later LightGBM / XGBoost / Prophet-like model will stand on: +a single, leakage-safe, **shared** feature-frame contract — so a future advanced model can be +added without re-deriving the frame machinery and without breaking the baseline forecasters. + +This PRP implements **contracts, consolidation, and leakage tests only**. It adds **no** +advanced model, **no** new dependency, **no** migration, **no** frontend, **no** agent tool, +and **no** API behaviour change. If you find yourself implementing LightGBM, stop — that is +PRP-MLZOO-B. + +## What this PRP already inherits (DO NOT re-build) + +The feature-aware *machinery* already exists — it is just **fragmented and duplicated**: + +- `BaseForecaster` (`app/features/forecasting/models.py:47`) already has the feature-aware + signature: `fit(y, X=None)` / `predict(horizon, X=None)`. The three baselines ignore `X` + (every `fit`/`predict` carries `# noqa: ARG002`); `RegressionForecaster` (`models.py:428`) + *consumes* it and is the first feature-aware model. +- The **historical training frame** is built by `ForecastingService._build_regression_features` + (`app/features/forecasting/service.py:454-595`). 
+- The **future prediction frame** is built by `app/features/scenarios/feature_frame.py` + (the leakage-safe `build_*_columns` + `assemble_future_frame` + `build_future_frame`). +- The leakage-safe rule and the long-lag-vs-recursion decision are documented in + `PRPs/ai_docs/exogenous-regressor-forecasting.md` (§2, §5). + +The **problem this PRP fixes**: the regression feature-column contract is **physically +duplicated** across two slices — `_REGRESSION_FEATURE_COLUMNS` (`forecasting/service.py:87-99`) +and `canonical_feature_columns()` / `CALENDAR_COLUMNS` / `EXOGENOUS_COLUMNS` +(`scenarios/feature_frame.py:74-127`) — because a cross-slice import is forbidden +(AGENTS.md § Architecture, PRP-27 DECISIONS LOCKED #3). They are kept in lock-step only by a +fragile integration-test side-effect ("an empty-assumption simulation must yield a zero +delta"). A future model cannot safely build on a contract that lives in two places. + +## DEPENDS ON — read before starting + +- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md` — this PRP's brief. +- `PRPs/INITIAL/INITIAL-MLZOO-index.md` — the MLZOO roadmap (A → B → C → D). +- `docs/optional-features/05-advanced-ml-model-zoo.md` — the full model-zoo vision and risks. +- `PRPs/ai_docs/exogenous-regressor-forecasting.md` — the exogenous-regressor + future-frame + leakage reference (§1 contract, §2 leakage rule, §5 de-risking recommendations). + +--- + +## Goal + +Move the regression feature-frame **contract** and its **leakage-safe pure builders** into a +single cross-cutting module, `app/shared/feature_frames/`, that both the forecasting slice +(training frame) and the scenarios slice (future frame) import — eliminating the duplicated +`_REGRESSION_FEATURE_COLUMNS` ↔ `canonical_feature_columns()` pair. 
Formalise the +feature-aware model contract with a `requires_features` class attribute, document the +historical/future frame requirements and the safe / conditionally-safe / unsafe feature-class +taxonomy, and add load-bearing leakage tests for the shared builders and the historical +training builder. + +**End state:** there is exactly **one** definition of the regression feature-column set, +imported (not re-typed) by both slices; a future advanced model in PRP-MLZOO-B sets +`requires_features = True` and reuses the shared frame builders with zero new contract code. + +## Why + +- **Foundation for the model zoo.** PRP-MLZOO-B (LightGBM / sklearn fallback) needs a tested, + single-source frame contract. Today it would have to choose *which* of two duplicated column + lists to extend — a guaranteed drift bug. +- **Eliminates a latent correctness hazard.** A silent mismatch between the two column lists + corrupts the `model_exogenous` re-forecast (the model is fed columns in the wrong order). + Today only an integration-test side-effect catches it; after this PRP a mismatch is + structurally impossible (one shared list). +- **Codifies the leakage rules.** `docs/optional-features/05-advanced-ml-model-zoo.md:158-163` + names "Future feature generation is easy to get wrong" and "Backtests must prevent leakage + for every generated feature" as the top risks. This PRP turns the implicit rules into a + documented taxonomy + a load-bearing test file. +- **No behaviour change, no risk to baselines.** Pure consolidation + tests + docs. The + baseline forecasters, the registry, persisted model bundles, and every HTTP/WS contract are + untouched. + +## What + +A refactor-and-document PRP. User-visible behaviour is **identical** before and after; the +value is entirely structural (one contract, tested rules, a foundation doc). + +### Technical requirements + +1. 
New cross-cutting package `app/shared/feature_frames/` owning: the pinned constants + (`EXOGENOUS_LAGS`, `HISTORY_TAIL_DAYS`), the column-name tuples (`CALENDAR_COLUMNS`, + `EXOGENOUS_COLUMNS`), `canonical_feature_columns()`, the `FutureFeatureFrame` dataclass, + the leakage-safe pure builders (`build_calendar_columns`, `build_long_lag_columns`), and a + `FeatureSafety` taxonomy (`FEATURE_CLASS` map). +2. `app/features/scenarios/feature_frame.py` imports the above from the shared package and + **re-exports** them (back-compat for existing importers); it keeps only the + assumption-driven, DB-touching parts (`build_exogenous_columns`, `assemble_future_frame`, + `build_future_frame`, `MAX_COMPARE_SCENARIOS`). +3. `app/features/forecasting/service.py::_build_regression_features` imports the shared + contract; the local `_REGRESSION_FEATURE_COLUMNS` / `_REGRESSION_LAGS` / + `_REGRESSION_HISTORY_TAIL_DAYS` constants are deleted. +4. `BaseForecaster` gains a `requires_features: ClassVar[bool] = False`; `RegressionForecaster` + overrides it to `True`. `ForecastingService.train_model` / `predict` branch on + `model.requires_features` instead of the `config.model_type == "regression"` string check. +5. Load-bearing leakage tests: `app/shared/feature_frames/tests/test_leakage.py` (shared + builders) and `app/features/forecasting/tests/test_regression_features_leakage.py` + (historical training builder). +6. The curated contract doc `examples/models/feature_frame_contract.md` (historical vs future + frame shape, required columns, the safe/conditional/unsafe taxonomy); an additive update to + `examples/models/model_interface.md`. + +### Success Criteria + +- [ ] The regression feature-column set is defined **exactly once** (`canonical_feature_columns()` + in `app/shared/feature_frames/`); `grep -rn "_REGRESSION_FEATURE_COLUMNS" app/` returns nothing. +- [ ] `app/shared/feature_frames/` imports nothing from `app/features/**` (verified by a test). 
+- [ ] `BaseForecaster.requires_features` exists; `NaiveForecaster/SeasonalNaiveForecaster/MovingAverageForecaster` → `False`, `RegressionForecaster` → `True`. +- [ ] All existing tests pass unchanged: baseline forecasters, `test_regression_forecaster.py`, + every scenarios test (including the empty-assumption zero-delta integration test). +- [ ] New leakage tests prove no shared builder and no historical training row ever reads a + target value at or after the forecast origin / cutoff. +- [ ] `examples/models/feature_frame_contract.md` exists and documents both frame shapes + the taxonomy. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` all green. +- [ ] No new dependency in `pyproject.toml`; no Alembic migration; no change to any route, schema, or WebSocket contract. + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: app/features/scenarios/feature_frame.py + why: The future-frame builder. Its lines 62-127 (constants, CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, canonical_feature_columns) and 93-216 (FutureFeatureFrame, + _is_month_end, build_calendar_columns, build_long_lag_columns) MOVE VERBATIM to the + shared package. Lines 219-407 (build_exogenous_columns, assemble_future_frame, + build_future_frame) STAY — they depend on ScenarioAssumptions / the calendar table. + critical: build_long_lag_columns is the leakage-critical helper; its `idx = (j-1)-k`, + `idx < 0` guard is the spec. Move it byte-for-byte; do not "improve" it. + +- file: app/features/scenarios/tests/test_future_frame_leakage.py + why: The load-bearing leakage spec to mirror. The calendar/long-lag tests (the builders that + MOVE) become app/shared/feature_frames/tests/test_leakage.py; the exogenous/assemble + tests STAY here. Mirror its disjoint value-pool idiom (lines 50-56). + critical: AGENTS.md § Safety — a *_leakage.py file may never be weakened. 
Splitting it + across the move is allowed; deleting an assertion is not. + +- file: app/features/forecasting/service.py + why: Lines 77-99 hold the duplicated constants to DELETE (line 74, + `_MIN_REGRESSION_TRAIN_ROWS`, is a training-data threshold and STAYS); lines 454-595 + (_build_regression_features) import the shared contract instead; lines 182-216 + (train_model branch) and 348-353 (predict branch) switch to `requires_features`. + +- file: app/features/forecasting/models.py + why: BaseForecaster (line 47) gets the `requires_features` ClassVar; RegressionForecaster + (line 428) overrides it. The `# noqa: ARG002` on the baseline fit/predict is the marker + that "this model ignores X" — `requires_features=False` is the formal version of it. + +- file: app/features/featuresets/tests/test_leakage.py + why: The canonical sequential-value leakage idiom for the *historical* builder test. + Mirror its two-tier assertion (direction check THEN exact-equality check) and the + "LEAKAGE DETECTED at row {i}" message convention. + +- file: app/features/forecasting/persistence.py + why: ModelBundle stores `feature_columns` in `metadata` as a plain list[str]. Moving the + *function* that produces those strings does not change the strings — persisted bundles + stay loadable. Do NOT change ModelBundle. + +- file: app/shared/seeder/ + why: The precedent for a package under app/shared/ (with its own tests/ subdir). Mirror its + layout: __init__.py re-exporting the public surface, tests/ alongside. + +- docfile: PRPs/ai_docs/exogenous-regressor-forecasting.md + why: §2 states the future-frame leakage rule verbatim and the feature-family table; §5 is + the "long-lag + calendar + exogenous, no recursion" decision. The taxonomy in this PRP + is the executable form of that table. Reference it from feature_frame_contract.md. + +- doc: https://scikit-learn.org/stable/developers/develop.html + section: Estimator interface conventions (get_params / set_params / fit returns self) + critical: BaseForecaster already follows this.
requires_features is an ADDITIVE class + attribute — it does not break the sklearn-style contract. +``` + +### Current Codebase tree (relevant slices — all already exist) + +```bash +app/ +├── shared/ +│ ├── __init__.py +│ ├── models.py +│ ├── schemas.py +│ ├── utils.py +│ └── seeder/ # precedent: a package under app/shared/ with tests/ +├── features/ +│ ├── forecasting/ +│ │ ├── models.py # BaseForecaster + 4 forecasters + model_factory +│ │ ├── schemas.py # ModelConfig union, TrainRequest/PredictRequest +│ │ ├── service.py # _build_regression_features + _REGRESSION_* constants +│ │ ├── persistence.py # ModelBundle (UNTOUCHED) +│ │ └── tests/ +│ │ ├── test_regression_forecaster.py +│ │ └── test_service.py +│ ├── scenarios/ +│ │ ├── feature_frame.py # future-frame builder + duplicated contract +│ │ ├── service.py # imports build_future_frame +│ │ ├── schemas.py +│ │ └── tests/ +│ │ ├── conftest.py # imports canonical_feature_columns +│ │ ├── test_feature_frame.py +│ │ └── test_future_frame_leakage.py # load-bearing spec +│ └── backtesting/ +│ ├── service.py # _run_model_backtest fold loop (target-only) +│ └── tests/test_service.py +examples/models/ +├── baseline_naive.py / baseline_seasonal.py / baseline_mavg.py +└── model_interface.md # stale: no regression/lightgbm config rows +PRPs/ai_docs/exogenous-regressor-forecasting.md +``` + +### Desired Codebase tree — files to ADD + +```bash +app/shared/feature_frames/ +├── __init__.py # public re-exports of contract.py +├── contract.py # constants + taxonomy + columns + FutureFeatureFrame +│ # + build_calendar_columns + build_long_lag_columns +└── tests/ + ├── __init__.py + ├── test_contract.py # column order, taxonomy, dataclass shape, determinism + └── test_leakage.py # LOAD-BEARING: calendar + long-lag leakage spec + +app/features/forecasting/tests/ +└── test_regression_features_leakage.py # LOAD-BEARING: historical training-frame leakage + +examples/models/ +└── feature_frame_contract.md # the curated contract doc 
(INITIAL-A asks for this) +``` + +### Files to MODIFY (all additive or behaviour-preserving) + +```bash +app/features/scenarios/feature_frame.py # import from shared + re-export; delete moved defs +app/features/scenarios/tests/test_feature_frame.py # update imports (re-export keeps it passing) +app/features/scenarios/tests/test_future_frame_leakage.py # trim moved tests; keep exogenous/assemble +app/features/forecasting/models.py # + requires_features ClassVar +app/features/forecasting/service.py # import shared contract; requires_features branching +app/features/forecasting/tests/test_service.py # + requires_features assertions +app/features/backtesting/tests/test_service.py # + 1 guard test (no production-code change) +examples/models/model_interface.md # additive: requires_features + regression row +``` + +### DECISIONS LOCKED (resolved during planning — do NOT re-litigate) + +1. **Contract home = `app/shared/feature_frames/`.** A cross-cutting package (not a new + vertical slice, not document-only). Both `forecasting` and `scenarios` import it. This is + sanctioned by AGENTS.md § Architecture ("cross-cutting code goes through `app/core/` or + `app/shared/`"). It is **not** a vertical slice — it has no `models.py`/`routes.py`/router; + it is a pure library, like `app/shared/utils.py`. + +2. **No `FeatureAwareForecaster` subclass.** `BaseForecaster` already carries the feature-aware + signature. Formalise it with a `requires_features: ClassVar[bool]` attribute instead of a + new base class — zero churn to the class hierarchy, zero change to persisted bundle types, + and the service branches on `model.requires_features` with no `isinstance` check. A future + `LightGBMForecaster` just sets `requires_features = True`. + +3. **`POST /forecasting/predict` is NOT changed.** It still rejects feature-aware models + (today: regression). Wiring an assumptions-free future frame into the predict path is + PRP-MLZOO-B scope. 
The rejection branch is only *generalised* from a `model_type` string + check to `model.requires_features` — same behaviour, future-proof condition. + +4. **No `TrainingFrame` dataclass.** The historical training frame's requirements are *defined* + by `canonical_feature_columns()` (the executable column contract, now shared) plus + `examples/models/feature_frame_contract.md` (the prose spec) plus the new historical + leakage test. `ForecastingService` keeps its existing internal `RegressionFeatureMatrix` + carrier (it is not persisted; only its `.feature_columns` list is copied into bundle + metadata). Introducing a second frame dataclass that nothing returns would be dead code + (product-vision.md: do not add abstractions speculatively). + +5. **Builders move; the contract is shared by IMPORT, not by re-typing.** `build_calendar_columns` + and `build_long_lag_columns` are pure (no slice imports) → they move to the shared package. + `build_exogenous_columns` takes a `ScenarioAssumptions` and `build_future_frame` reads the + `calendar` table → they STAY in `scenarios/feature_frame.py` (the shared package may not + import `app/features/**`). The forecasting historical builder keeps its own *value + derivation* (DB-observed price/promo) but consumes the shared *column list and calendar + builder*. + +6. **NaN means "unknown", never a fabricated default.** A builder emits `math.nan` for a cell + whose source is genuinely unknowable at origin `T` (a long-lag whose source day is in the + horizon; `days_since_launch` when the product has no launch date). `HistGradientBoostingRegressor` + tolerates NaN natively. The contract forbids silently substituting `0.0`. A future model + that is *not* NaN-tolerant must impute explicitly in its own `fit`/`predict` — the frame + builder must not. + +7. 
**Backtesting is not wired for feature-aware models in this PRP.** The fold loop + (`backtesting/service.py` `_run_model_backtest`) calls `model.fit(y_train)` target-only; a + `RegressionForecaster` there raises `ValueError("RegressionForecaster requires exogenous + features X")` — a *loud, non-leaky* failure. We add one regression test pinning that + loud-failure behaviour and document it as a known limitation. Wiring feature-aware + backtesting is PRP-MLZOO-B. + +### Known Gotchas of our codebase & Library Quirks + +```python +# CRITICAL: app/shared/** may NEVER import from app/features/** (AGENTS.md § Architecture). +# The shared package is leaf-level. build_calendar_columns / build_long_lag_columns are pure +# (stdlib `math`, `datetime`, `dataclasses` only) so this holds. A test asserts it (see Task 3). + +# CRITICAL: build_long_lag_columns must move BYTE-FOR-BYTE. The leakage guard is the line +# `idx = (j - 1) - lag` then `if idx < 0 and -tail_len <= idx:`. Any "tidy-up" risks +# re-introducing the exact bug the load-bearing test exists to catch. + +# GOTCHA: several files import names from `app.features.scenarios.feature_frame` +# (including service.py, tests/conftest.py, tests/test_feature_frame.py, tests/test_future_frame_leakage.py). +# After the move, feature_frame.py MUST re-export the moved names +# (`from app.shared.feature_frames import (...) # noqa: F401`) so those imports keep +# resolving. Verified import sites — re-export ALL of: EXOGENOUS_LAGS, HISTORY_TAIL_DAYS, +# CALENDAR_COLUMNS, EXOGENOUS_COLUMNS, canonical_feature_columns, FutureFeatureFrame, +# build_calendar_columns, build_long_lag_columns. + +# GOTCHA: MAX_COMPARE_SCENARIOS stays in scenarios/feature_frame.py — it is a Phase-C scenario +# comparison cap, NOT a feature-frame concept. scenarios/schemas.py:413-414 references it by +# comment. Do not move it to the shared package. + +# CRITICAL: ConfigDict(strict=True) on request bodies — N/A here. This PRP adds no request +# schema.
The forecasting ModelConfig union is untouched. + +# GOTCHA: `requires_features` is a ClassVar — annotate it `ClassVar[bool]` from `typing`. +# mypy --strict / pyright --strict both gate merge; a bare `requires_features = False` +# without the ClassVar annotation reads as an instance attribute and will type-error on the +# subclass override pattern. + +# GOTCHA: model bundles are joblib-pickled. `requires_features` is a *class* attribute, not an +# instance attribute — it is NOT pickled into the bundle, so old bundles loaded after this +# PRP transparently gain the attribute from the (new) class definition. No bundle migration. + +# GOTCHA: the scenarios "empty-assumption simulation → zero delta" integration test is the +# OLD drift detector for the duplicated contract. After consolidation the two lists ARE one +# import, so that test still passes — and now for a structural reason, not a coincidence. +# It must stay green; do not delete it. + +# GOTCHA: line endings — this repo has mixed CRLF/LF files. Run `git diff --stat` before +# committing; if a moved file shows a whole-file diff, normalise to the original file's +# ending so the review shows only the real change. +``` + +--- + +## Implementation Blueprint + +### Data models and structure + +No ORM models, no Pydantic schemas, no migration. The only new structured types: + +```python +# app/shared/feature_frames/contract.py + +from enum import Enum + +class FeatureSafety(Enum): + """Leakage classification of a feature column in a FUTURE prediction frame.""" + SAFE = "safe" # pure function of the date (calendar) — never a leak + CONDITIONALLY_SAFE = "cond" # target long-lag: safe iff source day <= origin T, else NaN + UNSAFE_UNLESS_SUPPLIED = "unsafe" # future price/promo/inventory — knowable ONLY if the + # caller posits it (scenario assumption); never inferred + +# FutureFeatureFrame — MOVED VERBATIM from scenarios/feature_frame.py:93-107 (unchanged). 
+@dataclass +class FutureFeatureFrame: + dates: list[date] + feature_columns: list[str] + matrix: list[list[float]] # [horizon][n_features]; NaN allowed and expected + +# FEATURE_CLASS — the executable taxonomy: every canonical column → its FeatureSafety. +FEATURE_CLASS: dict[str, FeatureSafety] = { + # lag_1 .. lag_28 -> CONDITIONALLY_SAFE + # dow_sin/dow_cos/month_sin/month_cos/is_weekend/is_month_end -> SAFE + # price_factor/promo_active -> UNSAFE_UNLESS_SUPPLIED + # is_holiday -> SAFE (calendar table is a timeless attribute) + # days_since_launch -> SAFE (pure function of date once launch_date is known) +} +``` + +### list of tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Shared feature-frame package ════════ + +Task 1 — CREATE app/shared/feature_frames/contract.py: + - PURPOSE: the single source of truth for the regression feature-frame contract. + - MOVE VERBATIM from app/features/scenarios/feature_frame.py: + * EXOGENOUS_LAGS (line 65), HISTORY_TAIL_DAYS (line 68) # NOT MAX_COMPARE_SCENARIOS + * CALENDAR_COLUMNS (lines 74-81), EXOGENOUS_COLUMNS (lines 85-90) + * FutureFeatureFrame dataclass (lines 93-107) + * canonical_feature_columns() (lines 110-127) + * _is_month_end() (lines 141-143) + * build_calendar_columns() (lines 146-170) + * build_long_lag_columns() (lines 173-216) + - ADD: `FeatureSafety` Enum + `FEATURE_CLASS` dict (see Data models above). + - ADD: `feature_safety(column: str) -> FeatureSafety` — looks up FEATURE_CLASS; for a + `lag_*` column not literally in the map (custom lag offsets), returns CONDITIONALLY_SAFE; + raises KeyError for a genuinely unknown column (callers must classify every column). + - IMPORTS: stdlib only — `math`, `dataclasses`, `datetime`, `enum`, `typing`. NOTHING from + `app.features.*`. May import `app.core.logging.get_logger` (app/core is allowed). + - PRESERVE: every docstring on the moved functions verbatim (they carry the leakage proof). 
+ - VALIDATE: uv run ruff check app/shared/feature_frames/ && uv run mypy app/shared/feature_frames/contract.py && uv run pyright app/shared/feature_frames/ + +Task 2 — CREATE app/shared/feature_frames/__init__.py: + - RE-EXPORT the public surface from contract.py: + EXOGENOUS_LAGS, HISTORY_TAIL_DAYS, CALENDAR_COLUMNS, EXOGENOUS_COLUMNS, + FutureFeatureFrame, FeatureSafety, FEATURE_CLASS, feature_safety, + canonical_feature_columns, build_calendar_columns, build_long_lag_columns. + - PATTERN: mirror app/shared/seeder/__init__.py — explicit `from .contract import (...)` + plus an `__all__` tuple. + - VALIDATE: uv run python -c "from app.shared.feature_frames import canonical_feature_columns; print(canonical_feature_columns())" + +Task 3 — CREATE app/shared/feature_frames/tests/__init__.py + test_contract.py: + - test_contract.py covers: + * test_canonical_feature_columns_order — 4 lags, then CALENDAR_COLUMNS, then + EXOGENOUS_COLUMNS; total length == sum. (MIRROR scenarios/tests/test_feature_frame.py:48-54.) + * test_pinned_constants — EXOGENOUS_LAGS == (1,7,14,28), HISTORY_TAIL_DAYS == 90. + * test_feature_class_covers_every_canonical_column — every column from + canonical_feature_columns() has a FEATURE_CLASS entry (or feature_safety() resolves it). + * test_calendar_columns_are_all_SAFE / test_lag_columns_are_CONDITIONALLY_SAFE. + * test_shared_package_imports_nothing_from_features — IMPORTANT architectural test: + walk app/shared/feature_frames/*.py source, assert no line matches + `import app.features` or `from app.features`. (AST-walk or a simple text scan; + mirror app/core/tests/test_strict_mode_policy.py's AST-walker style.) + * test_build_calendar_columns_is_deterministic — same dates → identical output. + - CONVENTION: module-level `def test_*` functions (no class), inline constants — mirror + scenarios/tests/test_feature_frame.py. No conftest. No @pytest.mark.integration. 
+ - VALIDATE: uv run pytest -v -m "not integration" app/shared/feature_frames/tests/test_contract.py + +Task 4 — CREATE app/shared/feature_frames/tests/test_leakage.py: + - THIS IS A LOAD-BEARING SPEC (module docstring must say so, mirroring + scenarios/tests/test_future_frame_leakage.py:1-6 — "this file IS the spec, never weaken + it, AGENTS.md § Safety"). + - MOVE the calendar + long-lag leakage tests OUT of + scenarios/tests/test_future_frame_leakage.py INTO this file (they now test shared code): + * test_long_lag_columns_never_emit_a_future_target + * test_long_lag_source_index_is_never_at_or_after_the_horizon + * test_calendar_columns_ignore_the_target_series + (the assemble/exogenous tests STAY in scenarios — see Task 7.) + - MIRROR the disjoint value-pool idiom verbatim (scenarios/tests/test_future_frame_leakage.py:50-56): + _HISTORY_TAIL = [1000.0 + i for i in range(90)] # observed pool + _FUTURE_TARGETS = {9000.0 + i for i in range(_HORIZON)} # disjoint sentinel pool + → any _FUTURE_TARGETS value in a cell == proven leak. + - IMPORT the builders from app.shared.feature_frames (the new home). + - VALIDATE: uv run pytest -v -m "not integration" app/shared/feature_frames/tests/test_leakage.py + +# ════════ STEP 2 — Rewire the scenarios slice onto the shared contract ════════ + +Task 5 — MODIFY app/features/scenarios/feature_frame.py: + - DELETE the moved definitions: EXOGENOUS_LAGS, HISTORY_TAIL_DAYS, CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, FutureFeatureFrame, canonical_feature_columns, _is_month_end, + build_calendar_columns, build_long_lag_columns. + - KEEP: MAX_COMPARE_SCENARIOS, _in_window, build_exogenous_columns, assemble_future_frame, + build_future_frame. 
+ - ADD at the top: `from app.shared.feature_frames import (EXOGENOUS_LAGS, HISTORY_TAIL_DAYS, + CALENDAR_COLUMNS, EXOGENOUS_COLUMNS, FutureFeatureFrame, canonical_feature_columns, + build_calendar_columns, build_long_lag_columns)` — and a `# noqa: F401` because they are + RE-EXPORTED for back-compat (assemble_future_frame still calls build_*; the names also + stay importable by existing call sites). + - UPDATE the module docstring: the "feature-column contract" paragraph now points at + `app/shared/feature_frames` as the single source of truth. + - GOTCHA: assemble_future_frame calls build_long_lag_columns / build_calendar_columns — + after the import they resolve to the shared functions. No logic change. + - VALIDATE: uv run mypy app/features/scenarios/ && uv run pyright app/features/scenarios/ + +Task 6 — VERIFY scenarios import sites still resolve: + - These files import from app.features.scenarios.feature_frame and rely on the re-export: + tests/conftest.py (canonical_feature_columns), service.py (build_future_frame), + tests/test_feature_frame.py, tests/test_future_frame_leakage.py. + - PREFERRED: update tests/conftest.py and tests/test_feature_frame.py to import the MOVED + names directly from `app.shared.feature_frames` (the stays-in-scenarios names — + build_exogenous_columns, assemble_future_frame — still come from feature_frame.py). + Keep service.py importing build_future_frame from feature_frame.py (it stays there). + - VALIDATE: uv run pytest -v -m "not integration" app/features/scenarios/tests/test_feature_frame.py + +Task 7 — MODIFY app/features/scenarios/tests/test_future_frame_leakage.py: + - REMOVE the three calendar/long-lag tests moved to the shared test_leakage.py (Task 4). + - KEEP every test that exercises build_exogenous_columns, assemble_future_frame, or the + end-to-end assembled frame (those builders stay in scenarios). 
+ - The module docstring still declares it a load-bearing spec — its remaining scope is the + assumption-driven exogenous columns + the assembled frame. + - VALIDATE: uv run pytest -v -m "not integration" app/features/scenarios/tests/test_future_frame_leakage.py + +# ════════ STEP 3 — Formalise the feature-aware model contract ════════ + +Task 9 — MODIFY app/features/forecasting/models.py: + - ADD to BaseForecaster (class body, near `random_state`): + `requires_features: ClassVar[bool] = False` + with a docstring: "True when fit()/predict() REQUIRE a non-None X feature frame. + Baseline (target-only) models leave this False; feature-aware models override to True." + - ADD `from typing import ClassVar` to the imports if not present. + - OVERRIDE in RegressionForecaster: `requires_features: ClassVar[bool] = True`. + - The three baselines inherit False — no edit needed. + - GOTCHA: ClassVar, not a plain assignment — see Known Gotchas. + - VALIDATE: uv run mypy app/features/forecasting/models.py && uv run pyright app/features/forecasting/ + +Task 10 — MODIFY app/features/forecasting/service.py: + - DELETE the local constants _REGRESSION_LAGS (line 79), _REGRESSION_HISTORY_TAIL_DAYS + (line 77), _REGRESSION_FEATURE_COLUMNS (lines 87-99). KEEP _MIN_REGRESSION_TRAIN_ROWS + (line 74 — a training-data threshold, not a frame-contract constant). + - ADD: `from app.shared.feature_frames import (canonical_feature_columns, EXOGENOUS_LAGS, + HISTORY_TAIL_DAYS, build_calendar_columns)`. + - In _build_regression_features: replace `_REGRESSION_FEATURE_COLUMNS` with + `canonical_feature_columns()`, `_REGRESSION_LAGS` with `EXOGENOUS_LAGS`, + `_REGRESSION_HISTORY_TAIL_DAYS` with `HISTORY_TAIL_DAYS`. The per-row inline calendar + math (lines 561-566) is replaced by the shared build_calendar_columns — see pseudocode. + - In train_model: replace `if config.model_type == "regression":` with + `model = model_factory(config, ...)` FIRST, then `if model.requires_features:`. 
+ - In predict(): replace `if bundle.config.model_type == "regression":` (line 348) with + `if bundle.model.requires_features:`. Generalise the error string: "Regression models" + → "Feature-aware models". + - GOTCHA: canonical_feature_columns() returns the SAME 14 strings in the SAME order as the + deleted _REGRESSION_FEATURE_COLUMNS — verify by eye against forecasting/service.py:87-99. + A column-list test (Task 12) pins it. + - VALIDATE: uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration" app/features/forecasting/tests/ + +Task 11 — CREATE app/features/forecasting/tests/test_regression_features_leakage.py: + - LOAD-BEARING spec for the HISTORICAL training builder (_build_regression_features). + - MIRROR featuresets/tests/test_leakage.py's sequential-value idiom: seed SalesDaily-shaped + input where quantity is sequential, assert lag_k at row i equals quantity[i-k] exactly + and is strictly < quantity[i] ("LEAKAGE DETECTED at row {i}" message convention). + - Assert the SQL window guard: no feature row has a date > end_date (the cutoff/origin). + - This is a service-level test → it needs the async DB session fixture; mark + @pytest.mark.integration if it hits Postgres, OR factor the pure row-assembly into a + testable helper. PREFERRED: add a small pure helper `_assemble_regression_rows(dates, + quantities, prices, ...)` inside service.py and unit-test THAT (no DB, no marker) — + mirrors how scenarios split pure `assemble_future_frame` from async `build_future_frame`. + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_regression_features_leakage.py + +Task 12 — MODIFY app/features/forecasting/tests/test_service.py: + - ADD: test_requires_features_flag — model_factory(NaiveModelConfig()).requires_features is + False; same for seasonal_naive, moving_average; model_factory(RegressionModelConfig()) + .requires_features is True. 
+ - ADD: test_canonical_columns_match_regression_contract — assert + canonical_feature_columns() equals the exact 14-name list the regression bundle expects + (pins the contract after the constant deletion). + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py + +# ════════ STEP 4 — Backtesting guard + docs ════════ + +Task 13 — MODIFY app/features/backtesting/tests/test_service.py: + - ADD ONE test (no production-code change): test_feature_aware_model_fails_loud_in_backtest + — a backtest of a feature-aware model must raise a clear ValueError (the fold loop calls + model.fit(y_train) target-only → RegressionForecaster.fit raises), NEVER silently run. + This pins DECISIONS LOCKED #7 — feature-aware backtesting is loud-fail until PRP-MLZOO-B. + - VALIDATE: uv run pytest -v -m "not integration" app/features/backtesting/tests/test_service.py + +Task 14 — CREATE examples/models/feature_frame_contract.md: + - SECTIONS: + * Historical training frame — shape [n_observations, n_features], rows = observed days + in [train_start, train_end], the `date <= end_date` SQL filter IS the cutoff guard, + lag_k reads quantity[i-k] (i >= k else NaN). + * Future prediction frame — shape [horizon, n_features], rows = T+1..T+horizon, lag_k + reads history_tail[(j-1)-k] only when (j-1)-k < 0 else NaN, NO recursion in v1. + * The canonical column set — the 14 columns, the order, where they come from (cite + app/shared/feature_frames). + * Feature-class taxonomy — the SAFE / CONDITIONALLY_SAFE / UNSAFE_UNLESS_SUPPLIED table, + one row per column, "how to populate / leakage trap" (mirror the table in + PRPs/ai_docs/exogenous-regressor-forecasting.md §2). + * The NaN-as-unknown rule (DECISIONS LOCKED #6) — builders never fabricate defaults. + * How a future advanced model plugs in — set requires_features=True, reuse the shared + builders; backtesting is loud-fail until PRP-MLZOO-B. 
+ - VALIDATE: test -f examples/models/feature_frame_contract.md + +Task 15 — MODIFY examples/models/model_interface.md: + - ADDITIVE only: document the `requires_features` class attribute on the BaseForecaster + interface; add a `regression` row to the Model Configurations / Model Formulas sections; + add a one-line pointer to examples/models/feature_frame_contract.md. + - Do NOT rewrite the file; do NOT "fix" the ModelBundle drift noted in research (out of scope). + - VALIDATE: uv run ruff check . && uv run ruff format --check . +``` + +### Per-task pseudocode (critical details only) + +```python +# ── Task 1 — contract.py: the moved long-lag builder is the leakage core ── +# MOVE VERBATIM. Shown here ONLY so you can confirm it arrived unchanged. +def build_long_lag_columns(history_tail, horizon, lags=EXOGENOUS_LAGS): + tail_len = len(history_tail) + columns = {} + for lag in lags: + column = [] + for j in range(1, horizon + 1): + idx = (j - 1) - lag # <-- the leakage guard + if idx < 0 and -tail_len <= idx: # idx<0 => source day <= origin T + column.append(float(history_tail[idx])) + else: + column.append(math.nan) # future target => NaN, never recursion + columns[f"lag_{lag}"] = column + return columns + +# ── Task 1 — the new taxonomy ── +FEATURE_CLASS = { + **{f"lag_{k}": FeatureSafety.CONDITIONALLY_SAFE for k in EXOGENOUS_LAGS}, + "dow_sin": FeatureSafety.SAFE, "dow_cos": FeatureSafety.SAFE, + "month_sin": FeatureSafety.SAFE, "month_cos": FeatureSafety.SAFE, + "is_weekend": FeatureSafety.SAFE, "is_month_end": FeatureSafety.SAFE, + "is_holiday": FeatureSafety.SAFE, # calendar table = timeless attribute + "days_since_launch": FeatureSafety.SAFE, # pure fn of date once launch_date known + "price_factor": FeatureSafety.UNSAFE_UNLESS_SUPPLIED, + "promo_active": FeatureSafety.UNSAFE_UNLESS_SUPPLIED, +} + +def feature_safety(column: str) -> FeatureSafety: + if column in FEATURE_CLASS: + return FEATURE_CLASS[column] + if column.startswith("lag_"): # custom lag offset 
+ return FeatureSafety.CONDITIONALLY_SAFE + raise KeyError(f"Unclassified feature column: {column!r}") + +# ── Task 3 — the architectural-invariant test ── +def test_shared_package_imports_nothing_from_features(): + """app/shared/** is leaf-level — it may never import a vertical slice.""" + pkg_dir = Path(__file__).resolve().parents[1] # app/shared/feature_frames/ + for py_file in pkg_dir.rglob("*.py"): + source = py_file.read_text(encoding="utf-8") + for node in ast.walk(ast.parse(source)): + if isinstance(node, ast.ImportFrom) and node.module: + assert not node.module.startswith("app.features"), ( + f"ARCHITECTURE BREACH: {py_file} imports {node.module}" + ) + if isinstance(node, ast.Import): + for alias in node.names: + assert not alias.name.startswith("app.features"), ... + +# ── Task 10 — train_model: branch on the model, not on a string ── +async def train_model(self, db, store_id, product_id, train_start, train_end, config): + # PATTERN: build the model first (cheap, no fit), then branch on its capability. + model = model_factory(config, random_state=self.settings.forecast_random_seed) + extra_metadata: dict[str, object] = {} + if model.requires_features: # was: config.model_type == "regression" + features = await self._build_regression_features(db, store_id, product_id, + train_start, train_end) + model.fit(features.y, features.X) + n_observations = features.n_observations + extra_metadata = {"feature_columns": features.feature_columns, + "history_tail": features.history_tail, + "history_tail_dates": features.history_tail_dates, + "launch_date": features.launch_date_iso} + else: + training_data = await self._load_training_data(db, store_id, product_id, + train_start, train_end) + if training_data.n_observations == 0: + raise ValueError(f"No training data found for store={store_id} ...") + model.fit(training_data.y) + n_observations = training_data.n_observations + # ... bundle creation, save, TrainResponse — UNCHANGED below this line. 
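+
+# ── Task 9: why a ClassVar flag is safe for already-pickled bundles (illustrative sketch) ──
+# _Base / _FeatureAware are HYPOTHETICAL stand-ins, not the real models.py classes.
+# A ClassVar lives on the class, so pickle never writes it into an instance's __dict__.
+# An unpickled instance therefore resolves the flag from whatever the CURRENT class
+# definition says. This is the behaviour the PERSISTENCE note below relies on.
+import pickle
+from typing import ClassVar
+
+class _Base:
+    requires_features: ClassVar[bool] = False   # baselines: target-only
+
+class _FeatureAware(_Base):
+    requires_features: ClassVar[bool] = True    # fit()/predict() need a non-None X
+
+restored = pickle.loads(pickle.dumps(_FeatureAware()))
+assert "requires_features" not in vars(restored)   # not part of the pickled state
+assert restored.requires_features is True          # resolved via the class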
+ +# ── Task 10 — predict(): generalise the rejection condition ── + bundle = load_model_bundle(resolved_path) + # ... store/product validation unchanged ... + if bundle.model.requires_features: # was: bundle.config.model_type == "regression" + raise ValueError( + "Feature-aware models forecast through POST /scenarios/simulate, which supplies " + "the exogenous feature frame. POST /forecasting/predict does not support them." + ) + +# ── Task 10 — _build_regression_features: consume the shared contract ── +# The DB reads (sales, calendar holidays, promotions, launch_date) are UNCHANGED. +# Replace the per-row inline calendar math with the shared column builder, and the +# local constants with the shared ones: + feature_columns = canonical_feature_columns() # was list(_REGRESSION_FEATURE_COLUMNS) + calendar_cols = build_calendar_columns(dates) # shared; replaces lines 561-566 + rows = [] + for index, day in enumerate(dates): + row = [] + for lag in EXOGENOUS_LAGS: # was _REGRESSION_LAGS + row.append(quantities[index - lag] if index >= lag else math.nan) + for name in CALENDAR_COLUMNS: # imported from shared + row.append(calendar_cols[name][index]) + row.append(prices[index] / baseline_price) # price_factor — DB-observed + row.append(1.0 if day in promo_dates else 0.0) + row.append(1.0 if day in holiday_dates else 0.0) + row.append(float((day - launch_date).days) if launch_date else math.nan) + rows.append(row) + tail = quantities[-HISTORY_TAIL_DAYS:] # was _REGRESSION_HISTORY_TAIL_DAYS + # CRITICAL: the column ORDER above (lags, calendar, price_factor, promo_active, + # is_holiday, days_since_launch) MUST equal canonical_feature_columns() — Task 12 pins it. +``` + +### Integration Points + +```yaml +PACKAGE WIRING: + - app/shared/feature_frames/ is a pure library — NO router, NO app/main.py change. 
+  - It is imported by: app/features/scenarios/feature_frame.py and
+    app/features/forecasting/service.py (both import edges run app/features -> app/shared,
+    which is the allowed direction).
+
+CONFIG:
+  - No new settings. `forecast_random_seed` (config.py, default 42) is still the determinism
+    source. EXOGENOUS_LAGS / HISTORY_TAIL_DAYS are code constants, not settings (matches the
+    PRP-27 precedent — they were code constants in feature_frame.py).
+
+PERSISTENCE:
+  - ModelBundle is UNTOUCHED. `feature_columns` in bundle metadata is still a list[str]; the
+    strings are identical (canonical_feature_columns() == the deleted _REGRESSION_FEATURE_COLUMNS).
+  - `requires_features` is a class attribute — not pickled — so bundles trained before this PRP
+    load cleanly and gain the attribute from the new class definition.
+
+NO MIGRATION: this PRP touches no SQLAlchemy model and no Alembic version.
+```
+
+---
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+uv run ruff check . --fix && uv run ruff format --check .
+# Expected: no errors. Fix everything before Level 2.
+```
+
+### Level 2: Type Checks
+
+```bash
+uv run mypy app/      # --strict; gates merge
+uv run pyright app/   # --strict; gates merge
+# Expected: clean. Watch for: ClassVar annotation on requires_features; the F401 re-export
+# in scenarios/feature_frame.py needs a `# noqa: F401` (ruff), not a type ignore.
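+# Optional spot-check (a grep, not a type check; mypy/pyright above remain the gate).
+# Per Task 9, both declarations (BaseForecaster and RegressionForecaster) should match.
+# A plain `requires_features = True` assignment would not appear here.
+grep -n "requires_features: ClassVar\[bool\]" app/features/forecasting/models.py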
+``` + +### Level 3: Unit Tests + +```bash +# New + moved tests +uv run pytest -v -m "not integration" app/shared/feature_frames/tests/ +uv run pytest -v -m "not integration" app/features/forecasting/tests/test_regression_features_leakage.py + +# Regression — the slices this PRP rewired MUST stay green unchanged +uv run pytest -v -m "not integration" app/features/forecasting/tests/ +uv run pytest -v -m "not integration" app/features/scenarios/tests/ +uv run pytest -v -m "not integration" app/features/backtesting/tests/ + +# Whole fast suite +uv run pytest -v -m "not integration" +# Expected: all green. The baseline-forecaster tests and test_regression_forecaster.py must +# pass with ZERO edits — if one fails, the consolidation changed behaviour (it must not). +``` + +### Level 4: Integration Tests + Contract Drift + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest -v -m integration app/features/scenarios/ app/features/forecasting/ +# CRITICAL: the scenarios "empty-assumption model_exogenous simulation -> zero delta" test is +# the old drift detector for the duplicated contract. It MUST stay green — now structurally, +# because both slices import the one shared column list. +# No migration in this PRP -> no `alembic downgrade` round-trip needed. +``` + +### Level 5: Manual Validation (dogfood — REQUIRED) + +```bash +# 1. Shared package wires up +uv run python -c "from app.shared.feature_frames import canonical_feature_columns, FeatureSafety, feature_safety; \ +cols = canonical_feature_columns(); assert len(cols) == 14; \ +assert all(feature_safety(c) for c in cols); print('contract OK:', cols)" + +# 2. The duplicated constant is GONE +grep -rn "_REGRESSION_FEATURE_COLUMNS\|_REGRESSION_LAGS\|_REGRESSION_HISTORY_TAIL_DAYS" app/ \ + && echo "FAIL: duplicate still present" || echo "OK: single source of truth" + +# 3. 
requires_features is correct on every forecaster +uv run python -c " +from app.features.forecasting.models import (NaiveForecaster, SeasonalNaiveForecaster, \ +MovingAverageForecaster, RegressionForecaster); +assert NaiveForecaster.requires_features is False; +assert SeasonalNaiveForecaster.requires_features is False; +assert MovingAverageForecaster.requires_features is False; +assert RegressionForecaster.requires_features is True; +print('requires_features OK')" + +# 4. End-to-end behaviour unchanged — train a regression model and run a model_exogenous +# scenario; confirm it still produces a comparison (start backend first): +# uv run uvicorn app.main:app --port 8123 & +# curl -sX POST localhost:8123/forecasting/train -H 'Content-Type: application/json' \ +# -d '{"store_id":1,"product_id":1,"train_start_date":"2024-01-01", +# "train_end_date":"2024-06-01","config":{"model_type":"regression"}}' +# -> 200; then POST /scenarios/simulate with that run_id -> method "model_exogenous". +``` + +--- + +## Final Validation Checklist + +- [ ] `uv run ruff check .` and `uv run ruff format --check .` clean. +- [ ] `uv run mypy app/` and `uv run pyright app/` clean (both --strict). +- [ ] `uv run pytest -v -m "not integration"` fully green. +- [ ] `uv run pytest -v -m integration app/features/scenarios/ app/features/forecasting/` green — including the empty-assumption zero-delta test. +- [ ] `grep -rn "_REGRESSION_FEATURE_COLUMNS" app/` returns nothing. +- [ ] `app/shared/feature_frames/tests/test_contract.py::test_shared_package_imports_nothing_from_features` passes. +- [ ] `app/shared/feature_frames/tests/test_leakage.py` and `app/features/forecasting/tests/test_regression_features_leakage.py` exist, carry the load-bearing-spec docstring, and pass. +- [ ] Baseline-forecaster tests and `test_regression_forecaster.py` pass with **no edits**. 
+- [ ] `examples/models/feature_frame_contract.md` exists and documents both frame shapes + the taxonomy; `examples/models/model_interface.md` updated additively. +- [ ] `git diff --stat` shows only the intended files — no whole-file CRLF/LF noise diffs. +- [ ] No new dependency in `pyproject.toml`; no Alembic migration; no route/schema/WebSocket change. +- [ ] An OPEN GitHub issue exists for this work (`gh issue view --json state` → `OPEN`); commit `feat(forecast): feature-aware forecasting foundation — shared feature-frame contract (#)`; branch `feat/feature-aware-forecasting-foundation` off `dev`. + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't implement LightGBM, XGBoost, Prophet, or any new model — that is PRP-MLZOO-B+ (INITIAL-MLZOO-index.md). This PRP is contracts + tests + docs only. +- ❌ Don't add a `FeatureAwareForecaster` base class — DECISIONS LOCKED #2 chose the `requires_features` attribute. +- ❌ Don't introduce a `TrainingFrame` dataclass nothing returns — DECISIONS LOCKED #4 (dead code; product-vision.md forbids speculative abstraction). +- ❌ Don't change `POST /forecasting/predict` behaviour — it still rejects feature-aware models (DECISIONS LOCKED #3). +- ❌ Don't "tidy up" `build_long_lag_columns` while moving it — move it byte-for-byte; the `idx = (j-1)-k` guard is the leakage spec. +- ❌ Don't weaken or delete an assertion in any `*_leakage.py` file — AGENTS.md § Safety. Splitting tests across the module move is fine; dropping coverage is not. +- ❌ Don't let `app/shared/feature_frames/` import from `app/features/**` — it is leaf-level; a test enforces it. +- ❌ Don't silently zero-fill an unknown feature cell — emit `math.nan` (DECISIONS LOCKED #6). +- ❌ Don't add an Alembic migration or touch `ModelBundle` — persistence is untouched. +- ❌ Don't wire feature-aware backtesting — DECISIONS LOCKED #7; loud-fail + a guard test is the deliverable here. 
+ +## Open Questions — ALL RESOLVED + +The three "Required decisions" in INITIAL-MLZOO-A were resolved during planning and are +recorded as DECISIONS LOCKED #1 (contract home → `app/shared/feature_frames/`), #2 (no +`FeatureAwareForecaster` class → `requires_features` attribute), and #3 (`/forecasting/predict` +unchanged). #4–#7 record the derived decisions (no `TrainingFrame`, builders move by import, +NaN-as-unknown, backtesting loud-fail). Nothing is left to litigate at implementation time. + +## Confidence Score + +**8 / 10** for one-pass implementation success. + +Rationale: this is a consolidation-and-test PRP with no new dependency, no migration, no API +change, and no new algorithm — the highest-confidence PRP class. The feature-frame machinery +already exists and is well-tested; the work is moving pure functions into `app/shared/`, +swapping a string check for a class attribute, and adding leakage tests that mirror an +existing, proven idiom. The −2 risk is entirely in **import-update completeness**: 6+ sites +import from `scenarios/feature_frame.py`, and the `_build_regression_features` calendar +refactor (row-major inline math → shared column builder) must reproduce the exact 14-column +order. Both risks are caught fast — by `mypy`/`pyright` (unresolved imports) and by the Task 12 +column-order test plus the scenarios zero-delta integration test. Following the per-task +pseudocode and the Level 3 "baselines must pass unedited" gate makes a regression hard to miss. diff --git a/PRPs/PRP-30-lightgbm-first-advanced-model.md b/PRPs/PRP-30-lightgbm-first-advanced-model.md new file mode 100644 index 00000000..cdf3f141 --- /dev/null +++ b/PRPs/PRP-30-lightgbm-first-advanced-model.md @@ -0,0 +1,944 @@ +name: "PRP-30 — LightGBM First Advanced Model (MLZOO-B)" +description: | + +## Purpose + +The second PRP of the **Advanced ML Model Zoo** sequence (`PRPs/INITIAL/INITIAL-MLZOO-index.md`). 
+It adds the **first advanced, feature-aware forecasting model** — `LightGBMForecaster`, +wrapping `lightgbm.LGBMRegressor` — on top of the leakage-safe shared feature-frame contract +delivered by PRP-29 (MLZOO-A). + +This PRP implements **one model**: the LightGBM forecaster, its `model_factory` wiring, its +training path, its scenario `model_exogenous` re-forecast path, and its reproducibility +metadata. It adds **no** new model beyond LightGBM, **no** hyperparameter search, **no** +portfolio/global models, **no** frontend, **no** explainability change, and — by an explicit +scoping decision (see DECISIONS LOCKED #6) — **no** feature-aware backtesting wiring. If you +find yourself touching the backtesting fold loop, stop — that is PRP-MLZOO-B.2. + +## What this PRP already inherits (DO NOT re-build) + +PRP-29 (MLZOO-A, merged commit `b116489`) already shipped the foundation a feature-aware model +stands on. Re-use it; do not re-derive it: + +- **The feature-aware model contract.** `BaseForecaster.requires_features: ClassVar[bool]` + (`app/features/forecasting/models.py:64`). `RegressionForecaster` (`models.py:438`) is the + *existing* feature-aware model — `requires_features = True`, `fit(y, X)` / `predict(horizon, X)` + both require a non-`None` `X`. `LightGBMForecaster` is its structural twin. +- **The shared feature-frame contract.** `app/shared/feature_frames/` owns the pinned + constants, `canonical_feature_columns()` (the 14-column set), `FutureFeatureFrame`, the + leakage-safe pure builders, and the `FeatureSafety` taxonomy. A feature-aware model writes + **zero** new contract code. +- **The training-frame branch.** `ForecastingService.train_model` + (`app/features/forecasting/service.py:~226`) already branches on `model.requires_features`: + if true it builds the historical frame via `_build_regression_features` and calls + `model.fit(features.y, features.X)`, persisting `feature_columns` / `history_tail` / + `launch_date` into the bundle metadata. 
**A new feature-aware model trains with zero + changes to `train_model`.** +- **The predict rejection.** `ForecastingService.predict` (`service.py:~392`) already rejects + any `requires_features` model — `POST /forecasting/predict` cannot supply an exogenous frame. + A LightGBM model is rejected there automatically (DECISIONS LOCKED #4 — no predict change). +- **The historical-frame leakage spec.** `app/features/forecasting/tests/test_regression_features_leakage.py` + pins `_assemble_regression_rows`; `app/shared/feature_frames/tests/test_leakage.py` pins the + shared builders. LightGBM consumes the **same** builders → these specs already cover its + training feature matrix and its future feature matrix. **No new leakage test is required** + (see DECISIONS LOCKED #5). + +The **problem this PRP fixes**: `lightgbm` is a dead placeholder. `LightGBMModelConfig` +exists (`schemas.py:107`), `forecast_enable_lightgbm` exists (`config.py:101`), `lightgbm` is +in the `ModelType` literal (`models.py:581`) and the `ModelConfig` union — but `model_factory` +raises `NotImplementedError("LightGBM forecaster not yet implemented")` (`models.py:623-629`), +`lightgbm` is absent from `pyproject.toml`, and `jobs._execute_train` rejects it as an +unsupported `model_type`. There is no advanced model to compare baselines against. + +## DEPENDS ON — read before starting + +- `PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md` — this PRP's brief. +- `PRPs/INITIAL/INITIAL-MLZOO-index.md` — the MLZOO roadmap (A ✅ → **B (this)** → C → D). +- `PRPs/PRP-29-feature-aware-forecasting-foundation.md` — the merged foundation. Its + DECISIONS LOCKED #2 (`requires_features` attribute, not a `FeatureAwareForecaster` class) + and #6 (NaN-as-unknown) are binding here too. +- `PRPs/ai_docs/exogenous-regressor-forecasting.md` — §1 has the `LGBMRegressor` snippet; §2 is + the leakage rule the shared builders already obey; §5 explains why `regression` shipped on + scikit-learn first. 
+- `examples/models/feature_frame_contract.md` — the historical/future frame shapes a + feature-aware model consumes, and the canonical 14-column set. +- `docs/optional-features/05-advanced-ml-model-zoo.md` — the full model-zoo vision and risks. + +--- + +## Goal + +Implement `LightGBMForecaster` — a deterministic, feature-aware forecasting model wrapping +`lightgbm.LGBMRegressor` — and wire it end-to-end: `model_factory` instantiates it (behind the +existing `forecast_enable_lightgbm` flag), `ForecastingService.train_model` trains it through +the existing `requires_features` branch, `POST /scenarios/simulate` re-forecasts it through +`method="model_exogenous"`, `JobService._execute_train` accepts `model_type="lightgbm"`, and +the LightGBM library version is captured in the model bundle and the registry's `runtime_info`. +LightGBM ships as an **optional dependency group** (`ml-lightgbm`); the model code lazy-imports +it so a single-host install without the extra still works for every baseline model. + +**End state:** a user with `forecast_enable_lightgbm=True` and the `ml-lightgbm` extra +installed can train a `lightgbm` model (HTTP or job), and re-forecast it in a what-if +scenario, exactly as they can a `regression` model today. Every baseline model and the +`regression` model behave **identically** before and after. Backtesting a feature-aware model +still fails loud (unchanged) — feature-aware backtesting is PRP-MLZOO-B.2. + +## Why + +- **The model zoo needs a credible advanced model.** `docs/optional-features/05-advanced-ml-model-zoo.md` + frames the goal as "upgrade ForecastLabAI from baseline forecasting to a credible model + *comparison* platform". The `regression` model (scikit-learn `HistGradientBoostingRegressor`) + shipped as the *foundation-prover*; LightGBM is the first model whose presence makes a + comparison meaningful — it is the industry-standard gradient-boosted tree for tabular retail + demand on engineered lag/calendar features. 
+- **The foundation is already paid for.** PRP-29 made the feature-frame contract single-source + and the train/predict path branch on `requires_features`. Adding a second feature-aware model + is now a *small, contained* change — the expensive structural work is done. +- **It de-risks the dependency one step at a time.** `INITIAL-MLZOO-index.md` mandates "Add + LightGBM support behind optional dependency group" before XGBoost/Prophet. This PRP does + exactly that and no more. +- **Low blast radius.** No migration, no API-contract change, no baseline-model change, no + new vertical slice. The risk surface is one new class + three small wiring edits + metadata + + tests. + +## What + +A backend-only feature PRP. User-visible behaviour gains exactly one thing: `model_type: +"lightgbm"` becomes a real, trainable, scenario-re-forecastable model when the feature flag +and the optional dependency are both present. Everything else is identical. + +### Technical requirements + +1. **Optional dependency group.** `pyproject.toml` gains `[project.optional-dependencies] + ml-lightgbm = ["lightgbm>=4.5.0"]`. CI already runs `uv sync --frozen --all-extras --dev` + (`.github/workflows/ci.yml:48,74,116,163`) so the extra is installed and tested in CI with + **no workflow change**. `uv.lock` is regenerated (`uv lock`) because CI uses `--frozen`. +2. **`LightGBMForecaster`** in `app/features/forecasting/models.py` — a `BaseForecaster` + subclass with `requires_features: ClassVar[bool] = True`, structurally mirroring + `RegressionForecaster`. It lazy-imports `lightgbm` inside `fit()` (not at module load, not + in `__init__`) so importing `models.py` never requires the optional dependency. It is + deterministic (`n_jobs=1`, `deterministic=True`, `force_col_wise=True`, fixed + `random_state`) and NaN-tolerant (LightGBM handles `NaN` natively). +3. 
**`model_factory`** — the `lightgbm` branch (`models.py:623-629`) replaces its + `NotImplementedError` with a real `LightGBMForecaster(...)` instantiation, keeping the + `forecast_enable_lightgbm` gate. +4. **Scenario re-forecast.** `app/features/scenarios/service.py` dispatches the + `model_exogenous` re-forecast on `bundle.model.requires_features` instead of the hard-coded + `bundle.config.model_type == "regression"` — so a LightGBM bundle takes the genuine + re-forecast path, not the heuristic-multiplier fallback. +5. **Jobs integration.** `JobService._execute_train` (`jobs/service.py:453-469`) gains a + `lightgbm` branch building `LightGBMModelConfig`. `_execute_backtest` is **not** changed — + it keeps rejecting feature-aware models (backtesting deferred). +6. **Reproducibility metadata.** `ModelBundle` gains a `lightgbm_version: str | None` field + (best-effort captured on save, mismatch-warned on load — mirroring `sklearn_version`); + `RegistryService._capture_runtime_info` gains a `lightgbm` version block. +7. **Tests** mirroring the `RegressionForecaster` test suite, gated with + `pytest.importorskip("lightgbm")`; an `examples/models/advanced_lightgbm.py` minimal + train/predict example; additive docs. + +### Success Criteria + +- [ ] `model_factory(LightGBMModelConfig(), random_state=42)` returns a `LightGBMForecaster` + when `forecast_enable_lightgbm=True`; raises a clear `ValueError` when the flag is off. +- [ ] `LightGBMForecaster.requires_features is True`; `fit`/`predict` require a non-`None` `X` + and raise the same error-message substrings as `RegressionForecaster` + (`"requires exogenous features"`, `"rows must match"`, `"horizon"`, `"fitted"`). +- [ ] Two fits with the same `random_state` produce **identical** forecasts + (`np.testing.assert_array_equal`). +- [ ] `ForecastingService.train_model` trains a `lightgbm` model with **no edit to + `train_model`** (it routes through the existing `requires_features` branch). 
+- [ ] `POST /scenarios/simulate` against a trained `lightgbm` run returns + `method="model_exogenous"` (not `"heuristic"`). +- [ ] `JobService._execute_train` accepts `model_type="lightgbm"`; `_execute_backtest` still + rejects it with `Unsupported model_type`. +- [ ] `ModelBundle.lightgbm_version` and registry `runtime_info["lightgbm_version"]` are + captured when `lightgbm` is installed. +- [ ] Every baseline model, the `regression` model, and the backtesting loud-fail guard test + (`test_feature_aware_model_fails_loud_in_backtest`) pass **with no behaviour change**. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` all green. +- [ ] No Alembic migration; no route/schema/WebSocket contract change; LightGBM stays an + *optional* dependency (the core `dependencies` list is unchanged). + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: app/features/forecasting/models.py + why: RegressionForecaster (lines 438-577) is the BYTE-FOR-BYTE structural template for + LightGBMForecaster — same __init__ shape, same fit/predict guards, same error strings, + same get_params/set_params. requires_features ClassVar is at line 64 (base) / 458 + (RegressionForecaster override). model_factory lightgbm branch to replace: lines 623-629. + critical: The estimator is typed `Any` (see `estimator: Any = HistGradientBoostingRegressor(...)` + at models.py:510) — mirror that for LGBMRegressor so pyright --strict stays quiet. + +- file: app/features/forecasting/schemas.py + why: LightGBMModelConfig (lines 107-144) ALREADY EXISTS — model_type Literal["lightgbm"], + n_estimators (10-1000, default 100), max_depth (1-20, default 6), learning_rate + (0.001-1.0, default 0.1), feature_config_hash. It is already in the ModelConfig union + (lines 193-199). DO NOT add fields — DECISIONS LOCKED #3 keeps it conservative. 
+ +- file: app/features/forecasting/service.py + why: train_model (~line 226) already branches `if model.requires_features:` → builds the + historical frame and persists feature metadata. predict (~line 392) already rejects + feature-aware models. _build_regression_features (lines 503-635) and the pure + _assemble_regression_rows (lines 114-170) are REUSED unchanged by LightGBM. + critical: DO NOT EDIT service.py. A LightGBM model trains and is predict-rejected purely + because requires_features=True. Verify by reading, then leave it alone. + +- file: app/features/forecasting/persistence.py + why: ModelBundle dataclass (lines 30-69) has python_version + sklearn_version (no + extensible version dict). save_model_bundle captures them at lines 94-98; + load_model_bundle mismatch-warns at lines 156-172. Add `lightgbm_version` mirroring + `sklearn_version` exactly. compute_hash (lines ~52-66) does NOT include version fields + → adding lightgbm_version does not change any bundle hash. + +- file: app/features/forecasting/routes.py + why: POST /forecasting/train (lines 47-131) ALREADY has the lightgbm feature-flag gate at + lines 67-72 (`model_type == "lightgbm" and not settings.forecast_enable_lightgbm` → + 400). ValueError → 400, FileNotFoundError → 404. No route-code change needed; only a + new route test. + +- file: app/features/scenarios/service.py + why: The model_exogenous dispatch is hard-coded `if bundle.config.model_type == "regression":` + (~line 112). CHANGE it to `if bundle.model.requires_features:`. _simulate_model_exogenous + reads bundle.metadata["feature_columns"]/["history_tail"] and calls + `bundle.model.predict(horizon, X)` — model-agnostic, no other change. + critical: grep `app/features/scenarios/` for every `"regression"` / `model_type` check and + generalise each model-capability check to requires_features. Do not miss agent_tools.py. 
+ +- file: app/features/scenarios/feature_frame.py + why: build_future_frame (lines 232-299) / assemble_future_frame (lines 181-229) produce a + FutureFeatureFrame for ANY feature_columns list — model-agnostic. LightGBM model_exogenous + needs ZERO feature-frame code. Read to confirm; do not edit. + +- file: app/features/jobs/service.py + why: _execute_train (lines 409-503) has a model_type if/elif chain (lines 453-469) that + rejects anything but naive/seasonal_naive/moving_average/regression with + `ValueError("Unsupported model_type: ...")`. ADD a `lightgbm` branch. _execute_backtest + (lines 583-668, branch 630-640) keeps rejecting lightgbm — backtesting is deferred. + +- file: app/features/registry/service.py + why: _capture_runtime_info (lines 84-123) best-effort-imports sklearn/numpy/pandas/joblib + into a runtime_info dict. ADD an identical `try: import lightgbm` block. runtime_info + is JSONB (registry/models.py) — a new key needs NO migration. + +- file: app/features/forecasting/tests/test_regression_forecaster.py + why: The FULL test template for test_lightgbm_forecaster.py. Clone every test: fit/predict + roundtrip, rejects-None-X, rejects-mismatched-rows, predict-before-fit, NaN tolerance, + get/set params, factory creation, and especially test_determinism_same_random_state + (np.testing.assert_array_equal of two same-seed fits). Copy its `_synthetic_data` helper. + +- file: app/features/forecasting/tests/test_service.py + why: TestFeatureAwareContract (lines 349-385) — test_requires_features_flag and + test_canonical_columns_match_regression_contract. Extend the first; the second is + UNCHANGED (LightGBM reuses the same 14-column contract). + +- file: app/features/backtesting/tests/test_service.py + why: test_feature_aware_model_fails_loud_in_backtest (lines 120-152) STAYS — it is the + interim contract until PRP-MLZOO-B.2 wires feature-aware backtesting. It uses + RegressionModelConfig and is unaffected by this PRP. Do not touch it. 
+ +- file: app/features/jobs/tests/test_service.py + why: A test around line 218 (`test_..._rejects_unsupported_model_type`) currently uses + `"lightgbm"` as the genuinely-unsupported type. Once _execute_train accepts lightgbm + this test is WRONG — swap its payload to a still-unsupported string ("arima"). + +- url: https://lightgbm.readthedocs.io/en/stable/pythonapi/lightgbm.LGBMRegressor.html + why: LGBMRegressor sklearn-API constructor — n_estimators, learning_rate, max_depth, + random_state, n_jobs, verbosity. fit(X, y) / predict(X) are sklearn-compatible. + +- url: https://lightgbm.readthedocs.io/en/stable/Parameters.html#deterministic + section: deterministic, force_col_wise, num_threads + critical: For bit-reproducible trees set deterministic=true AND force_col_wise=true (or + force_row_wise=true) AND num_threads=1 (LGBMRegressor: n_jobs=1). Without all three a + multi-threaded fit can differ by ULPs and the determinism test (assert_array_equal) + flakes. verbosity=-1 silences LightGBM's training chatter. + +- docfile: PRPs/ai_docs/exogenous-regressor-forecasting.md + why: §1 has the exact LGBMRegressor snippet; §5 records why `regression` shipped on + scikit-learn first — LightGBM was deferred precisely to a later PRP (this one). 
+``` + +### Current Codebase tree (relevant — all already exist) + +```bash +app/features/forecasting/ +├── models.py # BaseForecaster, RegressionForecaster, model_factory (lightgbm stub) +├── schemas.py # LightGBMModelConfig (already exists), ModelConfig union +├── service.py # train_model + predict already branch on requires_features +├── persistence.py # ModelBundle (python_version + sklearn_version only) +├── routes.py # /forecasting/train already has the lightgbm flag gate +└── tests/ + ├── conftest.py + ├── test_regression_forecaster.py # the test template to clone + ├── test_service.py # TestFeatureAwareContract + ├── test_routes.py + ├── test_persistence.py + └── test_regression_features_leakage.py # load-bearing — already covers LightGBM's frame +app/features/scenarios/service.py # model_exogenous dispatch (model_type=="regression") +app/features/jobs/service.py # _execute_train rejects lightgbm +app/features/registry/service.py # _capture_runtime_info (no lightgbm) +app/features/backtesting/tests/test_service.py # loud-fail guard (stays) +app/shared/feature_frames/ # the shared contract (PRP-29) — reused, untouched +examples/models/ +├── baseline_naive.py / baseline_seasonal.py / baseline_mavg.py +├── model_interface.md +└── feature_frame_contract.md +pyproject.toml # lightgbm absent +.github/workflows/ci.yml # uv sync --frozen --all-extras --dev (no change) +``` + +### Desired Codebase tree — files to ADD + +```bash +app/features/forecasting/tests/ +└── test_lightgbm_forecaster.py # cloned from test_regression_forecaster.py, importorskip + +examples/models/ +└── advanced_lightgbm.py # minimal LightGBM train/predict example (INITIAL-B asks) +``` + +### Files to MODIFY (all additive or behaviour-preserving) + +```bash +pyproject.toml # + [project.optional-dependencies] ml-lightgbm + # + [[tool.mypy.overrides]] lightgbm.* ignore_missing_imports +uv.lock # regenerated by `uv lock` +app/features/forecasting/models.py # + LightGBMForecaster; model_factory 
lightgbm branch +app/features/forecasting/persistence.py # + ModelBundle.lightgbm_version (save + load) +app/features/scenarios/service.py # model_type=="regression" -> requires_features +app/features/jobs/service.py # _execute_train: + lightgbm branch +app/features/registry/service.py # _capture_runtime_info: + lightgbm block +app/features/forecasting/__init__.py # export LightGBMForecaster if RegressionForecaster is exported +app/features/forecasting/tests/test_service.py # extend test_requires_features_flag +app/features/forecasting/tests/test_routes.py # + lightgbm 400-when-disabled route test +app/features/forecasting/tests/test_persistence.py # + lightgbm_version captured assertion +app/features/jobs/tests/test_service.py # fix rejects-unsupported test; + lightgbm job test +app/features/scenarios/tests/test_routes_integration.py # + lightgbm model_exogenous test +app/features/registry/tests/test_service.py # + runtime_info has lightgbm_version +examples/models/model_interface.md # additive: lightgbm row +examples/models/feature_frame_contract.md # additive: lightgbm is now a feature-aware model +README.md # additive: the ml-lightgbm optional extra +``` + +### DECISIONS LOCKED (resolved during planning — do NOT re-litigate) + +1. **LightGBM ships as an optional dependency group, not a core/hard dependency.** A new + `[project.optional-dependencies] ml-lightgbm = ["lightgbm>=4.5.0"]`. Rationale: AGENTS.md + lists LightGBM as "(opt-in)"; `INITIAL-MLZOO-index.md` mandates "behind optional dependency + group"; the single-host vision keeps the core install dependency-light. CI runs + `uv sync --frozen --all-extras --dev` so the extra **is** installed and tested in CI — no + workflow edit needed. (User-confirmed during PRP planning.) + +2. **The sklearn fallback is rejected — it would be a no-op.** `INITIAL-B` offered + `HistGradientBoostingRegressor` as a fallback, but the `regression` model **already is** + exactly that (PRP-27). 
Implementing the fallback would duplicate an existing model and add + nothing. LightGBM is genuinely implemented. (User-confirmed.) + +3. **`LightGBMModelConfig` is used UNCHANGED — conservative config.** It already exists with + `n_estimators` / `max_depth` / `learning_rate` / `feature_config_hash`. `INITIAL-B` says + "Keep the first config conservative". `num_leaves` / `min_child_samples` / `subsample` / + `colsample_bytree` (named in `INITIAL-MLZOO-index.md`) are a deliberate future-PRP + extension — adding them now widens the schema surface for no MVP value. The forecaster uses + LightGBM defaults for every parameter not in the config. + +4. **`POST /forecasting/predict` is NOT changed.** A LightGBM model is feature-aware + (`requires_features=True`) and is rejected by the existing predict branch — identical to + `regression`. It forecasts through `POST /scenarios/simulate`. This answers INITIAL-B's + "How prediction rejects missing future feature frames": the rejection already exists and is + capability-based (`requires_features`), not `model_type`-string-based. + +5. **No new leakage test.** LightGBM reuses `_build_regression_features` / + `_assemble_regression_rows` (historical frame) and the shared `app/shared/feature_frames` + builders (future frame) **byte-for-byte**. Those are already pinned by the load-bearing + `app/features/forecasting/tests/test_regression_features_leakage.py` and + `app/shared/feature_frames/tests/test_leakage.py`. The model is leakage-covered by + construction; a duplicate LightGBM-flavoured leakage test would test the same code twice. + The PRP MUST state this reuse explicitly so the reviewer sees the coverage is intact. + +6. **Feature-aware backtesting is OUT OF SCOPE — deferred to PRP-MLZOO-B.2.** The backtest + fold loop (`backtesting/service.py::_run_model_backtest`) is synchronous, DB-free, and + loads target quantities only; wiring per-fold leakage-safe `X_train`/`X_future` builders is + itself PRP-sized. 
`test_feature_aware_model_fails_loud_in_backtest` stays as the interim + loud-fail contract. `JobService._execute_backtest` keeps rejecting `lightgbm`. + (User-confirmed: "Defer to a follow-up PRP".) + +7. **The `lightgbm` import is LAZY — inside `fit()`, never at module scope.** Importing + `app/features/forecasting/models.py` must never require the optional dependency (the module + is imported by every forecasting code path, baseline models included). `model_factory` and + `LightGBMForecaster.__init__` only store parameters; `import lightgbm` happens the first + time `fit()` runs. `requires_features` is a `ClassVar` → readable with no import. + +### Known Gotchas of our codebase & Library Quirks + +```python +# CRITICAL: lazy import. `import lightgbm` goes INSIDE LightGBMForecaster.fit(), not at the +# top of models.py and not in __init__. models.py is imported for naive/seasonal/mavg too; +# a module-level lightgbm import would make every forecast path require the optional extra. + +# CRITICAL: determinism. LGBMRegressor is reproducible ONLY with n_jobs=1 AND +# deterministic=True AND force_col_wise=True AND a fixed random_state. Omit any one and a +# multi-threaded fit differs by ULPs → test_determinism_same_random_state (which uses +# np.testing.assert_array_equal — EXACT equality, the repo idiom) flakes in CI. Do not +# "fix" a flake by switching to assert_allclose — fix the LGBMRegressor params. + +# GOTCHA: mypy --strict + warn_unused_ignores=true. lightgbm has incomplete type information. +# Add a `[[tool.mypy.overrides]] module=["lightgbm.*"] ignore_missing_imports=true` block to +# pyproject.toml. With that, do NOT also add an inline `# type: ignore[import-untyped]` — the +# override handles it and an unused inline ignore is itself a mypy error. Type the estimator +# `Any` (mirror `estimator: Any = HistGradientBoostingRegressor(...)` at models.py:510). + +# GOTCHA: pyright --strict excludes tests/ (pyproject [tool.pyright] exclude) but scans app/. 
+# With the ml-lightgbm extra installed (CI: --all-extras; locally: see Validation Level 0) +# pyright resolves `import lightgbm`. reportUnknownMemberType is already "warning" not +# "error" (pyproject:169) so dynamic LGBMRegressor attribute access does not fail the gate. + +# GOTCHA: uv.lock + --frozen. CI installs with `uv sync --frozen` — `--frozen` REFUSES to +# update the lockfile. After editing pyproject.toml you MUST run `uv lock` and commit the +# refreshed uv.lock, or every CI job fails at the install step. + +# GOTCHA: tests must not hard-require the optional dep. test_lightgbm_forecaster.py starts with +# `pytest.importorskip("lightgbm")` so a dev who ran `uv sync --extra dev` (no ml-lightgbm) +# sees the suite SKIP, not ERROR. CI installs --all-extras so the tests RUN there. + +# GOTCHA: jobs/tests/test_service.py ~line 218 — a test asserts `model_type="lightgbm"` is +# rejected as unsupported. After _execute_train gains a lightgbm branch that assertion is +# false. Swap the test payload to a string that is genuinely unsupported, e.g. "arima". + +# GOTCHA: loading a LightGBM bundle requires the ml-lightgbm extra. joblib.load unpickles the +# embedded LGBMRegressor, which needs `lightgbm` importable. This is inherent to an optional +# ML dependency — document it; do not try to engineer around it. + +# GOTCHA: scenarios dispatch. The model_exogenous re-forecast is gated `if +# bundle.config.model_type == "regression":`. A LightGBM bundle would silently fall through +# to the heuristic multiplier. Change it to `if bundle.model.requires_features:` — the +# repo's own forecasting/service.py already branches on exactly that flag. + +# GOTCHA: line endings — repo has mixed CRLF/LF, no .gitattributes. Run `git diff --stat` +# before committing; if a modified file shows a whole-file diff, re-normalise to its +# original ending so the review shows only the real change. 
+``` + +--- + +## Implementation Blueprint + +### Data models and structure + +No ORM model, no Pydantic schema change, no migration. `LightGBMModelConfig` already exists +and is used as-is (DECISIONS LOCKED #3). The only new structured type is the forecaster class: + +```python +# app/features/forecasting/models.py — mirrors RegressionForecaster (models.py:438-577) + +class LightGBMForecaster(BaseForecaster): + """Feature-aware forecaster wrapping ``lightgbm.LGBMRegressor``. + + The first ADVANCED feature-aware model (MLZOO-B). Like RegressionForecaster + it REQUIRES a non-None exogenous X for both fit and predict; unlike it, the + estimator is gradient-boosted leaf-wise trees from the optional ``lightgbm`` + package. ``lightgbm`` is imported lazily inside ``fit`` so importing this + module never requires the optional dependency. + """ + + requires_features: ClassVar[bool] = True + + def __init__(self, *, n_estimators: int = 100, learning_rate: float = 0.1, + max_depth: int = 6, random_state: int = 42) -> None: + super().__init__(random_state) + self.n_estimators = n_estimators + self.learning_rate = learning_rate + self.max_depth = max_depth + self._estimator: Any = None +``` + +### list of tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Optional dependency ════════ + +Task 1 — MODIFY pyproject.toml + regenerate uv.lock: + - ADD under [project.optional-dependencies], after the `dev` group: + ml-lightgbm = ["lightgbm>=4.5.0"] + - ADD a new mypy override block (after the existing alembic override, ~line 145): + [[tool.mypy.overrides]] + module = ["lightgbm.*"] + ignore_missing_imports = true + - RUN `uv lock` to refresh uv.lock (CI uses `uv sync --frozen` — a stale lock fails CI). + - RUN `uv sync --extra dev --extra ml-lightgbm` locally so the gates can see lightgbm. 
+ - VALIDATE: uv run python -c "import lightgbm; print(lightgbm.__version__)" + +# ════════ STEP 2 — The forecaster + factory ════════ + +Task 2 — MODIFY app/features/forecasting/models.py — ADD LightGBMForecaster: + - PLACE the new class immediately AFTER RegressionForecaster (after models.py:577), + BEFORE the `ModelType` alias. + - MIRROR RegressionForecaster byte-for-byte for: __init__ shape (keyword-only params + + random_state, `self._estimator: Any = None`), the fit guards (X is None → ValueError + "LightGBMForecaster requires exogenous features X for fit()"; empty y → ValueError + "Cannot fit on empty array"; row mismatch → ValueError f"X has {X.shape[0]} rows but y + has {len(y)} — feature/target rows must match"), the predict guards (not fitted → + RuntimeError "Model must be fitted before predict"; X is None → ValueError + "LightGBMForecaster requires exogenous features X for predict()"; shape mismatch → + ValueError f"X has {X.shape[0]} rows but horizon is {horizon} — they must match"), + get_params, set_params. + - INSIDE fit(): `import lightgbm as lgb` (lazy), then + `estimator: Any = lgb.LGBMRegressor(n_estimators=self.n_estimators, + learning_rate=self.learning_rate, max_depth=self.max_depth, + random_state=self.random_state, n_jobs=1, deterministic=True, + force_col_wise=True, verbosity=-1)`; `estimator.fit(X, y)`. + - set requires_features: ClassVar[bool] = True (with the one-line docstring). + - get_params returns {n_estimators, learning_rate, max_depth, random_state}. + - PRESERVE the error-message substrings EXACTLY — the cloned tests `match=` on them. 
+ - VALIDATE: uv run mypy app/features/forecasting/models.py && uv run pyright app/features/forecasting/ + +Task 3 — MODIFY app/features/forecasting/models.py — model_factory lightgbm branch: + - REPLACE the body of `elif model_type == "lightgbm":` (currently models.py:623-629, + the `NotImplementedError`) with: + if not settings.forecast_enable_lightgbm: + raise ValueError("LightGBM is not enabled. Set forecast_enable_lightgbm=True in settings.") + from app.features.forecasting.schemas import LightGBMModelConfig + if isinstance(config, LightGBMModelConfig): + return LightGBMForecaster( + n_estimators=config.n_estimators, + learning_rate=config.learning_rate, + max_depth=config.max_depth, + random_state=random_state, + ) + raise ValueError("Invalid config type for lightgbm") + - KEEP the forecast_enable_lightgbm gate FIRST (the route relies on it; tests rely on it). + - VALIDATE: uv run mypy app/ && uv run pyright app/ + +Task 4 — MODIFY app/features/forecasting/__init__.py: + - IF `RegressionForecaster` is imported/__all__-listed there, ADD `LightGBMForecaster` + alongside it (same import line + __all__). IF RegressionForecaster is NOT exported, + make NO change (match the existing convention exactly). + - VALIDATE: uv run ruff check app/features/forecasting/__init__.py + +# ════════ STEP 3 — Scenario re-forecast path ════════ + +Task 5 — MODIFY app/features/scenarios/service.py — generalise the model_exogenous dispatch: + - FIND `if bundle.config.model_type == "regression":` (~service.py:112) and CHANGE the + condition to `if bundle.model.requires_features:`. + - GREP `app/features/scenarios/` for every other `"regression"` / `model_type ==` check + (service.py, agent_tools.py, schemas.py): any check that means "is this model + feature-aware / re-forecastable" becomes `requires_features`; a check that is genuinely + regression-specific (none expected) stays. Report what you found in the PR description. 
+ - DO NOT change feature_frame.py — build_future_frame is already model-agnostic. + - VALIDATE: uv run mypy app/features/scenarios/ && uv run pyright app/features/scenarios/ + +# ════════ STEP 4 — Jobs integration ════════ + +Task 6 — MODIFY app/features/jobs/service.py — _execute_train: + - ADD `LightGBMModelConfig` to the forecasting-schemas import block (jobs/service.py:426-431). + - ADD a branch in the model_type if/elif chain (jobs/service.py:453-469), BEFORE the + final `else: raise ValueError("Unsupported model_type: ...")`: + elif model_type == "lightgbm": + config = LightGBMModelConfig( + n_estimators=params.get("n_estimators", 100), + learning_rate=params.get("learning_rate", 0.1), + max_depth=params.get("max_depth", 6), + ) + - DO NOT touch _execute_backtest (jobs/service.py:583-668) — it keeps rejecting lightgbm; + feature-aware backtesting is PRP-MLZOO-B.2 (DECISIONS LOCKED #6). + - NOTE: _execute_train does not check forecast_enable_lightgbm — model_factory does. A + lightgbm job with the flag off fails LOUD ("LightGBM is not enabled"). That is correct. + - VALIDATE: uv run mypy app/features/jobs/ && uv run pyright app/features/jobs/ + +# ════════ STEP 5 — Reproducibility metadata ════════ + +Task 7 — MODIFY app/features/forecasting/persistence.py — ModelBundle.lightgbm_version: + - ADD field to ModelBundle (after `sklearn_version: str | None = None`): + lightgbm_version: str | None = None + - In save_model_bundle (near persistence.py:94-98), AFTER `bundle.sklearn_version = ...`, + ADD a best-effort capture (mirror RegistryService._capture_runtime_info's idiom): + try: + import lightgbm + bundle.lightgbm_version = str(lightgbm.__version__) + except ImportError: + bundle.lightgbm_version = None + - In load_model_bundle (near persistence.py:156-172), ADD a mismatch-warning block + mirroring the sklearn_version one — log `lightgbm_version_mismatch` (saved vs current) + only when both are non-None and differ. 
 Guard the current-version lookup in a + try/except ImportError so loading a non-LightGBM bundle without the extra never warns. + - compute_hash is unchanged (it never reads version fields) — confirm no bundle hash shifts. + - VALIDATE: uv run mypy app/features/forecasting/ && uv run pyright app/features/forecasting/ + +Task 8 — MODIFY app/features/registry/service.py — _capture_runtime_info: + - ADD, alongside the sklearn/numpy/pandas/joblib blocks (registry/service.py:84-123): + try: + import lightgbm + runtime_info["lightgbm_version"] = lightgbm.__version__ + except ImportError: + pass + - VALIDATE: uv run mypy app/features/registry/ && uv run pyright app/features/registry/ + +# ════════ STEP 6 — Tests ════════ + +Task 9 — CREATE app/features/forecasting/tests/test_lightgbm_forecaster.py: + - START the module with `import pytest` then `pytest.importorskip("lightgbm")` at module + scope (skips the whole file when the optional extra is absent). + - CLONE every test from test_regression_forecaster.py, swapping RegressionForecaster → + LightGBMForecaster and RegressionModelConfig → LightGBMModelConfig: fit/predict + roundtrip, fit-rejects-None-features, fit-rejects-mismatched-rows, + predict-rejects-None-features, predict-rejects-wrong-shape, predict-before-fit-raises, + test_determinism_same_random_state (np.testing.assert_array_equal), handles-NaN-features, + get/set params, and model_factory creation. + - COPY the `_synthetic_data` helper verbatim. + - For the factory test: `model_factory` needs forecast_enable_lightgbm=True. Patch + `get_settings` at the name model_factory actually looks up, returning a settings stub + with forecast_enable_lightgbm=True: if models.py imports get_settings at module scope, + patch "app.features.forecasting.models.get_settings"; if model_factory imports it INSIDE + the function, the name is resolved from its defining config module at call time, so patch + it there (a models.py-level patch would miss it, or raise AttributeError). OR construct + LightGBMForecaster directly for the non-factory tests.
+ - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_lightgbm_forecaster.py + +Task 10 — MODIFY app/features/forecasting/tests/test_service.py: + - In TestFeatureAwareContract.test_requires_features_flag (~line 351), ADD a class-level + assertion that needs neither the factory flag nor lightgbm installed: + from app.features.forecasting.models import LightGBMForecaster + assert LightGBMForecaster.requires_features is True + - ADD test_lightgbm_factory_respects_flag: with forecast_enable_lightgbm=False (patched) + model_factory(LightGBMModelConfig()) raises ValueError matching "not enabled"; with + True it returns a LightGBMForecaster. Guard the True-branch with + `pytest.importorskip("lightgbm")` only if it constructs+fits — construction alone needs + no import (lazy), so the isinstance check needs no importorskip. + - test_canonical_columns_match_regression_contract is UNCHANGED (LightGBM reuses it). + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py + +Task 11 — MODIFY app/features/forecasting/tests/test_routes.py: + - ADD test_train_lightgbm_rejected_when_disabled: POST /forecasting/train with + `config={"model_type":"lightgbm"}` and forecast_enable_lightgbm at its default (False) + → 400, problem+json detail mentioning LightGBM disabled (route gate routes.py:67-72). + - Follow the file's existing ASGITransport client fixture + @pytest.mark.integration idiom. + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_routes.py + +Task 12 — MODIFY app/features/jobs/tests/test_service.py: + - FIX the rejects-unsupported-model-type test (~line 218): change the params payload + `model_type` from `"lightgbm"` to a genuinely unsupported value `"arima"`; keep the + `ValueError`/`Unsupported model_type` expectation. 
+ - ADD test_execute_train_accepts_lightgbm: build a JobCreate train job with + model_type="lightgbm"; with forecast_enable_lightgbm enabled and + `pytest.importorskip("lightgbm")` in place, assert it runs (or, mirroring the existing + regression job test, assert _execute_train builds a LightGBMModelConfig — match the + file's existing depth of mock). + - VALIDATE: uv run pytest -v app/features/jobs/tests/test_service.py + +Task 13 — MODIFY app/features/scenarios/tests/test_routes_integration.py: + - ADD an integration test that trains a `lightgbm` model then POSTs /scenarios/simulate + with its run_id and asserts the response `method == "model_exogenous"` (NOT "heuristic") + — pins the Task 5 dispatch change. Mirror the existing regression model_exogenous test + in this file; gate with `pytest.importorskip("lightgbm")` and enable + forecast_enable_lightgbm. + - VALIDATE: uv run pytest -v -m integration app/features/scenarios/tests/test_routes_integration.py + +Task 14 — MODIFY app/features/forecasting/tests/test_persistence.py: + - ADD test_lightgbm_version_recorded: after `pytest.importorskip("lightgbm")`, save a + ModelBundle and assert `bundle.lightgbm_version` is a non-empty str. Also assert that a + non-LightGBM bundle (e.g. naive) still round-trips through save/load and that its + lightgbm_version is likewise a str, because the field is captured best-effort + regardless of model type. + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_persistence.py + +Task 15 — MODIFY app/features/registry/tests/test_service.py: + - ADD/extend a runtime_info test: when `lightgbm` is importable + (`pytest.importorskip("lightgbm")`) a created run's runtime_info contains the + `lightgbm_version` key. Mirror the existing sklearn/numpy runtime_info assertions.
+ - VALIDATE: uv run pytest -v app/features/registry/tests/test_service.py + +# ════════ STEP 7 — Docs & example ════════ + +Task 16 — CREATE examples/models/advanced_lightgbm.py: + - A minimal, runnable script: build a small synthetic [n, 14] feature matrix + target, + `LightGBMForecaster(random_state=42).fit(y, X)`, `predict(horizon, X_future)`, print + the forecasts. Mirror the structure/header of examples/models/baseline_naive.py. + - `examples/**` ignores T201/ANN/S101 (pyproject ruff per-file-ignores) — `print` is fine. + - VALIDATE: uv run python examples/models/advanced_lightgbm.py (requires the ml-lightgbm extra) + +Task 17 — MODIFY examples/models/model_interface.md + feature_frame_contract.md: + - model_interface.md: ADDITIVE — add a `lightgbm` row to the Model Configurations / Model + Formulas sections; note requires_features=True; note it is optional (ml-lightgbm extra). + - feature_frame_contract.md: ADDITIVE — update the opening line ("the regression forecaster + today; LightGBM ... in the MLZOO sequence") to record LightGBM as an IMPLEMENTED + feature-aware model. Do NOT rewrite the file. The backtesting loud-fail limitation + paragraph stays accurate (B.2 still pending). + - VALIDATE: uv run ruff check . && uv run ruff format --check . + +Task 18 — MODIFY README.md: + - ADDITIVE: one line in the install/setup section that `LightGBM` is an opt-in model + installed via `uv sync --extra dev --extra ml-lightgbm` and enabled with + `forecast_enable_lightgbm=true`. Mirror the README's existing tone; do not restructure. + - VALIDATE: uv run ruff format --check . 
(README is markdown — visual check only) +``` + +### Per-task pseudocode (critical details only) + +```python +# ── Task 2 — LightGBMForecaster.fit/predict (the lazy import + determinism is the crux) ── +def fit(self, y, X=None): + if X is None: + raise ValueError("LightGBMForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + import lightgbm as lgb # LAZY — optional dependency; never module-scope + estimator: Any = lgb.LGBMRegressor( + n_estimators=self.n_estimators, + learning_rate=self.learning_rate, + max_depth=self.max_depth, + random_state=self.random_state, + n_jobs=1, # \ + deterministic=True, # } all three required for bit-reproducible fit + force_col_wise=True, # / + verbosity=-1, # silence LightGBM training chatter + ) + estimator.fit(X, y) # NaN in X is fine — LightGBM handles missing natively + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + +def predict(self, horizon, X=None): + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before predict") + if X is None: + raise ValueError("LightGBMForecaster requires exogenous features X for predict()") + if X.shape[0] != horizon: + raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon} — they must match") + predictions = self._estimator.predict(X) + return np.asarray(predictions, dtype=np.float64) + +# ── Task 5 — scenarios dispatch: branch on capability, not a string ── +# app/features/scenarios/service.py ~line 112 +# BEFORE: if bundle.config.model_type == "regression": +# AFTER: if bundle.model.requires_features: +# return await self._simulate_model_exogenous(db, request, bundle, store_id, product_id) +# This mirrors forecasting/service.py, which already branches on +# 
`model.requires_features` / `bundle.model.requires_features`. A LightGBM bundle now +# takes the genuine re-forecast path; a baseline bundle still takes the heuristic path. + +# ── Task 3 — model_factory: gate first, then construct ── +elif model_type == "lightgbm": + if not settings.forecast_enable_lightgbm: + raise ValueError("LightGBM is not enabled. Set forecast_enable_lightgbm=True in settings.") + from app.features.forecasting.schemas import LightGBMModelConfig + if isinstance(config, LightGBMModelConfig): + return LightGBMForecaster( + n_estimators=config.n_estimators, + learning_rate=config.learning_rate, + max_depth=config.max_depth, + random_state=random_state, # threaded from settings.forecast_random_seed + ) + raise ValueError("Invalid config type for lightgbm") +``` + +### Integration Points + +```yaml +DEPENDENCY: + - pyproject.toml: + [project.optional-dependencies] ml-lightgbm = ["lightgbm>=4.5.0"]. + - uv.lock: regenerated by `uv lock` (CI installs with --frozen). + - CI: NO workflow change — ci.yml already runs `uv sync --frozen --all-extras --dev` + (lines 48, 74, 116, 163), so --all-extras installs ml-lightgbm and CI tests the path. + +CONFIG: + - No new setting. forecast_enable_lightgbm (config.py:101, default False) is the runtime + gate — UNCHANGED. forecast_random_seed (config.py:97, default 42) is the determinism + source threaded through model_factory — UNCHANGED. + +TRAIN PATH: + - ForecastingService.train_model — UNCHANGED. It already branches on + `model.requires_features`; a LightGBM model (requires_features=True) routes through + `_build_regression_features` automatically and persists feature_columns/history_tail. + +PREDICT PATH: + - POST /forecasting/predict — UNCHANGED. Rejects all requires_features models, LightGBM + included. LightGBM forecasts through POST /scenarios/simulate (model_exogenous). + +PERSISTENCE: + - ModelBundle: + lightgbm_version field (best-effort on save, mismatch-warn on load). 
+ compute_hash unchanged → no bundle hash shifts. Old bundles load fine (dataclass default). + +REGISTRY: + - runtime_info JSONB: + "lightgbm_version" key when lightgbm is importable. JSONB → NO + Alembic migration. + +NO MIGRATION: this PRP touches no SQLAlchemy model and no Alembic version. +NO API CONTRACT CHANGE: no route, request/response schema, or WebSocket frame changes. +``` + +--- + +## Validation Loop + +### Level 0: Environment + +```bash +uv lock # refresh the lockfile after the pyproject edit +uv sync --extra dev --extra ml-lightgbm # install the optional extra locally +uv run python -c "import lightgbm; print('lightgbm', lightgbm.__version__)" +# Expected: prints a 4.x version. Without this, mypy/pyright on the lazy import + the +# LightGBM tests cannot run locally (CI installs --all-extras automatically). +``` + +### Level 1: Syntax & Style + +```bash +uv run ruff check . --fix && uv run ruff format --check . +# Expected: no errors. Fix everything before Level 2. +``` + +### Level 2: Type Checks + +```bash +uv run mypy app/ # --strict; gates merge +uv run pyright app/ # --strict; gates merge +# Watch for: the [[tool.mypy.overrides]] lightgbm.* block must exist; do NOT add an inline +# `# type: ignore[import-untyped]` on `import lightgbm` (warn_unused_ignores would flag it). 
+``` + +### Level 3: Unit Tests + +```bash +# New + extended tests +uv run pytest -v -m "not integration" app/features/forecasting/tests/test_lightgbm_forecaster.py +uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py +uv run pytest -v -m "not integration" app/features/jobs/tests/test_service.py + +# Regression — these slices must stay green with NO behaviour change +uv run pytest -v -m "not integration" app/features/forecasting/tests/ +uv run pytest -v -m "not integration" app/features/backtesting/tests/ # loud-fail guard MUST still pass +uv run pytest -v -m "not integration" app/features/scenarios/tests/ + +# Whole fast suite +uv run pytest -v -m "not integration" +# Expected: all green. The baseline-forecaster tests and test_regression_forecaster.py must +# pass with ZERO edits. If lightgbm is somehow absent, test_lightgbm_forecaster.py SKIPS +# (importorskip) — it must never ERROR. +``` + +### Level 4: Integration Tests + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest -v -m integration app/features/forecasting/ app/features/scenarios/ \ + app/features/jobs/ app/features/registry/ +# CRITICAL: the scenarios lightgbm model_exogenous test (Task 13) must report +# method="model_exogenous". No migration in this PRP → no `alembic downgrade` round-trip. +``` + +### Level 5: Manual Validation (dogfood — REQUIRED) + +```bash +# 1. The forecaster wires up and is deterministic +uv run python -c " +import numpy as np +from app.features.forecasting.models import LightGBMForecaster +rng = np.random.default_rng(0) +X = rng.normal(size=(80, 14)); y = (3.0 * X[:, 0] + rng.normal(size=80)).astype(np.float64) +a = LightGBMForecaster(random_state=7).fit(y, X).predict(12, X[:12]) +b = LightGBMForecaster(random_state=7).fit(y, X).predict(12, X[:12]) +np.testing.assert_array_equal(a, b); print('lightgbm deterministic OK', a[:3])" + +# 2. requires_features is correct +uv run python -c " +from app.features.forecasting.models import LightGBMForecaster +assert LightGBMForecaster.requires_features is True; print('requires_features OK')" + +# 3. End-to-end: train a lightgbm model and re-forecast it in a scenario. +# Set FORECAST_ENABLE_LIGHTGBM=true in .env, restart uvicorn, then: +# curl -sX POST localhost:8123/forecasting/train -H 'Content-Type: application/json' \ +# -d '{"store_id":1,"product_id":1,"train_start_date":"2024-01-01", +# "train_end_date":"2024-06-01","config":{"model_type":"lightgbm"}}' +# -> 200; take the run_id from the model_path, then POST /scenarios/simulate with it +# -> the ScenarioComparison "method" field is "model_exogenous". + +# 4. The optional dep stays optional — a baseline still works without it +# (in a venv WITHOUT ml-lightgbm): training a naive model must still succeed, and +# `import app.features.forecasting.models` must not raise ImportError. +``` + +--- + +## Final Validation Checklist + +- [ ] `uv run ruff check .` and `uv run ruff format --check .` clean. +- [ ] `uv run mypy app/` and `uv run pyright app/` clean (both --strict). +- [ ] `uv run pytest -v -m "not integration"` fully green; `test_lightgbm_forecaster.py` runs + (lightgbm installed) and passes — never ERRORs. +- [ ] `uv run pytest -v -m integration app/features/{forecasting,scenarios,jobs,registry}/` green, + including the scenarios `lightgbm` `model_exogenous` test. +- [ ] `model_factory(LightGBMModelConfig())` returns a `LightGBMForecaster` with the flag on, + raises a clear `ValueError` with the flag off. +- [ ] `test_feature_aware_model_fails_loud_in_backtest` and every baseline / `regression` + test pass with **no edit**. +- [ ] `git grep -n 'NotImplementedError' app/features/forecasting/models.py` no longer matches + the LightGBM line. +- [ ] `uv.lock` is regenerated and committed; the core `[project] dependencies` list is + UNCHANGED (LightGBM is only in `[project.optional-dependencies]`).
+- [ ] No Alembic migration; no route/schema/WebSocket contract change. +- [ ] `git diff --stat` shows only intended files — no whole-file CRLF/LF noise diffs. +- [ ] An OPEN GitHub issue exists for this work (`gh issue view --json state` → `OPEN`); + commit `feat(forecast): add LightGBM feature-aware forecasting model (#)`; + branch `feat/forecasting-lightgbm-first-model` off `dev`. + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't implement XGBoost, Prophet, or any second model — that is PRP-MLZOO-C + (`INITIAL-MLZOO-index.md`). This PRP is LightGBM only. +- ❌ Don't add hyperparameter search, portfolio/global models, or an explainability change — + all explicitly out of scope in `INITIAL-MLZOO-B`. +- ❌ Don't wire feature-aware backtesting — DECISIONS LOCKED #6 defers it to PRP-MLZOO-B.2. + Keep `test_feature_aware_model_fails_loud_in_backtest`; do not touch `_run_model_backtest` + or `_execute_backtest`. +- ❌ Don't add `lightgbm` to the core `[project] dependencies` — it is an OPTIONAL extra + (DECISIONS LOCKED #1). Don't `import lightgbm` at module scope (DECISIONS LOCKED #7). +- ❌ Don't add fields to `LightGBMModelConfig` — DECISIONS LOCKED #3 keeps it conservative. +- ❌ Don't edit `ForecastingService.train_model` / `predict` — they already branch on + `requires_features`; a LightGBM model trains and is predict-rejected with zero service edits. +- ❌ Don't write a new leakage test for LightGBM — it reuses the already-pinned shared and + historical builders (DECISIONS LOCKED #5). Re-testing the same code is noise. +- ❌ Don't "fix" a determinism-test flake with `assert_allclose` — pin `n_jobs=1` + + `deterministic=True` + `force_col_wise=True` on `LGBMRegressor` and keep `assert_array_equal`. +- ❌ Don't forget `uv lock` — CI's `uv sync --frozen` fails on a stale lockfile. +- ❌ Don't make `test_lightgbm_forecaster.py` hard-require the extra — `pytest.importorskip` + so it SKIPS (never ERRORs) when `ml-lightgbm` is not installed. 
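The optional-dependency and determinism anti-patterns above can be illustrated with the minimal sketch below. It assumes two hypothetical helpers that are not part of the codebase: `lazy_import_optional` and `deterministic_lgbm_params`. The first defers the import until the forecaster actually needs it, so `import app.features.forecasting.models` still works in a venv without `ml-lightgbm`. A function-local `import lightgbm` achieves the same effect. The second collects the three LightGBM parameters that the PRP pins for reproducible fits.

```python
import importlib
from types import ModuleType
from typing import Any


def lazy_import_optional(module_name: str, extra: str) -> ModuleType:
    """Hypothetical helper: import an optional dependency only at call time.

    Nothing is imported at module scope, so baseline models keep working when the
    extra is absent. A missing extra becomes a clear, loud ImportError.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise ImportError(
            f"{module_name!r} is not installed; install the optional extra with "
            f"`uv sync --extra {extra}`"
        ) from exc


def deterministic_lgbm_params(random_state: int) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments that pin a reproducible LightGBM fit."""
    return {
        "random_state": random_state,
        "n_jobs": 1,             # single thread: no scheduling-order nondeterminism
        "deterministic": True,   # LightGBM core parameter for stable results
        "force_col_wise": True,  # fixed histogram-construction strategy
    }
```

In use, this might look like `lgb = lazy_import_optional("lightgbm", "ml-lightgbm")` followed by `lgb.LGBMRegressor(**deterministic_lgbm_params(7))`. `LGBMRegressor` forwards unrecognised keyword arguments to the booster as core parameters, which is how `deterministic` and `force_col_wise` reach it.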
+ +## Open Questions — ALL RESOLVED + +`INITIAL-MLZOO-B`'s "Required decisions" were resolved during PRP planning: +- **Dependency strategy** → optional dependency group `ml-lightgbm` (DECISIONS LOCKED #1; + user-confirmed). The sklearn fallback is rejected as a no-op (#2). +- **Advanced model config fields** → `LightGBMModelConfig` used unchanged, conservative (#3). +- **Dependency versions in registry/runtime metadata** → `ModelBundle.lightgbm_version` + + `runtime_info["lightgbm_version"]` (Tasks 7-8). +- **How prediction rejects missing future feature frames** → the existing capability-based + (`requires_features`) rejection in `POST /forecasting/predict` (#4). +- **Backtesting scope** → deferred to PRP-MLZOO-B.2 (#6; user-confirmed "Defer to a + follow-up PRP"). + +Nothing is left to litigate at implementation time. + +## Confidence Score + +**8 / 10** for one-pass implementation success. + +Rationale: the feature-aware foundation (PRP-29) did the hard structural work — the train +path, the predict rejection, and the shared frame contract already branch on +`requires_features`, so adding a second feature-aware model is a *contained* change: +one new class (a near-clone of the proven `RegressionForecaster`), one `model_factory` +branch, one scenarios-dispatch line, one jobs branch, and metadata + tests. There is no +migration, no API change, no baseline-model change, and no new algorithm to invent. +The −2 risk is concentrated in two places: (a) **LightGBM determinism** — `assert_array_equal` +requires `n_jobs=1` + `deterministic=True` + `force_col_wise=True`, and a miss flakes only in +CI; the pseudocode pins all three explicitly. (b) **the optional-dependency mechanics** — the +`uv lock` refresh, the mypy `ignore_missing_imports` override, and `pytest.importorskip` must +all land together or the gates fail; each is called out as a discrete task with its own +validation step. 
Both risks are caught fast — by Level 0/2 (env + types) and Level 3 (the +determinism test). The "baselines + regression + backtest guard must pass unedited" gate makes +any accidental regression impossible to miss. diff --git a/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md b/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md new file mode 100644 index 00000000..105a4075 --- /dev/null +++ b/PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md @@ -0,0 +1,707 @@ +name: "PRP-MLZOO-B.2 — Feature-Aware Backtesting Wiring" +description: | + +## Purpose + +The third unit of the **Advanced ML Model Zoo** sequence (`PRPs/INITIAL/INITIAL-MLZOO-index.md`), +sitting between MLZOO-B (PRP-30) and MLZOO-C. It wires the **existing feature-aware +forecasting models** — `RegressionForecaster` (PRP-27) and `LightGBMForecaster` (PRP-30), +i.e. every model with `requires_features=True` — into the **backtesting fold loop** so they +can be evaluated by `POST /backtesting/run` and `backtest` jobs. + +This is the explicit follow-up deferred by **PRP-30 DECISIONS LOCKED #6**, which named it +`PRP-MLZOO-B.2` and stated the reason: `BacktestingService._run_model_backtest` is +synchronous, DB-free, and target-only, so per-fold leakage-safe `X_train` / `X_future` +construction is itself PRP-sized. + +This PRP wires **zero new models**. It adds **no** XGBoost/Prophet, **no** frontend, **no** +explainability, **no** scenario-persistence change, **no** Alembic migration. If you find +yourself adding a model family, touching `frontend/`, or editing a `scenario_plan` schema — +stop; that is out of scope (see DECISIONS LOCKED #9). + +## What this PRP already inherits (DO NOT re-build) + +MLZOO-A (PRP-29, merged `b116489`) and MLZOO-B (PRP-30, merged `2f1b8a5` / PR #243) shipped +everything a feature-aware *model* needs. Re-use it; do not re-derive it: + +- **The capability flag.** `BaseForecaster.requires_features: ClassVar[bool]` + (`app/features/forecasting/models.py:64`). 
`RegressionForecaster` and `LightGBMForecaster` + both set it `True`; all three baselines leave it `False`. **Branch on this flag — never on + a `model_type` string.** +- **The shared feature-frame contract.** `app/shared/feature_frames/` owns the pinned + constants (`EXOGENOUS_LAGS`, `HISTORY_TAIL_DAYS`), `canonical_feature_columns()` (the + 14-column set + order), the leakage-safe pure builders `build_long_lag_columns` / + `build_calendar_columns`, and the `FeatureSafety` taxonomy + `feature_safety()` classifier. + This PRP **extends** that package; it writes no new contract constants. +- **The historical feature matrix.** `ForecastingService._assemble_regression_rows` + (`app/features/forecasting/service.py:114`) is the pure, leakage-safe historical row + builder, pinned by `app/features/forecasting/tests/test_regression_features_leakage.py`. + This PRP **promotes** it into `app/shared/feature_frames` so the backtesting slice can + consume it without a forbidden cross-slice import. +- **The future-frame pattern.** `app/features/scenarios/feature_frame.py::assemble_future_frame` + is the reference for a feature-aware model's *future* matrix (calendar + leakage-safe lags + + exogenous). This PRP builds the backtesting equivalent in `app/shared/feature_frames`. + The scenarios module is **read as a pattern, never imported or modified.** +- **The fold splitter.** `app/features/backtesting/splitter.py::TimeSeriesSplitter` is purely + index-based — each `TimeSeriesSplit` already carries `train_indices` / `test_indices` / + `train_dates` / `test_dates`. **It needs no change.** + +The **problem this PRP fixes**: `BacktestingService._load_series_data` loads only +`(date, quantity)`; `_run_model_backtest` calls `model.fit(y_train)` target-only; a +`requires_features=True` model raises `ValueError` at `fit()`. 
That loud failure is the +*interim* contract pinned by +`app/features/backtesting/tests/test_service.py::test_feature_aware_model_fails_loud_in_backtest` +(PRP-29 DECISIONS LOCKED #7). `JobService._execute_backtest` hard-rejects every non-baseline +`model_type`. The advanced LightGBM model from PRP-30 can be trained and scenario-re-forecast +but **cannot be evaluated against the baselines** — backtesting is the only honest comparison. + +## DEPENDS ON — read before starting + +- `PRPs/INITIAL/INITIAL-MLZOO-B.2-feature-aware-backtesting.md` — this PRP's brief. +- `PRPs/INITIAL/INITIAL-MLZOO-index.md` — the roadmap (A ✅ → B ✅ → **B.2 (this)** → C → D). +- `PRPs/PRP-29-feature-aware-forecasting-foundation.md` — DECISIONS LOCKED #2 + (`requires_features`), #6 (NaN-as-unknown), #7 (the interim backtest loud-fail superseded here). +- `PRPs/PRP-30-lightgbm-first-advanced-model.md` — DECISIONS LOCKED #6 defers feature-aware + backtesting to this PRP and explains why. +- `PRPs/ai_docs/exogenous-regressor-forecasting.md` — the leakage-safe future-frame rule. + +## Goal + +Make `POST /backtesting/run` (and `backtest` jobs) accept `regression` and `lightgbm` model +configs and evaluate them across time-series CV folds with **per-fold, leakage-safe** +`X_train` / `X_future` feature matrices — the advanced models compared head-to-head against +the naive / seasonal baselines, with **no target leakage** and **no train/serve skew**. + +## Why + +- **Portfolio completeness.** A forecasting system whose advanced model cannot be backtested + has no defensible model-selection story. PRP-30 delivered the model; this delivers its + evaluation. +- **Time-safety is the repo's load-bearing invariant** (`product-vision.md` §5). Wiring a + feature-consuming model into CV is exactly where leakage creeps in — doing it once, in a + shared and test-pinned way, protects every future MLZOO model (C, D). 
+- **It unblocks the MLZOO sequence.** MLZOO-C (XGBoost/Prophet) and MLZOO-D + (frontend/registry) both assume feature-aware models are backtestable; this is their gate. + +## What + +### Technical requirements + +1. **Per-fold `X_train`** — the historical feature matrix sliced to the fold's train rows. + Built once over the full series via the promoted shared builder; sliced by + `split.train_indices`. Leakage-safe by position: every row reads only strictly-earlier + observed targets. +2. **Per-fold `X_future`** — the test-window feature matrix, **rebuilt per fold** (never + sliced from the historical matrix). Target-lag columns use `build_long_lag_columns` with a + `history_tail` ending at the fold origin `T = train_end`; a lag cell whose source is a + test-window day is `NaN`. +3. **Async exogenous loading up front.** `run_backtest` (already `async`) resolves + `unit_price`, `promotion` windows, `calendar` holidays, and `product.launch_date` once, + into pure in-memory arrays. `_run_model_backtest` stays **sync and DB-free**. +4. **Capability branch.** `_run_model_backtest` branches on `model.requires_features`. + Target-only models keep the exact current code path; feature-aware models take the new + per-fold builder path. +5. **`JobService._execute_backtest`** accepts `regression` and `lightgbm` (the latter still + gated by `forecast_enable_lightgbm` inside `model_factory`). +6. **Loud failure** on every unsupported path — never a silent `NaN`/`0.0` fill. +7. **No frontend contract drift.** `_shape_backtest_result` and the `/visualize/backtest` + job-result keys stay byte-stable; new schema fields are additive only. + +### Future feature classes — the `X_future` policy (the architectural core) + +Every canonical column is classified by the **existing** `FeatureSafety` enum +(`app/shared/feature_frames/contract.py`). 
`X_future` populates each class as follows: + +| Class | Columns | `X_future` source | Leakage status | +|-------|---------|-------------------|----------------| +| `SAFE` | `dow_sin/cos`, `month_sin/cos`, `is_weekend`, `is_month_end`, `is_holiday`, `days_since_launch` | Pure function of the test-window date / timeless attribute (calendar table, product launch date) | None — never reads a target | +| `CONDITIONALLY_SAFE` | `lag_1`, `lag_7`, `lag_14`, `lag_28` | `build_long_lag_columns(history_tail, …)` — `history_tail` ends at `T`; a cell whose source day is in the test window is `NaN` | None — `NaN`-where-future is structurally enforced | +| `UNSAFE_UNLESS_SUPPLIED` | `price_factor`, `promo_active` | v1 policy `observed`: recorded `sales_daily.unit_price` / `promotion` rows for the test window | **No target leakage**; *is* exogenous foresight (see below) | + +**Target leakage vs. exogenous foresight — the line this PRP draws explicitly.** The repo's +load-bearing leakage rule is: *never read an observed target at a horizon day*. Target-lag +columns obey it structurally (`build_long_lag_columns`). The `UNSAFE_UNLESS_SUPPLIED` +exogenous columns (`price_factor`, `promo_active`) read **recorded price/promotion** for the +test window — never the target `y`. That is **not** target leakage. It **is** *exogenous +foresight*: the backtest assumes the future price/promotion calendar was known at `T`. For +retail demand that is realistic (promo calendars are planned ahead) and it keeps `X_train` +and `X_future` distributionally identical (no train/serve skew — both read same-day observed +exogenous). It is, however, optimistic, so the result **records `exogenous_policy="observed"`** +and the metric must be interpreted as "accuracy given a known promo/price plan". A future +PRP may add an `assumptions` policy; v1 ships exactly one. 
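The `CONDITIONALLY_SAFE` rule in the table above, together with the gap handling described in the gotchas, can be sketched in pure Python as shown below. `sketch_long_lag_columns` is a hypothetical stand-in for the real `build_long_lag_columns`, whose exact signature is not reproduced here. The sketch assumes that `history_tail[-1]` is the observed target at the fold origin `T` and that future day `m` (1-based) is `T + m`. A lag cell whose source day lies after `T` is `NaN`, and so is a cell whose source day is older than the tail. `future_lag_columns` (also hypothetical) builds `gap + test_size` days and drops the `gap` lead-in rows.

```python
import math


def sketch_long_lag_columns(
    history_tail: list[float], horizon: int, lags: tuple[int, ...] = (1, 7, 14, 28)
) -> dict[str, list[float]]:
    """Hypothetical stand-in for build_long_lag_columns.

    Future day m (m = 1..horizon) is T + m, so lag k reads day T + m - k.
    A source day after T is an unknown future target -> NaN (never an observed value).
    """
    n = len(history_tail)
    columns: dict[str, list[float]] = {}
    for k in lags:
        col: list[float] = []
        for m in range(1, horizon + 1):
            offset = m - k           # source day relative to T (0 == T itself)
            idx = n - 1 + offset     # position of that source day inside history_tail
            if offset > 0 or idx < 0:
                col.append(math.nan)  # future target, or older than the tail
            else:
                col.append(history_tail[idx])
        columns[f"lag_{k}"] = col
    return columns


def future_lag_columns(
    history_tail: list[float], gap: int, test_size: int,
    lags: tuple[int, ...] = (1, 7, 14, 28),
) -> dict[str, list[float]]:
    """Build gap + test_size days, then drop the first `gap` rows (the gap lead-in)."""
    cols = sketch_long_lag_columns(history_tail, gap + test_size, lags)
    return {name: values[gap:] for name, values in cols.items()}
```

For example, take `history_tail=[10.0, 20.0, 30.0]` (so `T`'s target is `30.0`) with `gap=0`, `test_size=3` and `lags=(1, 2)`. The result is `lag_1 = [30.0, nan, nan]` and `lag_2 = [20.0, 30.0, nan]`. With `gap=1` and `test_size=2`, the result is `lag_1 = [nan, nan]` and `lag_2 = [30.0, nan]`, because gap days are not yet observed at `T`. By contrast, slicing a historical row for the second test day would put the first test day's observed target into `lag_1`. That is exactly the leakage this rule forbids.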
+ +### Success Criteria + +- [ ] `POST /backtesting/run` with a `regression` model config returns `200` with per-fold + metrics and a baseline comparison. +- [ ] A `backtest` job with `model_type="regression"` completes `success`. +- [ ] A feature-aware backtest's `X_future` lag cells are `NaN` exactly where their source + day falls in the test window — pinned by a new shared leakage test. +- [ ] `X_train` and `X_future` use the identical `canonical_feature_columns()` set and order. +- [ ] Every existing baseline backtest test passes with **zero** edits. +- [ ] `_shape_backtest_result` output keys are byte-stable (frontend contract intact). +- [ ] An unsupported feature-aware path raises a loud `ValueError`, never silently degrades. +- [ ] `ruff` + `mypy --strict` + `pyright --strict` clean; unit + integration suites green. + +## All Needed Context + +### Documentation & References + +```yaml +- file: app/features/backtesting/service.py + why: _load_series_data (extend), SeriesData (extend), _run_model_backtest (branch), + _run_baseline_comparisons, _validate_config (add min-train guard). + +- file: app/features/backtesting/splitter.py + why: TimeSeriesSplit carries train/test indices + dates. NO CHANGE — index-based. + +- file: app/features/forecasting/service.py + why: _build_regression_features = the async exogenous-resolution pattern to mirror; + _assemble_regression_rows = the pure historical builder to promote to shared. + +- file: app/shared/feature_frames/contract.py + why: canonical_feature_columns(), build_long_lag_columns, build_calendar_columns, + FeatureSafety + feature_safety(). The contract this PRP extends. + +- file: app/features/scenarios/feature_frame.py + why: assemble_future_frame = the future-matrix PATTERN. Reference only — do NOT import + (backtesting -> scenarios is a forbidden cross-slice import). 
+ +- file: app/features/jobs/service.py + why: _execute_backtest (widen the model_type allow-list); _shape_backtest_result + (the frontend contract — additive changes only, keys must not move). + +- file: app/shared/feature_frames/tests/test_leakage.py + why: the load-bearing leakage-test pattern the new builder must follow. + +- file: app/features/forecasting/tests/test_regression_features_leakage.py + why: pins _assemble_regression_rows — must stay GREEN after the promotion (shim). + +- url: https://otexts.com/fpp3/tscv.html + why: time-series cross-validation — the standard this fold loop implements. +``` + +### Current Codebase tree (relevant — all already exist) + +``` +app/ + shared/feature_frames/ + __init__.py # re-exports the contract surface + contract.py # constants, builders, FeatureSafety + tests/test_leakage.py # load-bearing leakage spec + tests/test_contract.py # AST walk — no app/features import + features/ + backtesting/ + service.py # SeriesData, run_backtest, _run_model_backtest (target-only) + splitter.py # TimeSeriesSplitter — index-based, unchanged + schemas.py # BacktestConfig, ModelBacktestResult, BacktestResponse + tests/test_service.py, test_service_integration.py, test_routes_integration.py + forecasting/ + service.py # _assemble_regression_rows (to promote), _build_regression_features + jobs/ + service.py # _execute_backtest (allow-list), _shape_backtest_result +``` + +### Desired Codebase tree — files to ADD + +``` +app/shared/feature_frames/ + rows.py # build_historical_feature_rows (promoted), + # build_future_feature_rows (NEW, leakage-safe) +app/features/backtesting/tests/ + test_feature_aware_backtest.py # unit tests for the fold builders + loud-fail +PRPs/ + PRP-MLZOO-B.2-feature-aware-backtesting.md # this file +``` + +### Files to MODIFY (all additive or behaviour-preserving) + +``` +app/shared/feature_frames/__init__.py # export the two row builders +app/shared/feature_frames/tests/test_leakage.py # ADD build_future_feature_rows leakage spec +app/shared/feature_frames/tests/test_contract.py # ADD: rows.py imports nothing from app/features +app/features/forecasting/service.py # _assemble_regression_rows -> delegating shim +app/features/backtesting/service.py # ExogenousFrame, exogenous load, fold branch +app/features/backtesting/schemas.py # ADD feature_aware + exogenous_policy (additive) +app/features/jobs/service.py # _execute_backtest: accept regression + lightgbm +app/features/backtesting/tests/test_service.py # repurpose the interim loud-fail test +app/features/backtesting/tests/test_service_integration.py # feature-aware DB-backed backtest +app/features/backtesting/tests/test_routes_integration.py # POST /backtesting/run regression +app/features/backtesting/tests/test_schemas.py # the new additive fields +app/features/jobs/tests/test_service.py # backtest job with model_type=regression +examples/models/feature_frame_contract.md # document the backtest future-frame +docs/PHASE/5-BACKTESTING.md # feature-aware backtesting section +README.md # backtest model list: add regression/lightgbm +PRPs/INITIAL/INITIAL-MLZOO-index.md # note B.2 -> this PRP +``` + +### DECISIONS LOCKED (resolved during planning — do NOT re-litigate) + +1. **The per-fold row builders live in `app/shared/feature_frames/rows.py`.** `backtesting` + may not import `forecasting` or `scenarios` (vertical-slice rule). `app/shared/` is the + sanctioned cross-cutting home and already owns the column builders. `rows.py` (pure, + stdlib-only, `app/features` import forbidden — same as `contract.py`) holds the two + row-matrix assemblers. `contract.py` stays the column-builder + taxonomy home. + +2. **`_assemble_regression_rows` is PROMOTED, not duplicated, and the promotion is additive.** + Its body moves verbatim to `build_historical_feature_rows` in `rows.py`. + `ForecastingService._assemble_regression_rows` becomes a one-line delegating shim + (`return build_historical_feature_rows(...)`).
`test_regression_features_leakage.py` + imports the shim by its old name and stays **GREEN with zero edits** — the existing + leakage test is not weakened, not moved, not touched. + +3. **`X_train` is sliced from one full-series historical matrix; `X_future` is NEVER sliced.** + The historical matrix is built once over `dates[0:N]` (each row reads only strictly-earlier + targets → leakage-safe as a *training* row). Per fold, `X_train = matrix[train_indices]`. + `X_future` MUST be rebuilt per fold via `build_future_feature_rows` — slicing the + historical matrix for test rows would let `lag_1` read an adjacent test-day observed target + (**target leakage**). This asymmetry is the crux of the PRP (see GOTCHA below). + +4. **The v1 `X_future` exogenous policy is `observed`, recorded on the result.** `price_factor` + / `promo_active` for the test window come from recorded `sales_daily.unit_price` / + `promotion` rows. This is exogenous foresight, not target leakage (it never reads `y`), and + it keeps `X_train`/`X_future` skew-free (both read same-day observed exogenous). + `ModelBacktestResult.exogenous_policy` records `"observed"` so the metric is interpreted + honestly. A `Literal["observed"]` (one value in v1) is the documented extension seam — a + future PRP may add `"assumptions"` without a breaking change. + +5. **Branch on `model.requires_features`, never on a `model_type` string.** `run_backtest` + builds a cheap probe model from `config.model_config_main` to read the flag *before* the + fold loop, deciding whether to load exogenous data. This mirrors + `ForecastingService.train_model`, which already branches on exactly this flag. + The probe is a no-fit `model_factory(...)` construction (cheap) — each of the three + sites that need the flag (`_validate_config`, `run_backtest`, `_run_model_backtest`) + builds its own probe locally; there is no need to thread one instance through, and + `BacktestingService` keeps no probe/matrix instance state. + +6. **The fold loop stays sync and DB-free.** All DB I/O happens once in `run_backtest` + (`async`), resolved into a pure `ExogenousFrame`. `_run_model_backtest` and the row + builders remain unit-testable without a database — the existing architecture is preserved. + +7. **`min_train_size >= 30` is enforced for feature-aware backtests.** `_validate_config` + raises `ValueError` when the main model `requires_features` and + `split_config.min_train_size < 30` (`_MIN_REGRESSION_TRAIN_ROWS`) — each fold's train + window must be long enough for the lag features (up to `lag_28`) to resolve. Loud, not silent. + +8. **The interim loud-fail test is REPURPOSED, not deleted.** + `test_feature_aware_model_fails_loud_in_backtest` asserted a `regression` backtest raises + `ValueError`. After this PRP it succeeds. The test is rewritten as (a) a positive + "feature-aware backtest runs and yields metrics" assertion and (b) a new loud-fail + assertion for the genuinely-unsupported path (a `requires_features` model with no + `ExogenousFrame` loaded → `ValueError`). PRP-29 DECISIONS LOCKED #7 and PRP-30 DECISIONS + LOCKED #6 are **superseded** — note this in the PRP commit body and the test docstring. + +9. **OUT OF SCOPE — do not touch.** No new model family (XGBoost/Prophet = MLZOO-C). No + `frontend/` change — `_shape_backtest_result` keys stay byte-stable. No explainability + (MLZOO-D). No `scenario_plan` / `/scenarios/*` change. No Alembic migration (this PRP adds + no table/column). No recursive multi-step forecasting — `NaN`-as-unknown is kept. + +### Known Gotchas of our codebase & Library Quirks + +```python +# CRITICAL: X_future is NEVER a slice of the historical matrix. The historical row for a +# test day reads quantities[i-1] (lag_1) — for a test day that source is an adjacent +# observed TEST-DAY target == target leakage. X_future MUST be rebuilt per fold with +# build_long_lag_columns(history_tail_ending_at_T, ...) so future-sourced lag cells are NaN. + +# CRITICAL: gap offset. With gap > 0 the first test day is T + gap + 1, but +# build_long_lag_columns indexes test day m as T + m. Call it with +# horizon = gap + test_size and DROP the first `gap` rows. With gap=0 (the common case) +# this is a no-op. test_feature_aware_backtest.py MUST cover a gap>0 fold. + +# CRITICAL: history_tail ends at T = train_end, and EXCLUDES the gap days. The gap simulates +# operational data latency — data for gap days is "not yet available" at forecast time. +# history_tail = series.values[:train_end_idx][-HISTORY_TAIL_DAYS:] where +# train_end_idx = split.train_indices[-1] + 1. + +# GOTCHA: leaf-level import rule. app/shared/feature_frames/rows.py may NEVER import from +# app/features/** (test_contract.py enforces it with an AST walk). Keep rows.py pure — +# stdlib math/datetime only, same as contract.py. + +# GOTCHA: SeriesData.__post_init__ computes n_observations from `values`. Adding an optional +# `exogenous: ExogenousFrame | None = None` field is fine — keep it the last __init__ field +# (n_observations is init=False) and keep the default. + +# GOTCHA: ModelBacktestResult is not frozen, but it is consumed by _shape_backtest_result. New +# fields (feature_aware: bool = False, exogenous_policy: Literal["observed"] | None = None) MUST +# have defaults so every existing construction site and test stays valid. + +# GOTCHA: lightgbm in a backtest job. _execute_backtest building a LightGBMModelConfig is +# fine — model_factory still raises ValueError if forecast_enable_lightgbm is False. That +# surfaces as a failed job (loud), which is correct. Do not pre-check the flag in jobs. + +# GOTCHA: baseline comparison. _run_baseline_comparisons runs naive + seasonal_naive — both +# target-only (requires_features=False). They take the UNCHANGED target-only fold path +# even when the main model is feature-aware. Do not feed them X. + +# GOTCHA: line endings — repo has mixed CRLF/LF, no .gitattributes. Run `git diff --stat` +# before committing; re-normalise any whole-file diff to the file's original ending.
+``` + +## Implementation Blueprint + +### Data models and structure + +```python +# app/shared/feature_frames/rows.py (NEW — pure, stdlib only) + +def build_historical_feature_rows( + *, dates: list[date], quantities: list[float], prices: list[float], + baseline_price: float, promo_dates: set[date], holiday_dates: set[date], + launch_date: date | None, +) -> list[list[float]]: + """Promoted verbatim from ForecastingService._assemble_regression_rows. + Row i: target lags read quantities[i-lag] (strictly earlier), calendar columns + are pure, exogenous columns read same-day observed attributes. canonical order.""" + +def build_future_feature_rows( + *, test_dates: list[date], history_tail: list[float], gap: int, + test_prices: list[float], baseline_price: float, test_promo_dates: set[date], + test_holiday_dates: set[date], launch_date: date | None, +) -> list[list[float]]: + """Leakage-safe test-window matrix. lag_* columns come from + build_long_lag_columns(history_tail, gap + len(test_dates))[gap:] — NaN where the + source day is in the test window. Calendar columns pure. price_factor / promo_active + from the OBSERVED test-window records (policy='observed'). canonical order. 
+ Raises ValueError if asked to emit a column it cannot classify/source.""" +``` + +```python +# app/features/backtesting/service.py (MODIFY) + +@dataclass +class ExogenousFrame: + """Pre-loaded exogenous data for one series — resolved async in run_backtest, + consumed by the pure/sync fold loop.""" + prices: list[float] # aligned with SeriesData.dates + baseline_price: float # median positive price (>0 fallback 1.0) + promo_dates: set[date] + holiday_dates: set[date] + launch_date: date | None + +@dataclass +class SeriesData: + dates: list[date] + values: np.ndarray + store_id: int + product_id: int + exogenous: ExogenousFrame | None = None # NEW — present only for feature-aware runs + n_observations: int = field(init=False) +``` + +### list of tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Shared row builders (pure, no behaviour change) ════════ +Task 1: CREATE app/shared/feature_frames/rows.py + - build_historical_feature_rows: body lifted verbatim from _assemble_regression_rows. + - build_future_feature_rows: NEW — see Per-task pseudocode. + - Pure: import only math/datetime + the contract builders. No app/features import. + +Task 2: MODIFY app/shared/feature_frames/__init__.py + - Export build_historical_feature_rows, build_future_feature_rows. + +Task 3: MODIFY app/features/forecasting/service.py + - _assemble_regression_rows becomes a one-line shim delegating to + build_historical_feature_rows. Keep the signature and name byte-identical so + test_regression_features_leakage.py imports stay valid. + +# ════════ STEP 2 — Shared leakage spec ════════ +Task 4: MODIFY app/shared/feature_frames/tests/test_leakage.py + - ADD: build_future_feature_rows lag cells are NaN exactly where source day is in the + test window; an observed test-day target never appears as a lag value out of place; + gap>0 case; the historical-vs-future asymmetry. 
+Task 5: MODIFY app/shared/feature_frames/tests/test_contract.py + - ADD rows.py to the AST walk asserting no app/features import. + +# ════════ STEP 3 — Backtesting schemas (additive) ════════ +Task 6: MODIFY app/features/backtesting/schemas.py + - ModelBacktestResult: ADD feature_aware: bool = False, + exogenous_policy: Literal["observed"] | None = None (defaults preserve all callers). + +# ════════ STEP 4 — Backtesting service wiring ════════ +Task 7: MODIFY app/features/backtesting/service.py + - ADD ExogenousFrame; ADD optional SeriesData.exogenous. + - _validate_config: if main model requires_features and min_train_size < 30 -> ValueError. + - run_backtest: build a probe model from config.model_config_main; if requires_features, + call new async _load_exogenous_frame and attach to series_data.exogenous. + - _run_model_backtest: signature is UNCHANGED (still series_data, splitter, + model_config, store_fold_details). It builds a probe model, branches on + probe.requires_features, reads gap from splitter.config.gap, and builds the full + historical matrix as a LOCAL once before the fold loop. + target-only -> existing code path, untouched. + feature-aware -> _run_feature_aware_fold (new helper, all args explicit): per fold + slice X_train from the local historical matrix, build X_future, + fit(y,X_train), predict(test_size, X_future). Set feature_aware + + exogenous_policy on the ModelBacktestResult. + - ADD _load_exogenous_frame (async): unit_price per date, promotion windows, calendar + holidays, product.launch_date — mirrors _build_regression_features. + +# ════════ STEP 5 — Jobs integration ════════ +Task 8: MODIFY app/features/jobs/service.py + - _execute_backtest: add regression + lightgbm branches building RegressionModelConfig / + LightGBMModelConfig. _shape_backtest_result UNCHANGED (frontend contract byte-stable). 
+ +# ════════ STEP 6 — Tests ════════ +Task 9: CREATE app/features/backtesting/tests/test_feature_aware_backtest.py + - Pure unit tests: per-fold X_train/X_future shape + column order; gap>0 fold; + feature-aware model with exogenous=None -> loud ValueError. +Task 10: MODIFY app/features/backtesting/tests/test_service.py + - Repurpose test_feature_aware_model_fails_loud_in_backtest (DECISIONS LOCKED #8). +Task 11: MODIFY app/features/backtesting/tests/test_service_integration.py + - DB-backed regression backtest vs naive/seasonal baselines in one response. +Task 12: MODIFY app/features/backtesting/tests/test_routes_integration.py + - POST /backtesting/run with a regression model config -> 200 + per-fold metrics. +Task 13: MODIFY app/features/backtesting/tests/test_schemas.py + - ModelBacktestResult new fields: defaults + explicit values. +Task 14: MODIFY app/features/jobs/tests/test_service.py + - backtest job with model_type="regression" -> success + shaped result. + +# ════════ STEP 7 — Docs ════════ +Task 15: MODIFY examples/models/feature_frame_contract.md, docs/PHASE/5-BACKTESTING.md, + README.md (backtest model list), PRPs/INITIAL/INITIAL-MLZOO-index.md (B.2 row). +``` + +### Per-task pseudocode (critical details only) + +```python +# ── Task 1 — build_future_feature_rows (the leakage-critical builder) ── +def build_future_feature_rows(*, test_dates, history_tail, gap, test_prices, + baseline_price, test_promo_dates, test_holiday_dates, + launch_date): + horizon = len(test_dates) + columns = canonical_feature_columns() + # lags: build for gap+horizon days, drop the gap lead-in. NaN where source > T. 
+ lag_cols = build_long_lag_columns(history_tail, gap + horizon) # {"lag_k": [...]} + lag_cols = {k: v[gap:] for k, v in lag_cols.items()} + cal_cols = build_calendar_columns(test_dates) # SAFE — pure + rows: list[list[float]] = [] + for j, day in enumerate(test_dates): + row: list[float] = [] + for col in columns: + safety = feature_safety(col) # raises KeyError on unknown -> loud + if col.startswith("lag_"): # CONDITIONALLY_SAFE + row.append(lag_cols[col][j]) + elif col in cal_cols: # SAFE + row.append(cal_cols[col][j]) + elif col == "price_factor": # UNSAFE_UNLESS_SUPPLIED -> observed + row.append(test_prices[j] / baseline_price) + elif col == "promo_active": # UNSAFE_UNLESS_SUPPLIED -> observed + row.append(1.0 if day in test_promo_dates else 0.0) + elif col == "is_holiday": # SAFE — calendar timeless attribute + row.append(1.0 if day in test_holiday_dates else 0.0) + elif col == "days_since_launch": # SAFE — pure fn of date + row.append(float((day - launch_date).days) if launch_date else math.nan) + else: + raise ValueError(f"build_future_feature_rows: unsourced column {col!r}") + rows.append(row) + return rows + +# ── Task 7 — _run_model_backtest branch + _run_feature_aware_fold (pure, sync) ── +# _run_model_backtest gains NO new parameters. `gap` is read from the splitter it already +# receives (splitter.config.gap — SplitConfig is reachable as TimeSeriesSplitter.config). +# The full historical matrix is a LOCAL built once before the fold loop — there is no +# self._historical_matrix and no self.config on BacktestingService (__init__ sets only +# self.settings + self.metrics_calculator). _run_feature_aware_fold takes everything it +# needs as explicit arguments — no phantom instance state. 
+ +def _run_model_backtest(self, series_data, splitter, model_config, store_fold_details): + probe = model_factory(model_config, random_state=self.settings.forecast_random_seed) + feature_aware = probe.requires_features # capability flag, never a string + historical_matrix: np.ndarray | None = None + if feature_aware: + if series_data.exogenous is None: + raise ValueError("feature-aware backtest requires a loaded ExogenousFrame") + exo = series_data.exogenous + historical_matrix = np.array(build_historical_feature_rows( + dates=series_data.dates, quantities=series_data.values.tolist(), + prices=exo.prices, baseline_price=exo.baseline_price, + promo_dates=exo.promo_dates, holiday_dates=exo.holiday_dates, + launch_date=exo.launch_date), dtype=np.float64) + for split in splitter.split(series_data.dates, series_data.values): + if feature_aware: + predictions = self._run_feature_aware_fold( + series_data, split, model_config, historical_matrix, splitter.config.gap) + else: + ... # existing target-only path — UNCHANGED + ... # metrics / FoldResult assembly is shared, unchanged + # set feature_aware=feature_aware and exogenous_policy on the returned ModelBacktestResult. 
+ +def _run_feature_aware_fold(self, series_data, split, model_config, + historical_matrix, gap): + exo = series_data.exogenous # caller already guaranteed non-None + # X_train — slice the full historical matrix (built once, leakage-safe by position) + X_train = historical_matrix[split.train_indices] + y_train = series_data.values[split.train_indices] + # X_future — rebuilt per fold; history_tail ends at T = train_end, excludes gap + train_end_idx = int(split.train_indices[-1]) + 1 + history_tail = series_data.values[:train_end_idx][-HISTORY_TAIL_DAYS:].tolist() + test_idx = split.test_indices + X_future = np.array(build_future_feature_rows( + test_dates=split.test_dates, history_tail=history_tail, gap=gap, + test_prices=[exo.prices[i] for i in test_idx], baseline_price=exo.baseline_price, + test_promo_dates={series_data.dates[i] for i in test_idx if series_data.dates[i] in exo.promo_dates}, + test_holiday_dates={d for d in split.test_dates if d in exo.holiday_dates}, + launch_date=exo.launch_date), dtype=np.float64) + model = model_factory(model_config, random_state=self.settings.forecast_random_seed) + model.fit(y_train, X_train) + return model.predict(len(test_idx), X_future) + +# ── Task 8 — _execute_backtest allow-list ── +# elif model_type == "regression": model_config = RegressionModelConfig() +# elif model_type == "lightgbm": model_config = LightGBMModelConfig() +# else: raise ValueError(f"Unsupported model_type: {model_type}") # e.g. "arima" +``` + +### Integration Points + +```yaml +BACKTESTING SERVICE: + - run_backtest stays the only async entry; _run_model_backtest stays sync. + - the full historical matrix is built once per _run_model_backtest call (feature-aware + path) as a LOCAL variable, sliced per fold — never per-fold rebuilt for X_train, and + never stored on the service instance. + +JOBS: + - _execute_backtest gains regression + lightgbm; _shape_backtest_result is NOT touched. 
+  - a backtest job for a disabled lightgbm fails loud via model_factory — expected.
+
+SHARED CONTRACT:
+  - rows.py joins contract.py under app/shared/feature_frames; __init__.py re-exports both.
+  - forecasting + backtesting both consume one definition — no column-order drift possible.
+
+NO CHANGE:
+  - splitter.py, scenarios/**, frontend/**, alembic/**, registry/** — untouched.
+```
+
+## Phased Execution Plan
+
+This is one coherent architectural change and **fits one reviewable PR** (one branch,
+`feat/backtesting-feature-aware-folds`, off `dev`, tracked by **GitHub issue #244** —
+every commit references `(#244)`). If the reviewer prefers a smaller diff, split along the
+natural seam — Phase 1 is independently mergeable because it is pure and
+behaviour-preserving:
+
+- **Phase 1 — Shared builders + leakage spec (Tasks 1–5).** Promote
+  `build_historical_feature_rows`, add `build_future_feature_rows`, wire the delegating shim,
+  add the shared leakage tests. Zero behaviour change — every existing test stays green. A
+  self-contained PR that lands the contract without touching backtesting.
+- **Phase 2 — Backtesting + jobs wiring (Tasks 6–15).** Consume the builders: schema fields,
+  async exogenous load, the fold-loop branch, jobs allow-list, integration tests, docs.
+
+Recommended: ship as **one PR** unless the diff is judged too large at review time; the
+phase boundary is the fallback, not the default.
+
+## Validation Loop
+
+### Level 1: Syntax & Style
+
+```bash
+uv run ruff check . && uv run ruff format --check .
+```
+
+### Level 2: Type Checks
+
+```bash
+uv run mypy app/ && uv run pyright app/
+# Watch: rows.py must type cleanly with no app/features import; the new optional
+# SeriesData.exogenous and the additive ModelBacktestResult fields must not break callers.
+``` + +### Level 3: Unit Tests + +```bash +uv run pytest -v -m "not integration" \ + app/shared/feature_frames/tests/ \ + app/features/backtesting/tests/test_service.py \ + app/features/backtesting/tests/test_feature_aware_backtest.py \ + app/features/backtesting/tests/test_schemas.py \ + app/features/forecasting/tests/test_regression_features_leakage.py \ + app/features/jobs/tests/test_service.py +# test_regression_features_leakage.py MUST pass with ZERO edits (the shim preserves it). +uv run pytest -v -m "not integration" # whole fast suite — all green +``` + +### Level 4: Integration Tests + +```bash +docker compose up -d +uv run pytest -v -m integration \ + app/features/backtesting/tests/test_service_integration.py \ + app/features/backtesting/tests/test_routes_integration.py +# A regression backtest must return per-fold metrics + a baseline comparison; +# the response carries exogenous_policy="observed" on the main-model result. +``` + +### Level 5: Manual Validation (dogfood — REQUIRED) + +```bash +# 1. A regression backtest runs end-to-end (needs seeded data). +curl -sX POST localhost:8123/backtesting/run -H 'Content-Type: application/json' \ + -d '{"store_id":1,"product_id":1,"start_date":"2024-01-01","end_date":"2024-12-01", + "config":{"model_config_main":{"model_type":"regression"}, + "split_config":{"n_splits":3,"min_train_size":60,"horizon":14}}}' +# -> 200; main_model_results.feature_aware == true; exogenous_policy == "observed"; +# baseline_results has naive + seasonal_naive; leakage_check_passed == true. + +# 2. min_train_size guard fires loud. +# ... same call with "min_train_size":20 -> 400 RFC-7807, "at least 30". + +# 3. A backtest job with model_type=regression completes success. 
+curl -sX POST localhost:8123/jobs -H 'Content-Type: application/json' \ + -d '{"job_type":"backtest","params":{"model_type":"regression","store_id":1, + "product_id":1,"start_date":"2024-01-01","end_date":"2024-12-01","n_splits":3}}' +# -> poll GET /jobs/{id} -> status "success", result has per-fold metrics. + +# 4. Baselines unaffected — a naive backtest still works exactly as before. +``` + +## Final Validation Checklist + +- [ ] `ruff` + `mypy --strict` + `pyright --strict` clean. +- [ ] Whole fast unit suite green; `test_regression_features_leakage.py` unedited & green. +- [ ] New shared leakage spec proves `X_future` lag cells are `NaN`-where-future (incl. gap>0). +- [ ] Integration: a `regression` backtest returns per-fold metrics + baseline comparison. +- [ ] `POST /backtesting/run` with `regression` → `200`; with `min_train_size<30` → `400`. +- [ ] A `backtest` job with `model_type="regression"` → `success`. +- [ ] `_shape_backtest_result` output keys byte-identical to pre-PRP (frontend contract). +- [ ] Every baseline backtest test green with zero edits. +- [ ] The interim loud-fail test is repurposed (not deleted); supersession noted. +- [ ] No `frontend/`, `scenarios/`, `alembic/` change; no new migration. +- [ ] `git diff --stat` shows no whole-file CRLF/LF noise. + +## Anti-Patterns to Avoid + +- ❌ Slicing the historical matrix for `X_future` — that leaks adjacent test-day targets. +- ❌ Filling an unknowable future column with `0.0`/`NaN` silently — raise `ValueError`. +- ❌ Branching the fold loop on a `model_type` string — branch on `requires_features`. +- ❌ Importing `forecasting`/`scenarios` from `backtesting` — promote to `app/shared/`. +- ❌ Doing DB I/O inside `_run_model_backtest` — keep it sync; load async up front. +- ❌ Re-deriving `build_long_lag_columns` / `canonical_feature_columns()` — reuse the contract. +- ❌ Weakening or deleting `test_feature_aware_model_fails_loud_in_backtest` — repurpose it. 
+- ❌ Editing `_shape_backtest_result` keys or any `frontend/` file — out of scope. +- ❌ Adding XGBoost/Prophet, hyperparameter search, or a migration — all out of scope. + +## Open Questions + +1. **`exogenous_policy` v1 = `observed` only.** This PRP ships exactly one policy (recorded + `price`/`promo` for the test window — exogenous foresight, target-leakage-free). A stricter + `origin_carry_forward` policy (carry the last observed price/promo state from `T`, zero + foresight) and an `assumptions` policy (planner-supplied, mirroring the scenarios slice) + are deliberately deferred. **Resolve at PRP review:** is one policy acceptable for v1, or + should `origin_carry_forward` ship alongside as the conservative default? The `Literal` + field is the seam either way. +2. **Feature-aware baseline comparison.** v1 compares a feature-aware main model only against + the *target-only* naive/seasonal baselines. Whether `regression` should also auto-run as a + baseline for a `lightgbm` main model (advanced-vs-advanced) is left to MLZOO-D — flag if + the reviewer wants it sooner. +3. **Per-series caching of the historical matrix.** Built once per `run_backtest` call; not + cached across calls. Fine for single-series backtests; revisit only if a portfolio/batch + backtester (a separate optional feature) ever lands. + +## Confidence Score + +**9/10** for one-pass implementation. The contract (MLZOO-A), both feature-aware models +(PRP-27, PRP-30), the splitter, and the leakage-test patterns all already exist and are +stable. The only genuine design judgement — the `X_future` exogenous policy — is resolved +and locked (DECISIONS LOCKED #4) with Open Question #1 as the explicit review hook. The work +is additive, single-slice-plus-shared, and needs no migration. 
diff --git a/PRPs/PRP-MLZOO-C1-xgboost-model.md b/PRPs/PRP-MLZOO-C1-xgboost-model.md
new file mode 100644
index 00000000..16ac07d5
--- /dev/null
+++ b/PRPs/PRP-MLZOO-C1-xgboost-model.md
@@ -0,0 +1,979 @@
+name: "PRP-MLZOO-C1 — XGBoost Feature-Aware Forecasting Model"
+description: |
+
+## Purpose
+
+The first half of MLZOO-C (`PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md`).
+It adds **one** advanced, feature-aware forecasting model — `XGBoostForecaster`, wrapping
+`xgboost.XGBRegressor` — as a low-risk follow-up that **mirrors the merged
+`LightGBMForecaster` design byte-for-byte** (PRP-30 / MLZOO-B, commit `2f1b8a5`).
+
+This PRP implements **XGBoost only**: its `XGBoostModelConfig` schema, the
+`XGBoostForecaster` class, its `model_factory` wiring, the `forecast_enable_xgboost`
+runtime flag, the `ml-xgboost` optional dependency group, the jobs train/backtest
+branches, the reproducibility metadata, and tests. It adds **no** Prophet-like model
+(that is PRP-MLZOO-C2, a separate branch and review unit — see DECISIONS LOCKED #1),
+**no** hyperparameter search, **no** portfolio/global models, **no** frontend, and
+**no** explainability change.
+
+> **Sibling PRP:** `PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md` ships the
+> Prophet-like additive model. C1 and C2 are intentionally **separate branches and
+> separate review units** — never combine them. They are additive and order-independent;
+> whichever merges second rebases cleanly (see "Sibling-PRP integration" below).
+
+## What this PRP already inherits (DO NOT re-build)
+
+PRP-29 (MLZOO-A), PRP-30 (MLZOO-B), and PRP-MLZOO-B.2 (feature-aware backtesting,
+PR #244) already shipped the entire structural foundation a new feature-aware model
+stands on. Re-use it; do not re-derive it:
+
+- **The feature-aware model contract.** `BaseForecaster.requires_features: ClassVar[bool]`
+  (`app/features/forecasting/models.py:64`).
`RegressionForecaster` (`models.py:438`) and + `LightGBMForecaster` (`models.py:580`) are the *existing* feature-aware models — + `requires_features = True`, `fit(y, X)` / `predict(horizon, X)` both require a + non-`None` `X`. `XGBoostForecaster` is their structural twin. +- **The shared feature-frame contract.** `app/shared/feature_frames/` owns the pinned + constants, `canonical_feature_columns()` (the 14-column set), the leakage-safe pure + builders, and the `FeatureSafety` taxonomy. A new feature-aware model writes **zero** + new contract code. +- **The training-frame branch.** `ForecastingService.train_model` + (`app/features/forecasting/service.py:180-297`) branches on `model.requires_features` + (`service.py:219`) — **model-type-agnostic**, no string compare. If true it builds the + historical frame via `_build_regression_features` and calls `model.fit(features.y, + features.X)`, persisting `feature_columns` / `history_tail` / `launch_date` into the + bundle metadata. **An XGBoost model trains with zero changes to `train_model`.** +- **The predict rejection.** `ForecastingService.predict` (`service.py:383-393`) rejects + any `bundle.model.requires_features` model — capability-based, not `model_type`-string. + An XGBoost model is rejected there automatically; it forecasts through + `POST /scenarios/simulate`. +- **The scenario `model_exogenous` dispatch.** `app/features/scenarios/service.py:114` + already branches on `bundle.model.requires_features` — no `model_type` strings remain + in `app/features/scenarios/`. An XGBoost bundle takes the genuine re-forecast path + with **zero scenarios changes**. +- **Feature-aware backtesting.** `app/features/backtesting/service.py:384-409` probes + `model_factory(...).requires_features` and, when true, builds per-fold leakage-safe + `X_train` (sliced) / `X_future` (rebuilt) via `build_historical_feature_rows` / + `build_future_feature_rows`. **Model-agnostic** — never checks a `model_type` string. 
+ An XGBoost model backtests with **zero backtesting-service changes**. (This is the key + difference from PRP-30, which had to defer backtesting to B.2 — B.2 is now merged.) +- **The historical-frame leakage spec.** `app/features/forecasting/tests/test_regression_features_leakage.py` + and `app/shared/feature_frames/tests/test_leakage.py` pin the historical and future + builders. XGBoost consumes the **same** builders → these specs already cover its + training and future feature matrices. **No new leakage test is required** (DECISIONS + LOCKED #6). + +The **problem this PRP fixes**: XGBoost — named in `INITIAL-MLZOO-C` and +`docs/optional-features/05-advanced-ml-model-zoo.md` as the second tree-based model and +the robust-regularization benchmark against LightGBM — does not exist. There is no +`xgboost` dependency, no `XGBoostModelConfig`, no `xgboost` in the `ModelType` literal, +no `model_factory` branch, and `JobService._execute_train` / `_execute_backtest` reject +`model_type="xgboost"` as unsupported. + +## DEPENDS ON — read before starting + +- `PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md` — the shared C brief. +- `PRPs/INITIAL/INITIAL-MLZOO-index.md` — the MLZOO roadmap (A ✅ → B ✅ → B.2 ✅ → + **C1 (this) ∥ C2** → D). +- `PRPs/PRP-30-lightgbm-first-advanced-model.md` — **the byte-for-byte template for this + PRP.** Every DECISIONS LOCKED entry and Anti-Pattern there applies here with `lightgbm` + → `xgboost`. Read it in full first. +- `PRPs/PRP-MLZOO-B.2-feature-aware-backtesting.md` — explains why backtesting now works + for any `requires_features` model with no per-model wiring. +- `examples/models/feature_frame_contract.md` — the historical/future frame shapes a + feature-aware model consumes, and the canonical 14-column set. 
+ +--- + +## Goal + +Implement `XGBoostForecaster` — a deterministic, feature-aware forecasting model wrapping +`xgboost.XGBRegressor` — and wire it end-to-end: `model_factory` instantiates it (behind a +new `forecast_enable_xgboost` flag), `ForecastingService.train_model` trains it through the +existing `requires_features` branch, `POST /scenarios/simulate` re-forecasts it through +`method="model_exogenous"`, the backtesting fold loop backtests it through the existing +`requires_features` probe, `JobService._execute_train` and `_execute_backtest` accept +`model_type="xgboost"`, and the XGBoost library version is captured in the model bundle and +the registry's `runtime_info`. XGBoost ships as an **optional dependency group** +(`ml-xgboost`); the model code lazy-imports it so a single-host install without the extra +still works for every other model. + +**End state:** a user with `forecast_enable_xgboost=True` and the `ml-xgboost` extra +installed can train an `xgboost` model (HTTP or job), re-forecast it in a what-if scenario, +and backtest it, exactly as they can a `lightgbm` model today. Every existing model behaves +**identically** before and after. + +## Why + +- **The model zoo needs a second tree benchmark.** `docs/optional-features/05-advanced-ml-model-zoo.md` + frames XGBoost as the "strong tabular benchmark … robust regularization … useful + comparison against LightGBM". A credible model-*comparison* platform needs more than one + advanced model; XGBoost is the industry-standard second. +- **The foundation is fully paid for.** PRP-29/30/B.2 made train, predict, scenarios, and + backtesting all branch on `requires_features`. Adding a second tree model is now a + *small, contained* change — one class (a near-clone of the proven `LightGBMForecaster`), + one config, one factory branch, two jobs branches, metadata, and tests. 
+- **De-risks the dependency one step at a time.** `INITIAL-MLZOO-index.md` mandates "Add + XGBoost as a second tree model" only after the first advanced model path is stable. It is. +- **Low blast radius.** No migration, no API-contract change, no existing-model change, no + new vertical slice. + +## What + +A backend-only feature PRP. User-visible behaviour gains exactly one thing: `model_type: +"xgboost"` becomes a real, trainable, scenario-re-forecastable, backtestable model when the +feature flag and the optional dependency are both present. Everything else is identical. + +### Technical requirements + +1. **Optional dependency group.** `pyproject.toml` gains `[project.optional-dependencies] + ml-xgboost = ["xgboost>=2.1.0"]`. CI already runs `uv sync --frozen --all-extras --dev` + (`.github/workflows/ci.yml:48,74,116,163`) so the extra is installed and tested in CI + with **no workflow change**. `uv.lock` is regenerated (`uv lock`) because CI uses + `--frozen`. +2. **Runtime flag.** `app/core/config.py` gains `forecast_enable_xgboost: bool = False` + (after `forecast_enable_lightgbm`, `config.py:101`) — mirrors the LightGBM gate exactly. +3. **`XGBoostModelConfig`** in `app/features/forecasting/schemas.py` — a `ModelConfigBase` + subclass, **conservative field set matching `LightGBMModelConfig`** (DECISIONS LOCKED + #4): `n_estimators` (10-1000, default 100), `max_depth` (1-20, default 6), + `learning_rate` (0.001-1.0, default 0.1), `feature_config_hash: str | None`. Added to + the `ModelConfig` union. +4. **`XGBoostForecaster`** in `app/features/forecasting/models.py` — a `BaseForecaster` + subclass with `requires_features: ClassVar[bool] = True`, structurally mirroring + `LightGBMForecaster` (`models.py:580-732`). It lazy-imports `xgboost` inside `fit()` so + importing `models.py` never requires the optional dependency. 
It is deterministic + (`n_jobs=1`, `tree_method="hist"`, fixed `random_state`) and NaN-tolerant (XGBoost + handles `NaN` natively via `missing=np.nan`). +5. **`model_factory`** — a new `xgboost` branch mirroring the `lightgbm` branch + (`models.py:778-792`), gated on `forecast_enable_xgboost`. The `ModelType` literal + (`models.py:736`) gains `"xgboost"`. +6. **Jobs integration.** `JobService._execute_train` (`jobs/service.py:454-478`) and + `_execute_backtest` (`jobs/service.py:641-658`) each gain an `xgboost` branch building + `XGBoostModelConfig` — mirroring the existing `lightgbm` branches. +7. **Route gate.** `POST /forecasting/train` (`forecasting/routes.py:67-72`) gains an + `xgboost` flag gate mirroring the `lightgbm` one. +8. **Reproducibility metadata.** `ModelBundle` gains an `xgboost_version: str | None` + field (best-effort captured on save, mismatch-warned on load — mirroring + `lightgbm_version`, `persistence.py:56,104-108,185-199`); + `RegistryService._capture_runtime_info` (`registry/service.py:124-129`) gains an + `xgboost` version block. +9. **Tests** mirroring the `LightGBMForecaster` suite, gated with + `pytest.importorskip("xgboost")`; an `examples/models/advanced_xgboost.py` example; + additive docs. + +### Success Criteria + +- [ ] `model_factory(XGBoostModelConfig(), random_state=42)` returns an `XGBoostForecaster` + when `forecast_enable_xgboost=True`; raises a clear `ValueError` when the flag is off. +- [ ] `XGBoostForecaster.requires_features is True`; `fit`/`predict` require a non-`None` + `X` and raise the same error-message substrings as `LightGBMForecaster` + (`"requires exogenous features"`, `"rows must match"`, `"horizon"`, `"fitted"`). +- [ ] Two fits with the same `random_state` produce **identical** forecasts + (`np.testing.assert_array_equal`) — single-threaded `hist` is reproducible within one + environment (see Gotchas). 
+- [ ] `ForecastingService.train_model` trains an `xgboost` model with **no edit to + `train_model`** (routes through the existing `requires_features` branch). +- [ ] `POST /scenarios/simulate` against a trained `xgboost` run returns + `method="model_exogenous"` (not `"heuristic"`) — **no edit to scenarios code**. +- [ ] A backtest of an `xgboost` model produces per-fold metrics — **no edit to + backtesting-service code** (the B.2 `requires_features` probe handles it). +- [ ] `JobService._execute_train` and `_execute_backtest` accept `model_type="xgboost"`. +- [ ] `ModelBundle.xgboost_version` and registry `runtime_info["xgboost_version"]` are + captured when `xgboost` is installed. +- [ ] Every baseline model, `regression`, `lightgbm`, and every existing test pass **with + no behaviour change**. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` all green. +- [ ] No Alembic migration; no route/schema/WebSocket *contract* change; XGBoost stays an + *optional* dependency (the core `dependencies` list is unchanged). + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: PRPs/PRP-30-lightgbm-first-advanced-model.md + why: THE template. This PRP is a near-clone with lightgbm -> xgboost. Every DECISIONS + LOCKED entry, every Anti-Pattern, every Validation Level there applies here. Read + it fully before touching code. + +- file: app/features/forecasting/models.py + why: LightGBMForecaster (lines 580-732) is the BYTE-FOR-BYTE structural template for + XGBoostForecaster — same __init__ shape, same fit/predict guards, same error + strings, same lazy-import-inside-fit pattern, same get_params/set_params. The + model_factory lightgbm branch (lines 778-792) is the template for the xgboost + branch. The ModelType literal is at line 736. 
+ critical: The estimator is typed `Any` (`estimator: Any = lgb.LGBMRegressor(...)` at + models.py:661) — mirror that for XGBRegressor so pyright --strict stays quiet. + +- file: app/features/forecasting/schemas.py + why: LightGBMModelConfig (lines 107-144) is the template for XGBoostModelConfig — same + four fields, same Field(ge=, le=, default=) bounds. The ModelConfig union is at + lines 192-199. DECISIONS LOCKED #4: keep XGBoostModelConfig conservative — DO NOT + add subsample/colsample_bytree/reg_alpha/reg_lambda. + +- file: app/features/forecasting/service.py + why: train_model (lines 180-297) branches `if model.requires_features:` (line 219) — + MODEL-AGNOSTIC. predict (lines 299-437) rejects feature-aware models at + lines 383-393 — also capability-based. _build_regression_features and + _assemble_regression_rows are REUSED unchanged by XGBoost. + critical: DO NOT EDIT service.py. An XGBoost model trains and is predict-rejected purely + because requires_features=True. Verify by reading, then leave it alone. + +- file: app/features/forecasting/persistence.py + why: ModelBundle dataclass (lines 48-57) has python_version + sklearn_version + + lightgbm_version. save_model_bundle captures lightgbm_version best-effort at + lines 102-108; load_model_bundle mismatch-warns at lines 185-199. ADD + `xgboost_version` mirroring `lightgbm_version` EXACTLY. compute_hash (lines 59-72) + reads only config_hash/model_params/metadata — adding xgboost_version does NOT + change any bundle hash. + +- file: app/features/forecasting/routes.py + why: POST /forecasting/train has the lightgbm feature-flag gate at lines 67-72 + (`request.config.model_type == "lightgbm" and not settings.forecast_enable_lightgbm` + -> 400). ADD a parallel xgboost gate. ValueError -> 400 (lines 115-118). 
+ +- file: app/features/jobs/service.py + why: _execute_train has the model_type if/elif chain at lines 454-478 (lightgbm branch + at the elif; final `else: raise ValueError("Unsupported model_type: ...")`). + _execute_backtest has an IDENTICAL chain at lines 641-658. The forecasting-schemas + import block is at lines 426-433. ADD an xgboost branch to BOTH chains and + XGBoostModelConfig to the import. + +- file: app/features/registry/service.py + why: _capture_runtime_info (lines 84-131) best-effort-imports sklearn/numpy/pandas/ + joblib/lightgbm into a runtime_info dict. The lightgbm block is at lines 124-129. + ADD an identical `try: import xgboost` block. runtime_info is JSONB — NO migration. + +- file: app/features/backtesting/service.py + why: lines 384-409 probe `model_factory(...).requires_features` and branch to + _run_feature_aware_fold for any feature-aware model. MODEL-AGNOSTIC — DO NOT EDIT. + Read to confirm an xgboost model backtests for free. + +- file: app/features/forecasting/tests/test_lightgbm_forecaster.py + why: THE test template for test_xgboost_forecaster.py. Clone every test 1:1 swapping + LightGBMForecaster -> XGBoostForecaster, LightGBMModelConfig -> XGBoostModelConfig, + forecast_enable_lightgbm -> forecast_enable_xgboost, and the importorskip target. + Copy the `_synthetic_data` helper verbatim. + +- file: app/features/forecasting/tests/test_regression_forecaster.py + why: The fuller 10-test template the LightGBM file itself was cloned from — same test + names. Either file works as the clone source. + +- file: app/features/forecasting/tests/test_service.py + why: TestFeatureAwareContract (lines 349-412) — test_requires_features_flag and + test_lightgbm_factory_respects_flag. Extend the first with XGBoost; add an + xgboost-flag mirror of the second. + +- file: app/features/jobs/tests/test_service.py + why: test_execute_train_rejects_unsupported_model_type (lines 243-249, already uses + "arima" — NO fix needed). 
test_execute_train_builds_lightgbm_config (lines 222-241) + and test_execute_backtest_builds_lightgbm_config (lines 286-304) are the templates + for the xgboost job tests. + +- url: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRegressor + why: XGBRegressor sklearn-API constructor — n_estimators, learning_rate, max_depth, + random_state, n_jobs, tree_method, verbosity. fit(X, y) / predict(X) are + sklearn-compatible. `missing` defaults to np.nan (native NaN handling). + +- url: https://xgboost.readthedocs.io/en/stable/faq.html + section: "Slightly different result between runs" + critical: XGBoost has NO `deterministic=True` switch (unlike LightGBM). Single-machine + bit-reproducibility comes from `n_jobs=1` + a fixed `random_state` + no stochastic + sampling (conservative config has no subsample/colsample, so this holds) + + `tree_method="hist"` (the default; pin it explicitly). Multi-threaded fits differ + by float-summation order. Reproducibility is promised only within the SAME + hardware+build — fine for CI and the determinism unit test. + +- url: https://xgboost.readthedocs.io/en/stable/parameter.html + section: Parameters for Tree Booster / Global Configuration + why: max_depth, eta(=learning_rate), tree_method, verbosity semantics and ranges. 
+``` + +### Current Codebase tree (relevant — all already exist) + +```bash +app/features/forecasting/ +├── models.py # BaseForecaster, RegressionForecaster, LightGBMForecaster, +│ # model_factory (lightgbm branch is the template), ModelType +├── schemas.py # LightGBMModelConfig (the template), ModelConfig union +├── service.py # train_model + predict branch on requires_features (untouched) +├── persistence.py # ModelBundle (python/sklearn/lightgbm_version) +├── routes.py # /forecasting/train has the lightgbm flag gate (lines 67-72) +└── tests/ + ├── test_lightgbm_forecaster.py # the test template to clone + ├── test_regression_forecaster.py # the fuller 10-test template + ├── test_service.py # TestFeatureAwareContract + ├── test_routes.py + ├── test_persistence.py + └── test_regression_features_leakage.py # load-bearing — already covers XGBoost's frame +app/core/config.py # forecast_enable_lightgbm at line 101 +app/features/scenarios/service.py # model_exogenous dispatch on requires_features (untouched) +app/features/backtesting/service.py # feature-aware fold loop, requires_features probe (untouched) +app/features/jobs/service.py # _execute_train + _execute_backtest model_type chains +app/features/registry/service.py # _capture_runtime_info (lightgbm block at 124-129) +app/shared/feature_frames/ # the shared contract — reused, untouched +examples/models/advanced_lightgbm.py # the example template +pyproject.toml # ml-lightgbm extra + lightgbm.* mypy override +.github/workflows/ci.yml # uv sync --frozen --all-extras --dev (no change) +``` + +### Desired Codebase tree — files to ADD + +```bash +app/features/forecasting/tests/ +└── test_xgboost_forecaster.py # cloned from test_lightgbm_forecaster.py, importorskip +examples/models/ +└── advanced_xgboost.py # minimal XGBoost train/predict example +``` + +### Files to MODIFY (all additive or behaviour-preserving) + +```bash +pyproject.toml # + [project.optional-dependencies] ml-xgboost + # + (only if mypy --strict 
complains) xgboost.* override +uv.lock # regenerated by `uv lock` +app/core/config.py # + forecast_enable_xgboost: bool = False +app/features/forecasting/schemas.py # + XGBoostModelConfig; + to ModelConfig union +app/features/forecasting/models.py # + XGBoostForecaster; + "xgboost" in ModelType; + # + model_factory xgboost branch +app/features/forecasting/persistence.py # + ModelBundle.xgboost_version (save + load) +app/features/forecasting/routes.py # + xgboost flag gate +app/features/jobs/service.py # _execute_train + _execute_backtest: + xgboost branch +app/features/registry/service.py # _capture_runtime_info: + xgboost block +app/features/forecasting/tests/test_service.py # extend TestFeatureAwareContract +app/features/forecasting/tests/test_routes.py # + xgboost 400-when-disabled route test +app/features/forecasting/tests/test_persistence.py # + xgboost_version captured assertion +app/features/jobs/tests/test_service.py # + xgboost train + backtest job tests +app/features/scenarios/tests/test_routes_integration.py # + xgboost model_exogenous test +app/features/backtesting/tests/test_feature_aware_backtest.py # + light xgboost backtest test +app/features/registry/tests/test_service.py # + runtime_info has xgboost_version +examples/models/model_interface.md # additive: xgboost row +examples/models/feature_frame_contract.md # additive: xgboost is a feature-aware model +README.md # additive: the ml-xgboost optional extra +``` + +### DECISIONS LOCKED (resolved during planning — do NOT re-litigate) + +1. **C1 (XGBoost) and C2 (Prophet-like) are separate PRPs, branches, and review units.** + `INITIAL-MLZOO-C` describes both; the MLZOO index now lists them as two rows. This PRP + touches **only** XGBoost. If you find yourself adding a Prophet-like model, stop — that + is `PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md`. (User-confirmed.) + +2. 
**XGBoost ships as an optional dependency group, not a core dependency.** A new
   `[project.optional-dependencies] ml-xgboost = ["xgboost>=2.1.0"]`. Rationale: mirrors
   the merged `ml-lightgbm` decision (PRP-30 DECISIONS LOCKED #1); the single-host vision
   keeps the core install dependency-light; `INITIAL-MLZOO-index.md` mandates dependency
   groups (`ml-xgboost` is named there). CI's `--all-extras` installs and tests it.
   (User-confirmed.)

3. **The `xgboost` import is LAZY — inside `fit()`, never at module scope.** `models.py`
   is imported by every forecasting code path (baseline models included); a module-level
   `import xgboost` would make every forecast path require the optional extra. Mirror
   `LightGBMForecaster` exactly: `model_factory` and `XGBoostForecaster.__init__` only
   store parameters; `import xgboost` happens the first time `fit()` runs.
   `requires_features` is a `ClassVar` → readable with no import.

4. **`XGBoostModelConfig` is CONSERVATIVE — `n_estimators` / `max_depth` /
   `learning_rate` / `feature_config_hash` only.** It mirrors `LightGBMModelConfig`
   (PRP-30 DECISIONS LOCKED #3). `subsample` / `colsample_bytree` / `reg_alpha` /
   `reg_lambda` (named in `docs/optional-features/05-advanced-ml-model-zoo.md`) are a
   deliberate future-PRP extension — adding them now widens the schema surface for no MVP
   value, AND `subsample`/`colsample_bytree` < 1.0 introduce stochastic row/column
   sampling that complicates the determinism guarantee. The forecaster uses XGBoost
   defaults for every parameter not in the config (so `subsample`/`colsample_bytree` stay
   at 1.0 → no stochastic sampling). (User-confirmed: "Conservative (match LightGBM)".)

5. **Backtesting needs NO backtesting-service change.** Unlike PRP-30 (which deferred
   backtesting to B.2), B.2 is merged: `backtesting/service.py` probes
   `requires_features` and is fully model-agnostic. An XGBoost model backtests for free.
+ This PRP only adds the `xgboost` branch to `JobService._execute_backtest` (the job + layer still maps a `model_type` string → a config object). + +6. **No new leakage test.** XGBoost reuses `_build_regression_features` / + `_assemble_regression_rows` (historical frame) and the shared `app/shared/feature_frames` + builders (future + per-fold frames) byte-for-byte. Those are pinned by the load-bearing + `test_regression_features_leakage.py` and `app/shared/feature_frames/tests/test_leakage.py`. + XGBoost is leakage-covered by construction; a duplicate XGBoost-flavoured leakage test + would test the same code twice. State the reuse explicitly in the PR description. + +7. **Determinism: `n_jobs=1` + `tree_method="hist"` + fixed `random_state` + conservative + config (no subsample/colsample).** XGBoost has no `deterministic`/`force_col_wise` + switch (LightGBM does). Single-threaded `hist` with a fixed seed and no stochastic + sampling is bit-reproducible within one hardware+build environment — which is exactly + the determinism unit test's scope. Keep `np.testing.assert_array_equal` (the repo + idiom). See Gotchas for the residual-risk note. + +8. **`POST /forecasting/predict` is NOT changed.** An XGBoost model is feature-aware + (`requires_features=True`) and is rejected by the existing capability-based predict + branch — identical to `regression`/`lightgbm`. It forecasts through + `POST /scenarios/simulate`. + +### Known Gotchas of our codebase & Library Quirks + +```python +# CRITICAL: lazy import. `import xgboost` goes INSIDE XGBoostForecaster.fit(), not at the +# top of models.py and not in __init__. models.py is imported for naive/seasonal/mavg/ +# regression/lightgbm too; a module-level xgboost import would make every forecast path +# require the optional extra. Mirror LightGBMForecaster.fit (models.py:657-659). + +# CRITICAL: determinism. XGBoost has NO `deterministic=True` flag. 
Pin n_jobs=1 + +# tree_method="hist" + a fixed random_state, and rely on the conservative config leaving +# subsample/colsample_bytree at their 1.0 defaults (no stochastic sampling). Single- +# threaded `hist` is bit-reproducible within one environment — which is the +# test_determinism_same_random_state scope. Keep np.testing.assert_array_equal. +# IF (and only if) that test proves genuinely flaky in CI across runs on the SAME +# environment, that is a real signal — investigate the XGBoost build, do NOT paper over +# it by switching to assert_allclose. (Cross-environment bit-equality is never promised +# and is not what the test checks.) + +# GOTCHA: mypy --strict + warn_unused_ignores=true. xgboost ships a py.typed marker, so +# `import xgboost` resolves WITHOUT an override in most cases. Start WITHOUT a +# [[tool.mypy.overrides]] xgboost.* block. ONLY if `uv run mypy app/` flags xgboost.* +# internals, add `module = ["xgboost.*"] ignore_missing_imports = true` (mirroring the +# lightgbm.* override at pyproject.toml:150-152). Do NOT add both an override AND an +# inline `# type: ignore` — warn_unused_ignores would flag the redundant one. Type the +# estimator `Any` (mirror `estimator: Any = lgb.LGBMRegressor(...)` at models.py:661). + +# GOTCHA: pyright --strict excludes tests/ but scans app/. With ml-xgboost installed +# (CI: --all-extras; locally: Validation Level 0) pyright resolves `import xgboost`. +# reportUnknownMemberType is already "warning" (pyproject:177) so dynamic XGBRegressor +# attribute access does not fail the gate. + +# GOTCHA: uv.lock + --frozen. CI installs with `uv sync --frozen` — `--frozen` REFUSES to +# update the lockfile. After editing pyproject.toml you MUST run `uv lock` and commit the +# refreshed uv.lock, or every CI job fails at the install step. + +# GOTCHA: tests must not hard-require the optional dep. 
test_xgboost_forecaster.py starts +# with `pytest.importorskip("xgboost")` so a dev who ran `uv sync --extra dev` (no +# ml-xgboost) sees the suite SKIP, not ERROR. CI installs --all-extras so it RUNS there. + +# GOTCHA: loading an XGBoost bundle requires the ml-xgboost extra. joblib.load unpickles +# the embedded XGBRegressor, which needs `xgboost` importable. Inherent to an optional +# ML dependency — document it; do not engineer around it. + +# GOTCHA: silence training output with `verbosity=0` in the XGBRegressor constructor +# (default is 1 = warnings). `verbose` is a fit() arg for eval-set printing, not a +# constructor param — not needed here (no eval_set). + +# GOTCHA: line endings — repo has mixed CRLF/LF, no .gitattributes. Run `git diff --stat` +# before committing; if a modified file shows a whole-file diff, re-normalise to its +# original ending so the review shows only the real change. + +# SIBLING-PRP integration: PRP-MLZOO-C2 also edits the ModelType Literal (models.py:736) +# and the ModelConfig union (schemas.py:192-199). Both edits are purely additive (one +# new literal entry, one new union member). If C2 merged first you will see its +# "prophet_like" entry already present — just add "xgboost" alongside. A trivial +# one-line rebase, never a semantic conflict. +``` + +--- + +## Implementation Blueprint + +### Data models and structure + +No ORM model, no migration. One new Pydantic schema and one new forecaster class: + +```python +# app/features/forecasting/schemas.py — mirrors LightGBMModelConfig (schemas.py:107-144) + +class XGBoostModelConfig(ModelConfigBase): + """Configuration for the XGBoost regressor (feature-flagged). + + XGBoost is an advanced, feature-aware gradient-boosted-tree model. Like + ``LightGBMModelConfig`` the field set is deliberately conservative — + ``n_estimators`` / ``max_depth`` / ``learning_rate`` only — so the schema + surface stays small and training stays deterministic (no stochastic + subsampling). 
Only available when ``forecast_enable_xgboost=True``. + """ + + model_type: Literal["xgboost"] = "xgboost" + n_estimators: int = Field(default=100, ge=10, le=1000, description="Number of boosting rounds") + max_depth: int = Field(default=6, ge=1, le=20, description="Maximum depth of trees") + learning_rate: float = Field( + default=0.1, ge=0.001, le=1.0, description="Learning rate for gradient boosting" + ) + feature_config_hash: str | None = Field( + default=None, description="Hash of FeatureSetConfig used for training" + ) + + +# app/features/forecasting/models.py — mirrors LightGBMForecaster (models.py:580-732) + +class XGBoostForecaster(BaseForecaster): + """Feature-aware forecaster wrapping ``xgboost.XGBRegressor``. + + The second ADVANCED feature-aware tree model (MLZOO-C1). Structurally a + twin of ``LightGBMForecaster``: REQUIRES a non-``None`` exogenous ``X`` for + both ``fit`` and ``predict``; ``xgboost`` is imported LAZILY inside ``fit``. + + Determinism: ``XGBRegressor`` has no ``deterministic`` switch — bit- + reproducibility comes from ``n_jobs=1`` + ``tree_method="hist"`` + a fixed + ``random_state`` + the conservative config leaving ``subsample`` / + ``colsample_bytree`` at 1.0 (no stochastic sampling). XGBoost tolerates + ``NaN`` natively (``missing=np.nan``). + """ + + requires_features: ClassVar[bool] = True + + def __init__( + self, *, n_estimators: int = 100, learning_rate: float = 0.1, + max_depth: int = 6, random_state: int = 42, + ) -> None: + super().__init__(random_state) + self.n_estimators = n_estimators + self.learning_rate = learning_rate + self.max_depth = max_depth + self._estimator: Any = None +``` + +### list of tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Optional dependency + runtime flag ════════ + +Task 1 — MODIFY pyproject.toml + regenerate uv.lock: + - ADD under [project.optional-dependencies], after the `ml-lightgbm` line (pyproject:47): + # Opt-in advanced forecasting model (MLZOO-C1). 
Same optional-extra + # pattern as ml-lightgbm; CI installs it via --all-extras. + ml-xgboost = ["xgboost>=2.1.0"] + - DO NOT add a [[tool.mypy.overrides]] xgboost.* block yet — xgboost ships py.typed. + Add it ONLY if Validation Level 2 (mypy --strict) complains about xgboost.*. + - RUN `uv lock` to refresh uv.lock (CI uses `uv sync --frozen`). + - RUN `uv sync --extra dev --extra ml-lightgbm --extra ml-xgboost` locally. + - VALIDATE: uv run python -c "import xgboost; print(xgboost.__version__)" + +Task 2 — MODIFY app/core/config.py — add the runtime flag: + - ADD after `forecast_enable_lightgbm: bool = False` (config.py:101): + forecast_enable_xgboost: bool = False + - Mirror the surrounding comment style of the Forecasting settings block. + - VALIDATE: uv run python -c "from app.core.config import get_settings; \ + print(get_settings().forecast_enable_xgboost)" + +# ════════ STEP 2 — Schema ════════ + +Task 3 — MODIFY app/features/forecasting/schemas.py — ADD XGBoostModelConfig: + - PLACE the new class immediately AFTER LightGBMModelConfig (after schemas.py:144), + BEFORE RegressionModelConfig. + - MIRROR LightGBMModelConfig field-for-field (see Data models above). + - ADD `XGBoostModelConfig` to the ModelConfig union (schemas.py:192-199), e.g. between + LightGBMModelConfig and RegressionModelConfig. + - VALIDATE: uv run mypy app/features/forecasting/schemas.py + +# ════════ STEP 3 — The forecaster + factory ════════ + +Task 4 — MODIFY app/features/forecasting/models.py — ADD XGBoostForecaster: + - PLACE the new class immediately AFTER LightGBMForecaster (after models.py:732), + BEFORE the `ModelType` alias (models.py:736). 
+ - MIRROR LightGBMForecaster byte-for-byte: __init__ shape, fit guards (X is None -> + ValueError "XGBoostForecaster requires exogenous features X for fit()"; empty y -> + "Cannot fit on empty array"; row mismatch -> f"X has {X.shape[0]} rows but y has + {len(y)} — feature/target rows must match"), predict guards (not fitted -> + RuntimeError "Model must be fitted before predict"; X is None -> ValueError + "XGBoostForecaster requires exogenous features X for predict()"; shape mismatch -> + f"X has {X.shape[0]} rows but horizon is {horizon} — they must match"), + get_params, set_params. + - INSIDE fit(): `import xgboost as xgb` (LAZY), then + `estimator: Any = xgb.XGBRegressor(n_estimators=self.n_estimators, + learning_rate=self.learning_rate, max_depth=self.max_depth, + random_state=self.random_state, n_jobs=1, tree_method="hist", verbosity=0)`; + `estimator.fit(X, y)`. + - set requires_features: ClassVar[bool] = True. + - get_params returns {n_estimators, learning_rate, max_depth, random_state}. + - PRESERVE the error-message substrings EXACTLY — the cloned tests `match=` on them. + - VALIDATE: uv run mypy app/features/forecasting/models.py && uv run pyright app/features/forecasting/ + +Task 5 — MODIFY app/features/forecasting/models.py — ModelType literal + model_factory: + - ADD "xgboost" to the ModelType Literal (models.py:736): + ModelType = Literal["naive", "seasonal_naive", "moving_average", "xgboost", + "lightgbm", "regression"] + - ADD an `elif model_type == "xgboost":` branch to model_factory, mirroring the + lightgbm branch (models.py:778-792) — gate FIRST on forecast_enable_xgboost: + elif model_type == "xgboost": + if not settings.forecast_enable_xgboost: + raise ValueError( + "XGBoost is not enabled. Set forecast_enable_xgboost=True in settings." 
+ ) + from app.features.forecasting.schemas import XGBoostModelConfig + if isinstance(config, XGBoostModelConfig): + return XGBoostForecaster( + n_estimators=config.n_estimators, + learning_rate=config.learning_rate, + max_depth=config.max_depth, + random_state=random_state, + ) + raise ValueError("Invalid config type for xgboost") + - VALIDATE: uv run mypy app/ && uv run pyright app/ + +# ════════ STEP 4 — Route gate ════════ + +Task 6 — MODIFY app/features/forecasting/routes.py — add the xgboost flag gate: + - ADD, immediately after the lightgbm gate (routes.py:67-72), a parallel gate: + if request.config.model_type == "xgboost" and not settings.forecast_enable_xgboost: + raise HTTPException( + status_code=status.HTTP_400_BAD_REQUEST, + detail="XGBoost is disabled. Set forecast_enable_xgboost=True in settings.", + ) + - VALIDATE: uv run mypy app/features/forecasting/ && uv run pyright app/features/forecasting/ + +# ════════ STEP 5 — Jobs integration ════════ + +Task 7 — MODIFY app/features/jobs/service.py — _execute_train + _execute_backtest: + - ADD `XGBoostModelConfig` to the forecasting-schemas import block (jobs/service.py:426-433). + - ADD an xgboost branch to the _execute_train if/elif chain (jobs/service.py:454-478), + BEFORE the final `else`, mirroring the lightgbm branch: + elif model_type == "xgboost": + # forecast_enable_xgboost gate lives in model_factory — a disabled + # flag surfaces as a loud failed job. + config = XGBoostModelConfig( + n_estimators=params.get("n_estimators", 100), + learning_rate=params.get("learning_rate", 0.1), + max_depth=params.get("max_depth", 6), + ) + - ADD an xgboost branch to the _execute_backtest if/elif chain (jobs/service.py:641-658), + mirroring its lightgbm branch: + elif model_type == "xgboost": + # Feature-aware — the backtest builds per-fold leakage-safe X. 
+ model_config = XGBoostModelConfig() + - VALIDATE: uv run mypy app/features/jobs/ && uv run pyright app/features/jobs/ + +# ════════ STEP 6 — Reproducibility metadata ════════ + +Task 8 — MODIFY app/features/forecasting/persistence.py — ModelBundle.xgboost_version: + - ADD field to ModelBundle (after `lightgbm_version: str | None = None`, persistence.py:56): + xgboost_version: str | None = None + - UPDATE the ModelBundle docstring Attributes block to mention xgboost_version (mirror + the lightgbm_version wording at persistence.py:43-44). + - In save_model_bundle, AFTER the lightgbm best-effort capture (persistence.py:102-108), + ADD an identical block: + try: + import xgboost + bundle.xgboost_version = str(xgboost.__version__) + except ImportError: + bundle.xgboost_version = None + - In load_model_bundle, AFTER the lightgbm mismatch-warning block (persistence.py:185-199), + ADD an identical block logging `forecasting.xgboost_version_mismatch` (saved vs + current) only when both are non-None and differ; guard the current-version lookup + in try/except ImportError. + - compute_hash (persistence.py:59-72) is unchanged — confirm no bundle hash shifts. + - VALIDATE: uv run mypy app/features/forecasting/ && uv run pyright app/features/forecasting/ + +Task 9 — MODIFY app/features/registry/service.py — _capture_runtime_info: + - ADD, after the lightgbm block (registry/service.py:124-129): + # XGBoost is an optional dependency — only recorded when installed. + try: + import xgboost + runtime_info["xgboost_version"] = xgboost.__version__ + except ImportError: + pass + - VALIDATE: uv run mypy app/features/registry/ && uv run pyright app/features/registry/ + +# ════════ STEP 7 — Tests ════════ + +Task 10 — CREATE app/features/forecasting/tests/test_xgboost_forecaster.py: + - CLONE test_lightgbm_forecaster.py 1:1. Module-scope `pytest.importorskip("xgboost")`. 
+ - Swap LightGBMForecaster -> XGBoostForecaster, LightGBMModelConfig -> XGBoostModelConfig, + forecast_enable_lightgbm -> forecast_enable_xgboost throughout. + - COPY the `_synthetic_data` helper verbatim. + - Keep test_determinism_same_random_state with np.testing.assert_array_equal. + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_xgboost_forecaster.py + +Task 11 — MODIFY app/features/forecasting/tests/test_service.py: + - In TestFeatureAwareContract.test_requires_features_flag, ADD: + from app.features.forecasting.models import XGBoostForecaster + assert XGBoostForecaster.requires_features is True + - ADD test_xgboost_factory_respects_flag mirroring test_lightgbm_factory_respects_flag + (flag off -> ValueError "not enabled"; flag on -> isinstance XGBoostForecaster). + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py + +Task 12 — MODIFY app/features/forecasting/tests/test_routes.py: + - ADD test_train_xgboost_rejected_when_disabled: POST /forecasting/train with + config={"model_type":"xgboost"} and forecast_enable_xgboost at its default (False) + -> 400, problem+json detail mentioning XGBoost disabled. Mirror the lightgbm route + test if one exists; otherwise follow the file's ASGITransport client fixture idiom. + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_routes.py + +Task 13 — MODIFY app/features/jobs/tests/test_service.py: + - ADD test_execute_train_builds_xgboost_config mirroring + test_execute_train_builds_lightgbm_config (lines 222-241). + - ADD test_execute_backtest_builds_xgboost_config mirroring + test_execute_backtest_builds_lightgbm_config (lines 286-304). + - The rejects-unsupported test (lines 243-249) already uses "arima" — DO NOT touch it. 
+ - VALIDATE: uv run pytest -v app/features/jobs/tests/test_service.py + +Task 14 — MODIFY app/features/forecasting/tests/test_persistence.py: + - ADD test_xgboost_version_recorded: after `pytest.importorskip("xgboost")`, save a + ModelBundle and assert `bundle.xgboost_version` is a non-empty str. + - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_persistence.py + +Task 15 — MODIFY app/features/scenarios/tests/test_routes_integration.py: + - ADD an integration test that trains an `xgboost` model then POSTs /scenarios/simulate + with its run_id and asserts the response `method == "model_exogenous"`. Mirror the + existing lightgbm/regression model_exogenous test; gate with + `pytest.importorskip("xgboost")` and enable forecast_enable_xgboost. + - VALIDATE: uv run pytest -v -m integration app/features/scenarios/tests/test_routes_integration.py + +Task 16 — MODIFY app/features/backtesting/tests/test_feature_aware_backtest.py: + - ADD a light test that runs the feature-aware backtest with an XGBoostModelConfig and + asserts per-fold metrics + `feature_aware=True` — mirroring + test_feature_aware_backtest_produces_per_fold_metrics. Gate with + `pytest.importorskip("xgboost")` and enable forecast_enable_xgboost. This satisfies + INITIAL-MLZOO-B's "backtesting integration test comparing baseline and advanced + model path" for the XGBoost model. + - VALIDATE: uv run pytest -v app/features/backtesting/tests/test_feature_aware_backtest.py + +Task 17 — MODIFY app/features/registry/tests/test_service.py: + - ADD/extend a runtime_info test: with `pytest.importorskip("xgboost")` a created run's + runtime_info contains the `xgboost_version` key. Mirror the lightgbm assertion. 
+ - VALIDATE: uv run pytest -v app/features/registry/tests/test_service.py + +# ════════ STEP 8 — Docs & example ════════ + +Task 18 — CREATE examples/models/advanced_xgboost.py: + - CLONE examples/models/advanced_lightgbm.py, swapping LightGBMForecaster -> + XGBoostForecaster and the docstring/install line (`--extra ml-xgboost`). + - VALIDATE: uv run python examples/models/advanced_xgboost.py (requires ml-xgboost) + +Task 19 — MODIFY examples/models/model_interface.md + feature_frame_contract.md: + - model_interface.md: ADDITIVE — add an XGBoostModelConfig entry under "## Model + Configurations" and an "### XGBoost Forecaster" entry under "## Model Formulas"; + note requires_features=True and the ml-xgboost optional extra. + - feature_frame_contract.md: ADDITIVE — record XGBoost as an IMPLEMENTED feature-aware + model in the relevant sentence/list. Do NOT rewrite the file. + - VALIDATE: uv run ruff check . && uv run ruff format --check . + +Task 20 — MODIFY README.md: + - ADDITIVE: extend the install-section opt-in note and the Supported Model Types list + (README.md:344 area) — `xgboost` is an opt-in model installed via + `uv sync --extra dev --extra ml-xgboost` and enabled with + `forecast_enable_xgboost=true`. Mirror the existing ml-lightgbm wording. + - VALIDATE: uv run ruff format --check . 
(README is markdown — visual check only) +``` + +### Per-task pseudocode (critical details only) + +```python +# ── Task 4 — XGBoostForecaster.fit (lazy import + determinism is the crux) ── +def fit(self, y, X=None): + if X is None: + raise ValueError("XGBoostForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + import xgboost as xgb # LAZY — optional dependency; never module-scope + estimator: Any = xgb.XGBRegressor( + n_estimators=self.n_estimators, + learning_rate=self.learning_rate, + max_depth=self.max_depth, + random_state=self.random_state, + n_jobs=1, # single-threaded — removes float-summation + # non-determinism (XGBoost has no `deterministic`) + tree_method="hist", # explicit; the default, and the reproducible path + verbosity=0, # silence XGBoost training chatter + ) + estimator.fit(X, y) # NaN in X is fine — missing=np.nan is the default + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + +# predict() is byte-identical to LightGBMForecaster.predict (models.py:677-706), +# only the error-string prefix changes: "XGBoostForecaster requires exogenous features ...". +``` + +### Integration Points + +```yaml +DEPENDENCY: + - pyproject.toml: + [project.optional-dependencies] ml-xgboost = ["xgboost>=2.1.0"]. + - uv.lock: regenerated by `uv lock` (CI installs with --frozen). + - CI: NO workflow change — ci.yml already runs `uv sync --frozen --all-extras --dev`. + +CONFIG: + - app/core/config.py: + forecast_enable_xgboost: bool = False (the runtime gate). + - forecast_random_seed (config.py:97) is the determinism source threaded through + model_factory — UNCHANGED. 
+ +TRAIN / PREDICT / SCENARIOS / BACKTESTING: + - ForecastingService.train_model, ForecastingService.predict, + scenarios/service.py, backtesting/service.py — ALL UNCHANGED. Each branches on + `requires_features`; an XGBoost model (requires_features=True) routes through every + path automatically. + +JOBS: + - jobs/service.py: + xgboost branch in _execute_train AND _execute_backtest (the job + layer maps a model_type string -> a config object — the one place a string compare + still lives by design). + +PERSISTENCE / REGISTRY: + - ModelBundle: + xgboost_version field (best-effort on save, mismatch-warn on load). + compute_hash unchanged -> no bundle hash shifts. + - runtime_info JSONB: + "xgboost_version" key when xgboost is importable. NO migration. + +NO MIGRATION: this PRP touches no SQLAlchemy model and no Alembic version. +NO API CONTRACT CHANGE: no route path, response schema, or WebSocket frame changes + (a new request-body `model_type` value is an additive, pre-1.0-permitted change). +``` + +--- + +## Validation Loop + +### Level 0: Environment + +```bash +uv lock # refresh lock after pyproject edit +uv sync --extra dev --extra ml-lightgbm --extra ml-xgboost +uv run python -c "import xgboost; print('xgboost', xgboost.__version__)" +# Expected: prints a 2.x/3.x version. Without this, mypy/pyright on the lazy import and +# the XGBoost tests cannot run locally (CI installs --all-extras automatically). +``` + +### Level 1: Syntax & Style + +```bash +uv run ruff check . --fix && uv run ruff format --check . +# Expected: no errors. Fix everything before Level 2. +``` + +### Level 2: Type Checks + +```bash +uv run mypy app/ # --strict; gates merge +uv run pyright app/ # --strict; gates merge +# If mypy flags xgboost.* internals, add the [[tool.mypy.overrides]] xgboost.* block +# (see Task 1). Do NOT add both an override and an inline `# type: ignore`. 
+``` + +### Level 3: Unit Tests + +```bash +uv run pytest -v app/features/forecasting/tests/test_xgboost_forecaster.py +uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py +uv run pytest -v app/features/jobs/tests/test_service.py +uv run pytest -v app/features/backtesting/tests/test_feature_aware_backtest.py + +# Regression — these must stay green with NO behaviour change +uv run pytest -v -m "not integration" app/features/forecasting/tests/ +uv run pytest -v -m "not integration" app/features/backtesting/tests/ +uv run pytest -v -m "not integration" # whole fast suite +# Expected: all green. Every baseline / regression / lightgbm test passes UNEDITED. +# If xgboost is somehow absent, test_xgboost_forecaster.py SKIPS — never ERRORs. +``` + +### Level 4: Integration Tests + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest -v -m integration app/features/forecasting/ app/features/scenarios/ \ + app/features/jobs/ app/features/registry/ +# CRITICAL: the scenarios xgboost model_exogenous test (Task 15) must report +# method="model_exogenous". No migration in this PRP. +``` + +### Level 5: Manual Validation (dogfood — REQUIRED) + +```bash +# 1. Determinism +uv run python -c " +import numpy as np +from app.features.forecasting.models import XGBoostForecaster +rng = np.random.default_rng(0) +X = rng.normal(size=(80, 14)); y = (3.0 * X[:, 0] + rng.normal(size=80)).astype(np.float64) +a = XGBoostForecaster(random_state=7).fit(y, X).predict(12, X[:12]) +b = XGBoostForecaster(random_state=7).fit(y, X).predict(12, X[:12]) +np.testing.assert_array_equal(a, b); print('xgboost deterministic OK', a[:3])" + +# 2. requires_features +uv run python -c " +from app.features.forecasting.models import XGBoostForecaster +assert XGBoostForecaster.requires_features is True; print('requires_features OK')" + +# 3. 
End-to-end: set FORECAST_ENABLE_XGBOOST=true in .env, restart uvicorn, then +# POST /forecasting/train with config {"model_type":"xgboost"} -> 200; take the run_id +# and POST /scenarios/simulate -> ScenarioComparison "method" == "model_exogenous"; +# submit an xgboost backtest job -> completes with per-fold metrics. + +# 4. The optional dep stays optional — in a venv WITHOUT ml-xgboost, training a naive +# model still succeeds and `import app.features.forecasting.models` does not raise. +``` + +--- + +## Final Validation Checklist + +- [ ] `uv run ruff check .` and `uv run ruff format --check .` clean. +- [ ] `uv run mypy app/` and `uv run pyright app/` clean (both --strict). +- [ ] `uv run pytest -v -m "not integration"` fully green; `test_xgboost_forecaster.py` + runs (xgboost installed) and passes — never ERRORs. +- [ ] `uv run pytest -v -m integration app/features/{forecasting,scenarios,jobs,registry}/` + green, including the scenarios `xgboost` `model_exogenous` test. +- [ ] `model_factory(XGBoostModelConfig())` returns an `XGBoostForecaster` with the flag + on, raises a clear `ValueError` with the flag off. +- [ ] An `xgboost` backtest produces per-fold metrics with **no edit to + `backtesting/service.py`**. +- [ ] Every baseline / `regression` / `lightgbm` test passes with **no edit**. +- [ ] `uv.lock` is regenerated and committed; the core `[project] dependencies` list is + UNCHANGED (XGBoost is only in `[project.optional-dependencies]`). +- [ ] No Alembic migration; no route-path/response-schema/WebSocket change. +- [ ] `git diff --stat` shows only intended files — no whole-file CRLF/LF noise diffs. +- [ ] An OPEN GitHub issue exists (`gh issue view --json state` → `OPEN`); commit + `feat(forecast): add XGBoost feature-aware forecasting model (#)`; branch + `feat/forecasting-xgboost-model` off `dev`. +- [ ] The PR description states C1 is one of two MLZOO-C review units and links the + sibling `PRP-MLZOO-C2`. 
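
Manual step 4 needs a venv without the extra. A variant of the "optional dependency stays optional" check also works in a venv where `ml-xgboost` *is* installed. It imports the module in a fresh interpreter and confirms that `xgboost` never appears in `sys.modules`. The sketch below is illustrative only. `imports_eagerly` is a hypothetical helper, not an existing repo utility. It is exercised here on standard-library modules so that it runs anywhere.

```python
"""Sketch: check that importing a module does not eagerly import an optional extra."""
import subprocess
import sys


def imports_eagerly(module: str, dependency: str) -> bool:
    """Return True if `import module` in a fresh interpreter also loads `dependency`.

    A separate interpreter is used so modules the current process already
    imported (test plugins, earlier tests) cannot mask or fake the result.
    """
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"print({dependency!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # an ImportError in the probe fails loudly instead of passing
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # Standard-library stand-ins so the sketch runs anywhere.
    print(imports_eagerly("json", "json.decoder"))  # True: json imports its decoder at import time
    print(imports_eagerly("json", "xgboost"))  # False: nothing in json touches xgboost
```

Applied to this repo, the assertion would be `imports_eagerly("app.features.forecasting.models", "xgboost") is False`, run from the project root so that `app` is importable. A subprocess is used because the test runner's own process may already have imported `xgboost` through another test. An in-process `sys.modules` check would then pass or fail for the wrong reason.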
+ +--- + +## Anti-Patterns to Avoid + +- ❌ Don't implement the Prophet-like model — that is `PRP-MLZOO-C2`, a separate branch. +- ❌ Don't combine C1 and C2 into one branch or one PR (DECISIONS LOCKED #1). +- ❌ Don't add hyperparameter search, portfolio/global models, or an explainability change. +- ❌ Don't add `xgboost` to the core `[project] dependencies` — it is an OPTIONAL extra. + Don't `import xgboost` at module scope — lazy-import inside `fit()`. +- ❌ Don't add `subsample` / `colsample_bytree` / `reg_alpha` / `reg_lambda` to + `XGBoostModelConfig` — DECISIONS LOCKED #4 keeps it conservative (and stochastic + subsampling would complicate determinism). +- ❌ Don't edit `ForecastingService.train_model` / `predict`, `scenarios/service.py`, or + `backtesting/service.py` — they already branch on `requires_features`. +- ❌ Don't write a new leakage test — XGBoost reuses the already-pinned shared builders. +- ❌ Don't "fix" a determinism-test flake with `assert_allclose` — pin `n_jobs=1` + + `tree_method="hist"` + a fixed `random_state` and keep `assert_array_equal`. A genuine + flake on the same environment is a real signal to investigate, not to silence. +- ❌ Don't forget `uv lock` — CI's `uv sync --frozen` fails on a stale lockfile. +- ❌ Don't make `test_xgboost_forecaster.py` hard-require the extra — `pytest.importorskip`. + +## Open Questions — RESOLVED + +`INITIAL-MLZOO-C`'s open points are resolved for the XGBoost half: +- **Scope** → XGBoost only; Prophet-like is the sibling PRP-MLZOO-C2 (DECISIONS LOCKED #1). +- **Dependency strategy** → optional `ml-xgboost` extra (#2), mirroring `ml-lightgbm`. +- **Config fields** → conservative, matching `LightGBMModelConfig` (#4). +- **Determinism** → `n_jobs=1` + `tree_method="hist"` + fixed seed + no stochastic + sampling (#7); residual cross-environment non-determinism is documented, not tested. 
+- **Holiday/regressor features** → already carried as columns in the canonical 14-column + frame (`is_holiday`, `price_factor`, `promo_active`); no XGBoost-specific handling. + +Nothing is left to litigate at implementation time. + +## Confidence Score + +**9 / 10** for one-pass implementation success. + +Rationale: this is the lowest-risk PRP in the MLZOO sequence. The merged `LightGBMForecaster` +(PRP-30) is a *proven, tested* template — `XGBoostForecaster` is a near-mechanical clone +with two library swaps (`lgb.LGBMRegressor` → `xgb.XGBRegressor`, and +`deterministic/force_col_wise` → `tree_method="hist"`). Every consuming path — +train, predict, scenarios, **backtesting** — already branches on `requires_features`, so the +only genuinely new wiring is two `model_factory`/jobs branches and the metadata field. The +−1 risk is XGBoost determinism: unlike LightGBM there is no `deterministic` flag, so +`assert_array_equal` rests on single-threaded `hist` + fixed seed + no stochastic sampling +being reproducible within one environment — which the research confirms it is, and the +conservative config guarantees no subsampling. The risk is caught immediately by the Level 3 +determinism test, and the "every existing test passes unedited" gate makes any accidental +regression impossible to miss. diff --git a/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md b/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md new file mode 100644 index 00000000..9114b8c9 --- /dev/null +++ b/PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md @@ -0,0 +1,997 @@ +name: "PRP-MLZOO-C2 — Prophet-like Additive Forecasting Model" +description: | + +## Purpose + +The second half of MLZOO-C (`PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md`). +It adds a **Prophet-like additive forecasting model** — `ProphetLikeForecaster` — a +*deterministic, regularized, additive linear* model that decomposes demand into **trend**, +**seasonality**, and **holiday/regressor** components. 
+ +This is **not** a clone of the LightGBM/XGBoost tree models. It is a distinct model-family +design task. The two tree models are gradient-boosted, non-additive, and opaque; the +Prophet-like model is a transparent additive linear model whose fitted coefficients *are* +the component decomposition. Concretely it is a scikit-learn `Pipeline` of a +`SimpleImputer` + a `Ridge` regressor over the canonical 14-column feature frame, plus a +`decompose()` method that splits any forecast into its additive trend / seasonality / +regressor contributions. + +> **Sibling PRP:** `PRPs/PRP-MLZOO-C1-xgboost-model.md` ships the XGBoost model. C1 and C2 +> are intentionally **separate branches and separate review units** — never combine them. +> They are additive and order-independent; whichever merges second rebases cleanly (see +> "Sibling-PRP integration" below). + +> **Naming honesty.** The model is "Prophet-**like**", never "Prophet". It deliberately +> approximates Prophet's *additive decomposition* shape using a linear model over +> already-engineered features. It does **not** add the real `prophet`/Stan dependency and +> does **not** replicate Prophet's changepoint trend, posterior uncertainty intervals, or +> automatic seasonality discovery. `docs/optional-features/05-advanced-ml-model-zoo.md` +> explicitly endorses "Prophet-like" as the intentional term for exactly this. Every +> docstring and doc section MUST set this expectation plainly (see Risks). + +## What this PRP already inherits (DO NOT re-build) + +PRP-29 (MLZOO-A), PRP-30 (MLZOO-B), PRP-27 (the `regression` model), and PRP-MLZOO-B.2 +(feature-aware backtesting) already shipped the structural foundation. Re-use it: + +- **The feature-aware model contract.** `BaseForecaster.requires_features: ClassVar[bool]` + (`app/features/forecasting/models.py:64`). 
`RegressionForecaster` (`models.py:438`) is the + **closest structural template** — like the Prophet-like model it wraps a pure-scikit-learn + estimator, needs **no optional dependency**, and needs **no feature flag**. (The LightGBM/ + XGBoost forecasters are *less* relevant here — they carry optional-dependency machinery + this model does not need.) +- **The shared feature-frame contract.** `app/shared/feature_frames/` owns + `canonical_feature_columns()` — the fixed, ordered 14-column set: + `["lag_1","lag_7","lag_14","lag_28","dow_sin","dow_cos","month_sin","month_cos", + "is_weekend","is_month_end","price_factor","promo_active","is_holiday", + "days_since_launch"]`. The Prophet-like model consumes this frame **unchanged** and + writes **zero** new contract code (DECISIONS LOCKED #3). +- **Train / predict / scenarios / backtesting** — all branch on + `model.requires_features`, capability-based, never on a `model_type` string + (`forecasting/service.py:219,383`, `scenarios/service.py:114`, + `backtesting/service.py:384-409`). A new feature-aware model trains, is predict-rejected, + re-forecasts in scenarios (`method="model_exogenous"`), and backtests with **zero + changes to those four service layers**. +- **The leakage spec.** `app/features/forecasting/tests/test_regression_features_leakage.py` + and `app/shared/feature_frames/tests/test_leakage.py` pin the historical and future + builders. Because the Prophet-like model consumes the **same** builders, its training and + future feature matrices are leakage-covered by construction (DECISIONS LOCKED #6). + +The **problem this PRP fixes**: `docs/optional-features/05-advanced-ml-model-zoo.md` calls +for "Prophet-like models with trend, seasonality, holiday, and regressor components" as the +third model family — to make ForecastLabAI a credible model-*comparison* platform with more +than tree models. 
No additive/decomposable model exists today (`regression` is a tree; +`naive`/`seasonal_naive`/`moving_average` are target-only). + +## DEPENDS ON — read before starting + +- `PRPs/INITIAL/INITIAL-MLZOO-C-xgboost-prophet-extensions.md` — the shared C brief. +- `PRPs/INITIAL/INITIAL-MLZOO-index.md` — the MLZOO roadmap. +- `docs/optional-features/05-advanced-ml-model-zoo.md` — § "Prophet-like Models" is the + vision: trend, weekly/yearly seasonality, holiday/event regressors, optional changepoints, + optional external regressors; and the explicit option "Implement a lightweight additive + model using sklearn regression over generated trend/seasonal features." +- `PRPs/PRP-27-scenario-simulation-full-version.md` & `PRPs/ai_docs/exogenous-regressor-forecasting.md` + — how the `regression` model (the structural template) consumes a future feature frame. +- `examples/models/feature_frame_contract.md` — the historical/future frame shapes. + +--- + +## Goal + +Implement `ProphetLikeForecaster` — a deterministic, feature-aware, **additive** forecasting +model — and wire it end-to-end. It is a scikit-learn `Pipeline([SimpleImputer, Ridge])` over +the canonical 14-column feature frame. It exposes the standard `BaseForecaster` interface +(`fit`/`predict`/`get_params`/`set_params`, `requires_features = True`) **plus** a model- +specific `decompose()` method that returns the additive trend / seasonality / holiday- +regressor contribution breakdown of a forecast. Because it is pure scikit-learn (already a +core dependency), it ships **always-enabled** — no optional dependency group, no feature +flag, no lazy import — exactly like the `regression` model. + +**End state:** a user can train a `prophet_like` model (HTTP or job), re-forecast it in a +what-if scenario (`method="model_exogenous"`), and backtest it, with no extra install and no +flag — exactly as they can a `regression` model today. Every existing model behaves +**identically** before and after. 
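The Goal's fit/predict contract can be sketched as a free-standing class. This is an illustrative sketch under stated assumptions, not the implementation:

- `ProphetLikeSketch` does not subclass the real `BaseForecaster`.
- Its guard messages reuse the error-message substrings listed in the Success Criteria, with `;` in place of the dash separator.
- `decompose()` is omitted.

The sketch shows the two adaptations that matter. First, the `fit(y, X)` / `predict(horizon, X)` interface wraps the sklearn `fit(X, y)` / `predict(X)` order. Second, the imputer lives inside the `Pipeline`.

```python
"""Standalone sketch of ProphetLikeForecaster's fit/predict contract (assumed shape)."""
from __future__ import annotations

from typing import Any, ClassVar

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline


class ProphetLikeSketch:
    requires_features: ClassVar[bool] = True

    def __init__(self, *, alpha: float = 1.0, random_state: int = 42) -> None:
        self.alpha = alpha
        self.random_state = random_state  # interface parity only; cholesky needs no seed
        self._estimator: Any = None

    def fit(self, y: np.ndarray, X: np.ndarray | None = None) -> ProphetLikeSketch:
        if X is None:
            raise ValueError("ProphetLikeForecaster requires exogenous features X for fit()")
        if len(y) == 0:
            raise ValueError("Cannot fit on empty array")
        if X.shape[0] != len(y):
            raise ValueError(
                f"X has {X.shape[0]} rows but y has {len(y)}; feature/target rows must match"
            )
        # Imputer first: it learns per-column medians on the TRAINING X only.
        estimator: Any = Pipeline(
            [
                ("impute", SimpleImputer(strategy="median")),
                ("ridge", Ridge(alpha=self.alpha, solver="cholesky")),
            ]
        )
        estimator.fit(X, y)  # sklearn argument order is fit(X, y)
        self._estimator = estimator
        return self

    def predict(self, horizon: int, X: np.ndarray | None = None) -> np.ndarray:
        if self._estimator is None:
            raise RuntimeError("Model must be fitted before predict")
        if X is None:
            raise ValueError("ProphetLikeForecaster requires exogenous features X for predict()")
        if X.shape[0] != horizon:
            raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon}; they must match")
        # NaN lag cells are filled with the medians learned at fit time.
        return np.asarray(self._estimator.predict(X), dtype=float)

    def get_params(self) -> dict[str, Any]:
        return {"alpha": self.alpha, "random_state": self.random_state}


rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 14))  # 14 synthetic columns standing in for the canonical frame
y_train = X_train @ rng.normal(size=14) + 5.0
model = ProphetLikeSketch(alpha=1.0).fit(y_train, X_train)

X_future = rng.normal(size=(7, 14))
X_future[0, 0] = np.nan  # an unresolvable lag_1 cell
forecast = model.predict(7, X_future)  # shape (7,), no NaN
```

Because the imputer sits inside the `Pipeline`, `predict` reuses the training medians automatically, so no hand-written imputation step is needed.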
+ +## Why + +- **The model zoo needs a non-tree, transparent model.** The comparison platform currently + has three target-only baselines and two opaque gradient-boosted trees (`regression`, + `lightgbm` — and `xgboost` from sibling C1). An *additive linear* model is a genuinely + different model family: interpretable, fast, and the natural seam for explainability + (MLZOO-D). It answers "how much of this forecast is trend vs seasonality vs the promo?". +- **Dependency-free.** Unlike the tree models, this needs no native dependency, no extra, + no flag — it ships on the already-pinned `scikit-learn`. Zero install-friction; perfectly + aligned with the single-host vision. +- **The foundation is fully paid for.** Train, predict, scenarios, and backtesting all + branch on `requires_features`. A new feature-aware model is a contained change. +- **Low blast radius.** No migration, no API-contract change, no existing-model change, no + new dependency, no new vertical slice. + +## What + +A backend-only feature PRP. User-visible behaviour gains exactly one thing: `model_type: +"prophet_like"` becomes a real, trainable, scenario-re-forecastable, backtestable model. +Everything else is identical. + +### The model design (READ THIS — it is the core of the PRP) + +**Decomposition mapping.** The canonical 14 columns are partitioned into three +Prophet-style components. 
Define this as a module-level constant in `models.py`: + +| Component | Canonical columns | Prophet analogue | +|-----------|-------------------|------------------| +| `trend` | `lag_1`, `lag_7`, `lag_14`, `lag_28`, `days_since_launch` | growth `g(t)` — autoregressive level + lifecycle ramp | +| `seasonality` | `dow_sin`, `dow_cos`, `month_sin`, `month_cos`, `is_weekend`, `is_month_end` | seasonal `s(t)` — weekly/monthly cycle (these are exactly `CALENDAR_COLUMNS`) | +| `holiday_regressor` | `price_factor`, `promo_active`, `is_holiday` | holiday + extra-regressor `h(t)` — known-in-advance exogenous effects | + +**The additive math.** A `Ridge` fit gives `y_hat = intercept + Σ_i coef_i · x_i`. Group the +sum by component: `y_hat = intercept + trend_contrib + seasonality_contrib + +holiday_regressor_contrib`, where `{component}_contrib = Σ_{i ∈ component} coef_i · x_i`. This is the +literal additive decomposition — each component contribution is just the partial sum of that +component's columns. `decompose(X)` returns the four-way breakdown; the **additive +invariant** is `intercept + trend + seasonality + holiday_regressor == predict(...)` +(within float tolerance) and is a model-specific validation test. + +**NaN tolerance.** Linear models reject `NaN` (`Ridge.fit` raises `ValueError: Input +contains NaN`). The future feature frame intentionally emits `NaN` for un-resolvable lag +cells. Mitigation: a `SimpleImputer(strategy="median")` as the first `Pipeline` step. The +imputer learns its per-column medians on **training `X` only** (`Pipeline.fit` enforces +this) and re-applies them at predict time — no leakage. `decompose()` therefore computes +`coef_ · x` on the **imputed** `X`, not the raw `X`. + +**Determinism.** `Ridge(solver="cholesky")` has a closed-form, deterministic solution (no +`random_state` needed). `SimpleImputer` (median) is deterministic. The whole `Pipeline` is +deterministic — two fits on the same data produce identical coefficients and forecasts.
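The additive math and the imputed-`X` rule can be checked numerically. The helper below is a hypothetical standalone function, not the PRP's `decompose()` signature; that method returns the typed `ForecastDecomposition`. The sketch makes two assumptions:

- It receives a fitted `Pipeline([("impute", ...), ("ridge", ...)])` whose 14 input columns arrive in canonical order.
- It hard-codes a copy of the column list, which the real code would take from `canonical_feature_columns()`.

```python
"""Hypothetical sketch: additive decomposition of a fitted impute+Ridge Pipeline."""
from __future__ import annotations

from typing import Any

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Assumed copy of canonical_feature_columns(); real code would import it.
CANONICAL_COLUMNS = [
    "lag_1", "lag_7", "lag_14", "lag_28", "dow_sin", "dow_cos", "month_sin",
    "month_cos", "is_weekend", "is_month_end", "price_factor", "promo_active",
    "is_holiday", "days_since_launch",
]
COMPONENTS: dict[str, tuple[str, ...]] = {
    "trend": ("lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch"),
    "seasonality": (
        "dow_sin", "dow_cos", "month_sin", "month_cos", "is_weekend", "is_month_end",
    ),
    "holiday_regressor": ("price_factor", "promo_active", "is_holiday"),
}


def decompose_sketch(estimator: Any, X: np.ndarray) -> dict[str, Any]:
    """Return {intercept, trend, seasonality, holiday_regressor} for each row of X."""
    # 1. Impute with the TRAINING medians (transform only, never fit on X).
    #    Assumes no training column was all-NaN: SimpleImputer drops such
    #    columns by default, which would misalign coef_ with the column list.
    X_imputed = estimator.named_steps["impute"].transform(X)
    ridge = estimator.named_steps["ridge"]
    parts: dict[str, Any] = {"intercept": float(ridge.intercept_)}
    # 2. Each component is the partial sum of coef_i * x_i over its columns.
    for name, columns in COMPONENTS.items():
        idx = [CANONICAL_COLUMNS.index(col) for col in columns]
        parts[name] = X_imputed[:, idx] @ ridge.coef_[idx]
    return parts


rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 14))
y_train = X_train @ rng.normal(size=14) + 10.0
pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("ridge", Ridge(alpha=1.0, solver="cholesky")),
    ]
).fit(X_train, y_train)

X_future = rng.normal(size=(7, 14))
X_future[:, 0] = np.nan  # unresolvable lag_1 cells in the future frame
parts = decompose_sketch(pipeline, X_future)
total = parts["intercept"] + parts["trend"] + parts["seasonality"] + parts["holiday_regressor"]
# Additive invariant: the components sum back to the pipeline's own forecast.
np.testing.assert_allclose(total, pipeline.predict(X_future), rtol=1e-9, atol=1e-9)
```

Computing `coef_ · X` on the raw future frame instead would propagate the `NaN` lag cells and break the invariant. That is why the transform step comes first.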
+ +**Why `Ridge`, not `LinearRegression`.** The 14 engineered columns are heavily collinear +(`lag_1` vs `lag_7`, the calendar columns). Plain OLS is unstable under collinearity; +`Ridge`'s L2 penalty makes coefficients robust while staying closed-form and deterministic. +`ElasticNet` is rejected — its L1 term zeros coefficients (feature selection), which would +silently drop a curated calendar column and corrupt the seasonal component; it is also +iterative. (See `https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification`.) + +### Technical requirements + +1. **No new dependency.** `scikit-learn` is already a core dependency (`pyproject.toml:21`) + and ships `Ridge`, `SimpleImputer`, `Pipeline`. **No** `pyproject.toml` change, **no** + `uv.lock` change, **no** new optional extra (DECISIONS LOCKED #2). +2. **No feature flag.** The model is always available, exactly like `regression`. **No** + `app/core/config.py` change, **no** `forecast_enable_*` setting, **no** route gate + (DECISIONS LOCKED #2). +3. **`ProphetLikeModelConfig`** in `app/features/forecasting/schemas.py` — a + `ModelConfigBase` subclass: `model_type: Literal["prophet_like"]`, `alpha: float` + (Ridge regularization strength, `ge=0.0`, `le=10000.0`, default `1.0`), + `feature_config_hash: str | None`. Conservative — no `seasonality_mode`, no Fourier + order (DECISIONS LOCKED #4). Added to the `ModelConfig` union. +4. **`ProphetLikeForecaster`** in `app/features/forecasting/models.py` — a `BaseForecaster` + subclass, `requires_features: ClassVar[bool] = True`, structurally closest to + `RegressionForecaster`. It builds a `Pipeline([("impute", SimpleImputer(strategy= + "median")), ("ridge", Ridge(alpha=self.alpha, solver="cholesky"))])` inside `fit()`, + stores it as `self._estimator`, and stores the fitted column-component grouping. It + exposes `decompose()` in addition to the base interface. +5. 
**`model_factory`** — a new `prophet_like` branch (no flag gate, mirroring the + `regression` branch at `models.py:793-803`). The `ModelType` literal (`models.py:736`) + gains `"prophet_like"`. +6. **Jobs integration.** `JobService._execute_train` (`jobs/service.py:454-478`) and + `_execute_backtest` (`jobs/service.py:641-658`) each gain a `prophet_like` branch + building `ProphetLikeModelConfig` — mirroring the `regression` branches. +7. **Persistence/metadata.** **No `ModelBundle` change.** The fitted `Pipeline` is pickled + by joblib exactly like `HistGradientBoostingRegressor`; `sklearn_version` (already + captured, `persistence.py:55,100`) and `runtime_info["sklearn_version"]` (already + captured, `registry/service.py:96-100`) fully cover it. No new version field — there is + no new library to version. (DECISIONS LOCKED #5.) +8. **Tests** — a new `test_prophet_like_forecaster.py` (no `importorskip` — pure sklearn, + always runs) with the standard contract tests **plus** model-specific tests (additive + invariant, imputer NaN tolerance, decomposition determinism); an + `examples/models/prophet_like_additive.py` example; additive docs. + +### Success Criteria + +- [ ] `model_factory(ProphetLikeModelConfig(), random_state=42)` returns a + `ProphetLikeForecaster` — **no flag, never raises a "not enabled" error**. +- [ ] `ProphetLikeForecaster.requires_features is True`; `fit`/`predict` require a + non-`None` `X` and raise the same error-message substrings as `RegressionForecaster` + (`"requires exogenous features"`, `"rows must match"`, `"horizon"`, `"fitted"`). +- [ ] A `predict` over a future frame containing `NaN` lag cells succeeds (the + `SimpleImputer` fills them) — a plain `Ridge` would raise `ValueError: Input contains + NaN`. +- [ ] Two fits on the same data produce **identical** forecasts + (`np.testing.assert_array_equal`). 
+- [ ] **Additive invariant:** for any fitted model and any `X`, + `decompose(X)` returns `{intercept, trend, seasonality, holiday_regressor}` summing + (within `1e-9` relative tolerance) to `predict(len(X), X)`. +- [ ] `decompose()` uses the **imputed** `X` and the **trained** imputer statistics — a + future-frame `NaN` is imputed with the *training* median, not a predict-time median. +- [ ] `ForecastingService.train_model` trains a `prophet_like` model with **no edit to + `train_model`**; `POST /scenarios/simulate` returns `method="model_exogenous"`; a + backtest produces per-fold metrics — all with **no edit to the four service layers**. +- [ ] `JobService._execute_train` and `_execute_backtest` accept `model_type="prophet_like"`. +- [ ] Every existing model and every existing test pass **with no behaviour change**. +- [ ] `uv run ruff check . && uv run ruff format --check . && uv run mypy app/ && uv run pyright app/ && uv run pytest -v -m "not integration"` all green. +- [ ] No Alembic migration; no new dependency; no `pyproject.toml`/`uv.lock`/`config.py` + change; no route-path/response-schema/WebSocket change. + +--- + +## All Needed Context + +### Documentation & References + +```yaml +- file: app/features/forecasting/models.py + why: RegressionForecaster (lines 438-577) is the STRUCTURAL TEMPLATE — pure-sklearn + wrapper, no optional dep, no flag, estimator constructed inside fit() and typed + `Any`. Copy its __init__/fit/predict guard shape and error strings. The + model_factory `regression` branch (lines 793-803) is the template for the + prophet_like branch (NO flag gate). The ModelType literal is at line 736. The + module already imports sklearn with `# type: ignore[import-untyped]` (lines 20-22) + — add the Ridge/SimpleImputer/Pipeline imports the same way. 
+ +- file: app/features/forecasting/schemas.py + why: RegressionModelConfig (lines 147-189) is the template for ProphetLikeModelConfig — + same ModelConfigBase base, same Field(ge=, le=, default=) idiom, same + feature_config_hash field. The ModelConfig union is at lines 192-199. + +- file: app/features/forecasting/service.py + why: train_model (lines 180-297) branches `if model.requires_features:` (line 219) — + MODEL-AGNOSTIC, builds the historical frame via _build_regression_features and + calls model.fit(features.y, features.X). predict (lines 383-393) rejects + feature-aware models. DO NOT EDIT service.py — a prophet_like model trains and is + predict-rejected purely because requires_features=True. + +- file: app/features/scenarios/service.py + why: model_exogenous dispatch branches on `bundle.model.requires_features` (line 114) — + no model_type strings remain in app/features/scenarios/. A prophet_like bundle + takes the genuine re-forecast path with ZERO scenarios changes. + +- file: app/features/backtesting/service.py + why: lines 384-409 probe `model_factory(...).requires_features` and build per-fold + leakage-safe X. MODEL-AGNOSTIC. A prophet_like model backtests with ZERO + backtesting-service changes. + +- file: app/features/jobs/service.py + why: _execute_train model_type chain at lines 454-478 (the `regression` branch is the + template; final `else: raise ValueError("Unsupported model_type: ...")`). + _execute_backtest has an IDENTICAL chain at lines 641-658. Forecasting-schemas + import block at lines 426-433. ADD a prophet_like branch to BOTH chains. + +- file: app/features/forecasting/persistence.py + why: CONFIRM no change is needed. ModelBundle (lines 48-57) captures sklearn_version + (line 55, 100). The prophet_like Pipeline pickles like HistGradientBoostingRegressor + — sklearn_version covers it. No new field. 
+ +- file: app/features/forecasting/tests/test_regression_forecaster.py + why: The 10-test contract template — clone the contract tests (fit/predict roundtrip, + rejects-None-X, rejects-mismatched-rows, predict-before-fit, get/set params, + determinism, factory creation). Copy the `_synthetic_data` helper verbatim. The + prophet_like test file ADDS model-specific tests on top (see Tasks). + +- file: app/features/forecasting/tests/test_service.py + why: TestFeatureAwareContract (lines 349-412) — extend test_requires_features_flag with + a prophet_like assertion. + +- file: app/features/jobs/tests/test_service.py + why: test_execute_train_builds_regression_config (lines 204-220) and + test_execute_backtest_builds_regression_config (lines 263-284) are the templates + for the prophet_like job tests. test_execute_train_rejects_unsupported_model_type + (lines 243-249) uses "arima" — DO NOT touch it. + +- url: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html + why: Ridge(alpha=1.0, solver="cholesky") — closed-form, deterministic. solver `"sag"`/ + `"saga"` are STOCHASTIC and need random_state — never use them. `"cholesky"`/`"svd"`/ + `"lsqr"` are deterministic; pin `"cholesky"` explicitly. + critical: Ridge.fit raises `ValueError: Input contains NaN` on any NaN in X — hence the + SimpleImputer. + +- url: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html + why: SimpleImputer(strategy="median", missing_values=np.nan) — deterministic per-column + medians, robust to right-skewed sales lag/rolling columns. Learns statistics on + fit() only. + +- url: https://scikit-learn.org/stable/modules/compose.html#pipeline + why: Pipeline([("impute", SimpleImputer(...)), ("ridge", Ridge(...))]) — fit() learns + imputer medians on the TRAINING X, predict()/transform() reuses them. Folding the + imputer inside the Pipeline is what keeps the no-leakage invariant intact. 
+ +- url: https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression-and-classification + section: Ridge regression and classification + critical: Ridge's L2 penalty makes coefficients robust to the collinear 14-column frame; + OLS (LinearRegression) is "highly sensitive" under collinearity. ElasticNet's L1 + term zeros coefficients (unwanted feature selection) — rejected. + +- docfile: docs/optional-features/05-advanced-ml-model-zoo.md + why: § "Prophet-like Models" — the design vision (trend/seasonality/holiday/regressor, + optional changepoints, optional regressors) and the explicit endorsement of the + "lightweight additive model using sklearn regression" option this PRP implements. +``` + +### Current Codebase tree (relevant — all already exist) + +```bash +app/features/forecasting/ +├── models.py # RegressionForecaster (the template), model_factory, ModelType +├── schemas.py # RegressionModelConfig (the template), ModelConfig union +├── service.py # train_model + predict branch on requires_features (untouched) +├── persistence.py # ModelBundle — sklearn_version already covers prophet_like (untouched) +├── routes.py # /forecasting/train — NO gate needed (no flag) (untouched) +└── tests/ + ├── test_regression_forecaster.py # the contract-test template to clone + ├── test_service.py # TestFeatureAwareContract + └── test_regression_features_leakage.py # load-bearing — already covers prophet_like's frame +app/features/scenarios/service.py # model_exogenous on requires_features (untouched) +app/features/backtesting/service.py # feature-aware fold loop on requires_features (untouched) +app/features/jobs/service.py # _execute_train + _execute_backtest model_type chains +app/shared/feature_frames/ # the shared 14-column contract — reused, untouched +pyproject.toml # scikit-learn already core — NO change +``` + +### Desired Codebase tree — files to ADD + +```bash +app/features/forecasting/tests/ +└── test_prophet_like_forecaster.py # contract tests + 
model-specific (additive) tests +examples/models/ +└── prophet_like_additive.py # minimal train / predict / decompose example +``` + +### Files to MODIFY (all additive or behaviour-preserving) + +```bash +app/features/forecasting/schemas.py # + ProphetLikeModelConfig; + to ModelConfig union +app/features/forecasting/models.py # + Ridge/SimpleImputer/Pipeline imports; + # + _PROPHET_LIKE_COMPONENTS constant; + # + ProphetLikeForecaster; + "prophet_like" + # in ModelType; + model_factory branch +app/features/jobs/service.py # _execute_train + _execute_backtest: + prophet_like +app/features/forecasting/tests/test_service.py # extend TestFeatureAwareContract +app/features/jobs/tests/test_service.py # + prophet_like train + backtest job tests +app/features/scenarios/tests/test_routes_integration.py # + prophet_like model_exogenous test +app/features/backtesting/tests/test_feature_aware_backtest.py # + prophet_like backtest test +examples/models/model_interface.md # additive: prophet_like row +examples/models/feature_frame_contract.md # additive: prophet_like is a feature-aware model +README.md # additive: prophet_like model type +``` + +> Note the **absence**: no `pyproject.toml`, no `uv.lock`, no `app/core/config.py`, no +> `forecasting/routes.py`, no `persistence.py`, no `registry/service.py`. That absence is +> the design — a pure-sklearn model needs none of the optional-dependency machinery the +> tree models carry. + +### DECISIONS LOCKED (resolved during planning — do NOT re-litigate) + +1. **C1 (XGBoost) and C2 (Prophet-like) are separate PRPs, branches, and review units.** + This PRP touches **only** the Prophet-like model. (User-confirmed.) + +2. **Dependency strategy = no new dependency, no optional extra, no feature flag.** The + model is built from `Ridge` + `SimpleImputer` + `Pipeline`, all in `scikit-learn`, which + is already a core dependency. There is therefore nothing to gate — the model ships + always-enabled exactly like `regression`. 
No `ml-prophet` extra (the real `prophet`/Stan + package is explicitly **not** used). This directly answers `INITIAL-MLZOO-C`'s + "dependency strategy" requirement. (User-confirmed: "Lightweight sklearn additive + model".) + +3. **The model consumes the canonical 14-column frame UNCHANGED — no new columns.** It does + **not** add Fourier seasonal columns. Rationale: (a) the frame already carries calendar + columns (`dow_sin/cos`, `month_sin/cos`, `is_weekend`, `is_month_end`) that a linear + model regresses on to capture weekly/monthly seasonality; (b) adding new columns would + create a **new leakage surface** outside the pinned `test_leakage.py` specs — a + disproportionate risk for a v1. Continuous yearly-Fourier terms are an explicit Open + Question, not v1 scope. + +4. **`ProphetLikeModelConfig` is conservative — `alpha` + `feature_config_hash` only.** No + `seasonality_mode` (the model is strictly additive — multiplicative seasonality is an + Open Question), no Fourier-order field (per #3), no changepoint field (changepoint trend + is an Open Question). `alpha` is the one genuinely model-shaping knob (Ridge L2 + strength). Mirrors the conservative-config precedent (PRP-30 DECISIONS LOCKED #3). + +5. **No `ModelBundle` / `runtime_info` change.** The fitted `Pipeline` pickles like any + sklearn estimator; the existing `sklearn_version` capture (bundle + registry runtime + info) fully covers it. There is no new library, so there is no new version to record. + This answers `INITIAL-MLZOO-C`'s "persistence/metadata shape" requirement: the metadata + shape is **unchanged** — and that is the correct, intentional answer. + +6. **No new leakage test.** The model consumes `_build_regression_features` / + `_assemble_regression_rows` and the shared `app/shared/feature_frames` builders + byte-for-byte — already pinned by the load-bearing leakage specs. The model-specific + tests this PRP adds (additive invariant, imputer NaN tolerance) test the *model*, not the + frame. 
The SimpleImputer is leakage-safe **because** the `Pipeline` learns medians on + train `X` only — a property covered by a model-specific test (Task 8), not a frame + leakage test. + +7. **`Ridge(solver="cholesky")` — deterministic, pinned explicitly.** `solver="auto"` would + pick a deterministic solver for a dense matrix anyway, but pinning `"cholesky"` makes the + determinism guarantee explicit and immune to a future sklearn default change. Never use + `"sag"`/`"saga"` (stochastic). `SimpleImputer(strategy="median")` — median over `mean` + for robustness to right-skewed retail lag/rolling columns. + +8. **`model_type = "prophet_like"`, class `ProphetLikeForecaster`.** The `_like` suffix is + the honesty marker — it states "approximates Prophet, is not Prophet". + `docs/optional-features/05-advanced-ml-model-zoo.md` endorses "Prophet-like" as the + intentional term. Docstrings and docs MUST reinforce that changepoint trend, uncertainty + intervals, and automatic seasonality are out of scope (see Risks). + +### Known Gotchas of our codebase & Library Quirks + +```python +# CRITICAL: Ridge rejects NaN. `Ridge.fit(X, y)` and `.predict(X)` raise +# `ValueError: Input contains NaN` on ANY NaN cell. The future feature frame +# intentionally emits NaN for un-resolvable lag cells. The SimpleImputer as the FIRST +# Pipeline step is mandatory, not optional — without it, every scenario re-forecast and +# every backtest fold of a prophet_like model raises. + +# CRITICAL: imputer leakage. The SimpleImputer MUST learn its medians on the TRAINING X +# only. `Pipeline.fit(X_train, y)` does this automatically; `Pipeline.predict(X_future)` +# reuses the trained medians. NEVER call SimpleImputer().fit_transform(X_future) +# separately — that would leak future-window statistics. Keep the imputer INSIDE the +# Pipeline; never impute X by hand. + +# CRITICAL: decompose() operates on IMPUTED X. The Ridge coef_ multiply the imputed +# feature values, not the raw NaN-containing values. 
decompose() must run +# `self._estimator.named_steps["impute"].transform(X)` first, then compute +# coef_ · imputed_X grouped by component. Computing coef_ · raw_X would (a) propagate +# NaN and (b) break the additive invariant (sum != predict()). + +# GOTCHA: interface argument order. BaseForecaster is fit(y, X) / predict(horizon, X); +# the sklearn Pipeline is fit(X, y) / predict(X). ProphetLikeForecaster.fit adapts: +# internally `self._estimator.fit(X, y)`. Mirror RegressionForecaster.fit (models.py:483) +# exactly — it already does this adaptation. + +# GOTCHA: mypy --strict + sklearn. models.py imports sklearn with +# `# type: ignore[import-untyped]` (models.py:20-22). Add the Ridge/SimpleImputer/ +# Pipeline imports the SAME way. Type the estimator `Any` (mirror `estimator: Any = +# HistGradientBoostingRegressor(...)` at models.py:510). decompose()'s return type is a +# concrete typed dict/dataclass — define it explicitly so mypy --strict is satisfied. + +# GOTCHA: no importorskip. test_prophet_like_forecaster.py needs NO `pytest.importorskip` +# — scikit-learn is a core dependency, always installed. The test file always RUNS +# (unlike the lightgbm/xgboost test files which skip without their optional extra). + +# GOTCHA: Ridge with alpha=0 degenerates to OLS. ProphetLikeModelConfig.alpha has ge=0.0 +# so alpha=0 is permitted; that is fine (OLS is still deterministic with solver= +# "cholesky") but loses the collinearity robustness. The default 1.0 is the sane value; +# document that alpha=0 is OLS. + +# GOTCHA: line endings — repo has mixed CRLF/LF, no .gitattributes. Run `git diff --stat` +# before committing; re-normalise any whole-file noise diff to its original ending. + +# SIBLING-PRP integration: PRP-MLZOO-C1 also edits the ModelType Literal (models.py:736) +# and the ModelConfig union (schemas.py:192-199). Both edits are purely additive (one new +# literal entry, one new union member). 
If C1 merged first you will see its "xgboost" +# entry already present — just add "prophet_like" alongside. A trivial one-line rebase, +# never a semantic conflict. C1 also edits config.py/pyproject.toml/persistence.py — +# files this PRP does NOT touch, so no overlap there. +``` + +--- + +## Implementation Blueprint + +### Data models and structure + +```python +# app/features/forecasting/schemas.py — mirrors RegressionModelConfig (schemas.py:147-189) + +class ProphetLikeModelConfig(ModelConfigBase): + """Configuration for the Prophet-like additive forecaster (MLZOO-C2). + + A deterministic, regularized ADDITIVE linear model — a ``Ridge`` regressor + over the canonical 14-column feature frame — that decomposes demand into + trend / seasonality / holiday-regressor components. It approximates + Prophet's additive shape WITHOUT the real ``prophet``/Stan dependency: it + does not model changepoint trend, posterior uncertainty, or automatic + seasonality discovery. Pure scikit-learn — no optional dependency, no + feature flag, always available (like ``RegressionModelConfig``). + + Attributes: + alpha: Ridge L2 regularization strength. 0.0 degenerates to ordinary + least squares; the default 1.0 keeps coefficients robust to the + collinear engineered-feature frame. + feature_config_hash: Optional hash of the feature contract used. 
+ """ + + model_type: Literal["prophet_like"] = "prophet_like" + alpha: float = Field( + default=1.0, ge=0.0, le=10000.0, description="Ridge L2 regularization strength" + ) + feature_config_hash: str | None = Field( + default=None, description="Hash of the feature contract used for training" + ) + + +# app/features/forecasting/models.py — additions + +# Module scope, near the existing sklearn import (models.py:20-22): +from sklearn.impute import SimpleImputer # type: ignore[import-untyped] +from sklearn.linear_model import Ridge # type: ignore[import-untyped] +from sklearn.pipeline import Pipeline # type: ignore[import-untyped] + +# Module-scope constant — the decomposition column grouping (canonical 14-column order): +_PROPHET_LIKE_COMPONENTS: dict[str, tuple[str, ...]] = { + "trend": ("lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch"), + "seasonality": ("dow_sin", "dow_cos", "month_sin", "month_cos", "is_weekend", "is_month_end"), + "holiday_regressor": ("price_factor", "promo_active", "is_holiday"), +} + +# A typed return type for decompose(): +@dataclass +class ForecastDecomposition: + """Additive component breakdown of a Prophet-like forecast. + + Invariant: ``intercept + trend + seasonality + holiday_regressor`` equals + ``predict(...)`` for the same X (within float tolerance), element-wise. + Each array has shape ``[n_rows]`` — one value per forecast row. + """ + intercept: float + trend: np.ndarray[Any, np.dtype[np.floating[Any]]] + seasonality: np.ndarray[Any, np.dtype[np.floating[Any]]] + holiday_regressor: np.ndarray[Any, np.dtype[np.floating[Any]]] + + +class ProphetLikeForecaster(BaseForecaster): + """Feature-aware ADDITIVE forecaster — Ridge over the canonical frame. + + Prophet-LIKE, not Prophet: it approximates Prophet's additive trend + + seasonality + holiday/regressor decomposition with a regularized linear + model over the already-engineered 14-column feature frame. It REQUIRES a + non-None exogenous X for fit and predict. 
A SimpleImputer (median) handles + the NaN lag cells the future frame emits; a Ridge(solver="cholesky") gives + a closed-form, deterministic fit. ``decompose()`` returns the per-component + additive contributions. + + NOT modelled (see PRP Risks): changepoint trend, posterior uncertainty + intervals, automatic seasonality discovery, multiplicative seasonality. + """ + + requires_features: ClassVar[bool] = True + + def __init__(self, *, alpha: float = 1.0, random_state: int = 42) -> None: + super().__init__(random_state) # random_state kept for interface parity; + self.alpha = alpha # Ridge(solver="cholesky") needs no seed + self._estimator: Any = None +``` + +### list of tasks (dependency-ordered) + +```yaml +# ════════ STEP 1 — Schema ════════ + +Task 1 — MODIFY app/features/forecasting/schemas.py — ADD ProphetLikeModelConfig: + - PLACE the new class AFTER RegressionModelConfig (after schemas.py:189), before the + ModelConfig union. + - MIRROR RegressionModelConfig's ModelConfigBase idiom (see Data models above). + - ADD `ProphetLikeModelConfig` to the ModelConfig union (schemas.py:192-199). + - VALIDATE: uv run mypy app/features/forecasting/schemas.py + +# ════════ STEP 2 — The forecaster + factory ════════ + +Task 2 — MODIFY app/features/forecasting/models.py — imports + _PROPHET_LIKE_COMPONENTS: + - ADD the three sklearn imports (Ridge, SimpleImputer, Pipeline) near models.py:20-22, + each with `# type: ignore[import-untyped]` (mirror the existing + HistGradientBoostingRegressor import). + - ADD the module-scope `_PROPHET_LIKE_COMPONENTS` dict and the `ForecastDecomposition` + dataclass (see Data models above). Place the dataclass near FitResult (models.py:28). + - VALIDATE: uv run ruff check app/features/forecasting/models.py + +Task 3 — MODIFY app/features/forecasting/models.py — ADD ProphetLikeForecaster: + - PLACE the new class AFTER LightGBMForecaster (after models.py:732), BEFORE the + ModelType alias. 
+ - MIRROR RegressionForecaster for the guard shape + error strings: fit guards (X None -> + ValueError "ProphetLikeForecaster requires exogenous features X for fit()"; empty y + -> "Cannot fit on empty array"; row mismatch -> f"X has {X.shape[0]} rows but y has + {len(y)} — feature/target rows must match"); predict guards (not fitted -> + RuntimeError "Model must be fitted before predict"; X None -> ValueError + "ProphetLikeForecaster requires exogenous features X for predict()"; shape mismatch + -> f"X has {X.shape[0]} rows but horizon is {horizon} — they must match"). + - INSIDE fit(): build the Pipeline and fit it: + estimator: Any = Pipeline([ + ("impute", SimpleImputer(strategy="median")), + ("ridge", Ridge(alpha=self.alpha, solver="cholesky")), + ]) + estimator.fit(X, y) # Pipeline is fit(X, y); imputer learns medians on X here + self._estimator = estimator + - set requires_features: ClassVar[bool] = True; get_params returns {alpha, random_state}; + set_params mirrors RegressionForecaster.set_params. + - ADD the decompose() method (see Per-task pseudocode) — model-specific, NOT on + BaseForecaster. + - VALIDATE: uv run mypy app/features/forecasting/models.py && uv run pyright app/features/forecasting/ + +Task 4 — MODIFY app/features/forecasting/models.py — ModelType literal + model_factory: + - ADD "prophet_like" to the ModelType Literal (models.py:736). 
+ - ADD an `elif model_type == "prophet_like":` branch to model_factory, mirroring the + `regression` branch (models.py:793-803) — NO flag gate: + elif model_type == "prophet_like": + from app.features.forecasting.schemas import ProphetLikeModelConfig + if isinstance(config, ProphetLikeModelConfig): + return ProphetLikeForecaster(alpha=config.alpha, random_state=random_state) + raise ValueError("Invalid config type for prophet_like") + - VALIDATE: uv run mypy app/ && uv run pyright app/ + +# ════════ STEP 3 — Jobs integration ════════ + +Task 5 — MODIFY app/features/jobs/service.py — _execute_train + _execute_backtest: + - ADD `ProphetLikeModelConfig` to the forecasting-schemas import (jobs/service.py:426-433). + - ADD a prophet_like branch to _execute_train (jobs/service.py:454-478), before the final + `else`, mirroring the `regression` branch: + elif model_type == "prophet_like": + config = ProphetLikeModelConfig(alpha=params.get("alpha", 1.0)) + - ADD a prophet_like branch to _execute_backtest (jobs/service.py:641-658): + elif model_type == "prophet_like": + # Feature-aware — the backtest builds per-fold leakage-safe X. + model_config = ProphetLikeModelConfig() + - VALIDATE: uv run mypy app/features/jobs/ && uv run pyright app/features/jobs/ + +# ════════ STEP 4 — Tests ════════ + +Task 6 — CREATE app/features/forecasting/tests/test_prophet_like_forecaster.py: + - NO importorskip — pure sklearn, always runs. + - COPY the `_synthetic_data` helper from test_regression_forecaster.py verbatim, but use + n_features=14 so the component grouping lines up with the canonical contract (the + decompose tests need exactly the 14 canonical columns). 
+ - CLONE the contract tests: fit_predict_roundtrip, fit_rejects_none_features, + fit_rejects_mismatched_rows, predict_rejects_none_features, + predict_rejects_wrong_shape_features, predict_before_fit_raises, + determinism_same_data (np.testing.assert_array_equal), get_and_set_params, + requires_features_is_true, model_factory_creates_prophet_like_forecaster (NO flag). + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_prophet_like_forecaster.py + +Task 7 — MODIFY app/features/forecasting/tests/test_prophet_like_forecaster.py — model-specific tests: + - test_handles_nan_features: a future frame with NaN lag cells predicts finite values + (the SimpleImputer fills them) — a plain Ridge would raise. Assert np.all(isfinite). + - test_additive_invariant: for a fitted model, `d = model.decompose(X)`; + np.testing.assert_allclose( + d.intercept + d.trend + d.seasonality + d.holiday_regressor, + model.predict(len(X), X), rtol=1e-9) + - test_decompose_components_have_horizon_length: each of d.trend/seasonality/ + holiday_regressor has shape (len(X),). + - test_decompose_uses_trained_imputer_statistics: fit on X_train (no NaN), then call + decompose on an X_future whose lag cell is NaN; assert the imputed value used is + the TRAINING-column median (not the future-column median) — i.e. decompose's + imputed X equals `model._estimator.named_steps["impute"].transform(X_future)`. + - test_decompose_before_fit_raises: decompose() before fit() raises RuntimeError. + - VALIDATE: uv run pytest -v app/features/forecasting/tests/test_prophet_like_forecaster.py + +Task 8 — MODIFY app/features/forecasting/tests/test_service.py: + - In TestFeatureAwareContract.test_requires_features_flag, ADD: + from app.features.forecasting.models import ProphetLikeForecaster + from app.features.forecasting.schemas import ProphetLikeModelConfig + assert model_factory(ProphetLikeModelConfig()).requires_features is True + - (No flag test — prophet_like has no feature flag.) 
+ - VALIDATE: uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py + +Task 9 — MODIFY app/features/jobs/tests/test_service.py: + - ADD test_execute_train_builds_prophet_like_config mirroring + test_execute_train_builds_regression_config (lines 204-220). + - ADD test_execute_backtest_builds_prophet_like_config mirroring + test_execute_backtest_builds_regression_config (lines 263-284). + - VALIDATE: uv run pytest -v app/features/jobs/tests/test_service.py + +Task 10 — MODIFY app/features/scenarios/tests/test_routes_integration.py: + - ADD an integration test that trains a `prophet_like` model then POSTs + /scenarios/simulate with its run_id and asserts `method == "model_exogenous"`. + Mirror the existing regression model_exogenous test. NO importorskip, NO flag. + - VALIDATE: uv run pytest -v -m integration app/features/scenarios/tests/test_routes_integration.py + +Task 11 — MODIFY app/features/backtesting/tests/test_feature_aware_backtest.py: + - ADD a test that runs the feature-aware backtest with a ProphetLikeModelConfig and + asserts per-fold metrics + feature_aware=True — mirroring + test_feature_aware_backtest_produces_per_fold_metrics. Satisfies INITIAL-MLZOO-B's + "backtesting integration test comparing baseline and advanced model path". + - VALIDATE: uv run pytest -v app/features/backtesting/tests/test_feature_aware_backtest.py + +# ════════ STEP 5 — Docs & example ════════ + +Task 12 — CREATE examples/models/prophet_like_additive.py: + - A runnable script: build a synthetic [n, 14] frame matching + canonical_feature_columns(), fit ProphetLikeForecaster(alpha=1.0), predict a + horizon, AND call decompose() and print the trend/seasonality/holiday_regressor + split for the first few rows. Mirror the structure/header of + examples/models/advanced_lightgbm.py. 
+ - VALIDATE: uv run python examples/models/prophet_like_additive.py + +Task 13 — MODIFY examples/models/model_interface.md + feature_frame_contract.md: + - model_interface.md: ADDITIVE — add a ProphetLikeModelConfig entry under "## Model + Configurations" and a "### Prophet-like Forecaster" entry under "## Model + Formulas" (give the additive formula y = intercept + trend + seasonality + + holiday_regressor and the component column grouping). Note requires_features=True, + no optional extra, and the decompose() affordance. + - feature_frame_contract.md: ADDITIVE — record prophet_like as an IMPLEMENTED + feature-aware model. Do NOT rewrite the file. + - VALIDATE: uv run ruff check . && uv run ruff format --check . + +Task 14 — MODIFY README.md: + - ADDITIVE: add `prophet_like` to the Supported Model Types list (README.md:344 area) — + "Prophet-like additive linear model (trend / seasonality / regressor + decomposition); pure scikit-learn, always available, no extra to install". Mirror + the existing tone. + - VALIDATE: uv run ruff format --check . 
+``` + +### Per-task pseudocode (critical details only) + +```python +# ── Task 3 — ProphetLikeForecaster.fit / predict / decompose ── + +def fit(self, y, X=None): + if X is None: + raise ValueError("ProphetLikeForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + estimator: Any = Pipeline([ + ("impute", SimpleImputer(strategy="median")), # learns medians on THIS X only + ("ridge", Ridge(alpha=self.alpha, solver="cholesky")), # deterministic, closed-form + ]) + estimator.fit(X, y) # sklearn order is fit(X, y); imputer NaN-safe + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + +def predict(self, horizon, X=None): + # guards identical in shape to RegressionForecaster.predict (models.py:522-546) + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before predict") + if X is None: + raise ValueError("ProphetLikeForecaster requires exogenous features X for predict()") + if X.shape[0] != horizon: + raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon} — they must match") + return np.asarray(self._estimator.predict(X), dtype=np.float64) # Pipeline imputes then Ridge + +def decompose(self, X): + """Additive trend / seasonality / holiday-regressor breakdown of a forecast. + + Operates on the IMPUTED X (the trained imputer's transform) so the + contributions sum exactly to predict(). Returns a ForecastDecomposition. 
+ """ + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before decompose") + imputer = self._estimator.named_steps["impute"] + ridge = self._estimator.named_steps["ridge"] + x_imputed = imputer.transform(X) # trained medians fill any NaN + columns = canonical_feature_columns() # the 14-name ordered contract + contributions: dict[str, np.ndarray[Any, np.dtype[np.floating[Any]]]] = {} # parameterised: a bare np.ndarray fails mypy --strict + for component, comp_cols in _PROPHET_LIKE_COMPONENTS.items(): + idx = [columns.index(c) for c in comp_cols] # column positions for this component + # additive contribution = Σ coef_i · x_i over this component's columns + contributions[component] = x_imputed[:, idx] @ ridge.coef_[idx] + return ForecastDecomposition( + intercept=float(ridge.intercept_), + trend=contributions["trend"], + seasonality=contributions["seasonality"], + holiday_regressor=contributions["holiday_regressor"], + ) + # Invariant: intercept + trend + seasonality + holiday_regressor == predict(len(X), X) + # because the three component column-sets partition all 14 columns exactly. + +# ── Task 4 — model_factory: no flag gate (unlike lightgbm/xgboost) ── +elif model_type == "prophet_like": + from app.features.forecasting.schemas import ProphetLikeModelConfig + if isinstance(config, ProphetLikeModelConfig): + return ProphetLikeForecaster(alpha=config.alpha, random_state=random_state) + raise ValueError("Invalid config type for prophet_like") +``` + +### Integration Points + +```yaml +DEPENDENCY: none. scikit-learn is already core. NO pyproject.toml / uv.lock change. +CONFIG: none. No feature flag. NO app/core/config.py change. +ROUTES: none. No flag -> no route gate. /forecasting/train accepts the new + model_type with no code change (additive ModelConfig union member). +TRAIN/PREDICT/SCENARIOS/BACKTESTING: all UNCHANGED — every path branches on + requires_features; a prophet_like model routes through automatically.
+JOBS: jobs/service.py — + prophet_like branch in _execute_train AND + _execute_backtest (the one place a model_type string compare lives). +PERSISTENCE: ModelBundle UNCHANGED — sklearn_version covers the pickled Pipeline. +REGISTRY: _capture_runtime_info UNCHANGED — sklearn_version already recorded. +NO MIGRATION. NO API CONTRACT CHANGE (a new request-body model_type value is additive + and pre-1.0-permitted). +``` + +### Model-specific validation rules (required by INITIAL-MLZOO-C) + +Beyond the shared contract tests, the Prophet-like model has four invariants tied to its own +design (the tree models handle `NaN` natively and expose no decomposition), each pinned by a +test in `test_prophet_like_forecaster.py`: + +1. **Additive invariant** — `decompose()`'s four parts sum (rtol `1e-9`) to `predict()`. + This is what makes the model "Prophet-like": the forecast genuinely *is* the sum of its + components. (Task 7 `test_additive_invariant`.) +2. **NaN tolerance via the imputer** — a future frame with `NaN` lag cells must predict + finite values; a model-specific guarantee the bare `Ridge` does not have. (Task 7 + `test_handles_nan_features`.) +3. **Imputer leakage-safety** — `decompose()`/`predict()` impute future-frame `NaN` with + *training-window* medians, never future-window medians. (Task 7 + `test_decompose_uses_trained_imputer_statistics`.) This is the model-specific + leakage rule; the frame-level leakage is already covered by the pinned shared specs. +4. **Determinism** — `Ridge(solver="cholesky")` + `SimpleImputer(median)` are deterministic; + two fits give identical forecasts. (Task 6 `test_determinism_same_data`.) + +--- + +## Validation Loop + +### Level 1: Syntax & Style + +```bash +uv run ruff check . --fix && uv run ruff format --check . +``` + +### Level 2: Type Checks + +```bash +uv run mypy app/ # --strict +uv run pyright app/ # --strict +# The sklearn imports carry `# type: ignore[import-untyped]` (mirror models.py:20-22).
+# ForecastDecomposition is a concretely-typed dataclass — no `Any` leakage in decompose()'s +# public return. +``` + +### Level 3: Unit Tests + +```bash +uv run pytest -v app/features/forecasting/tests/test_prophet_like_forecaster.py +uv run pytest -v -m "not integration" app/features/forecasting/tests/test_service.py +uv run pytest -v app/features/jobs/tests/test_service.py +uv run pytest -v app/features/backtesting/tests/test_feature_aware_backtest.py + +# Regression — must stay green, no behaviour change +uv run pytest -v -m "not integration" +# Expected: all green. test_prophet_like_forecaster.py RUNS unconditionally (no +# importorskip — sklearn is core). Every existing model's tests pass UNEDITED. +``` + +### Level 4: Integration Tests + +```bash +docker compose up -d && uv run alembic upgrade head +uv run pytest -v -m integration app/features/forecasting/ app/features/scenarios/ \ + app/features/jobs/ +# The scenarios prophet_like model_exogenous test (Task 10) must report +# method="model_exogenous". +``` + +### Level 5: Manual Validation (dogfood — REQUIRED) + +```bash +# 1. Determinism + the additive invariant +uv run python -c " +import numpy as np +from app.features.forecasting.models import ProphetLikeForecaster +rng = np.random.default_rng(0) +X = rng.normal(size=(120, 14)); y = (3.0*X[:,0] - 2.0*X[:,4] + rng.normal(size=120)).astype(float) +m1 = ProphetLikeForecaster(alpha=1.0).fit(y, X) +m2 = ProphetLikeForecaster(alpha=1.0).fit(y, X) +np.testing.assert_array_equal(m1.predict(12, X[:12]), m2.predict(12, X[:12])) +d = m1.decompose(X[:12]) +np.testing.assert_allclose( + d.intercept + d.trend + d.seasonality + d.holiday_regressor, m1.predict(12, X[:12]), rtol=1e-9) +print('prophet_like deterministic + additive invariant OK')" + +# 2. 
NaN tolerance +uv run python -c " +import numpy as np +from app.features.forecasting.models import ProphetLikeForecaster +rng = np.random.default_rng(1) +X = rng.normal(size=(80, 14)); y = X[:,0].astype(float) +m = ProphetLikeForecaster().fit(y, X) +fut = X[:6].copy(); fut[2, 0] = np.nan # un-resolvable lag cell +preds = m.predict(6, fut) +assert np.all(np.isfinite(preds)); print('prophet_like NaN-tolerant OK', preds[:3])" + +# 3. End-to-end: POST /forecasting/train with config {"model_type":"prophet_like"} -> 200 +# (no flag needed); POST /scenarios/simulate -> method == "model_exogenous"; +# submit a prophet_like backtest job -> completes with per-fold metrics. +``` + +--- + +## Final Validation Checklist + +- [ ] `uv run ruff check .` and `uv run ruff format --check .` clean. +- [ ] `uv run mypy app/` and `uv run pyright app/` clean (both --strict). +- [ ] `uv run pytest -v -m "not integration"` fully green; `test_prophet_like_forecaster.py` + RUNS unconditionally (no importorskip) and passes — including the additive-invariant, + NaN-tolerance, and imputer-leakage-safety tests. +- [ ] `uv run pytest -v -m integration app/features/{forecasting,scenarios,jobs}/` green, + including the scenarios `prophet_like` `model_exogenous` test. +- [ ] `model_factory(ProphetLikeModelConfig())` returns a `ProphetLikeForecaster` with + **no flag and no "not enabled" path**. +- [ ] A `prophet_like` backtest produces per-fold metrics with **no edit to + `backtesting/service.py`**. +- [ ] Every baseline / `regression` / `lightgbm` test passes with **no edit**. +- [ ] **No** `pyproject.toml`, `uv.lock`, `app/core/config.py`, `forecasting/routes.py`, + `persistence.py`, or `registry/service.py` change — confirm via `git diff --name-only`. +- [ ] No Alembic migration; no new dependency; no route-path/response-schema/WebSocket + change. +- [ ] `git diff --stat` shows only intended files — no whole-file CRLF/LF noise diffs. 
+- [ ] An OPEN GitHub issue exists (`gh issue view --json state` → `OPEN`); commit + `feat(forecast): add Prophet-like additive forecasting model (#)`; branch + `feat/forecasting-prophet-like-model` off `dev`. +- [ ] The PR description states C2 is one of two MLZOO-C review units, links the sibling + `PRP-MLZOO-C1`, and explicitly states the model is Prophet-LIKE (additive linear + approximation), not the real `prophet` package. + +--- + +## Anti-Patterns to Avoid + +- ❌ Don't implement the XGBoost model — that is `PRP-MLZOO-C1`, a separate branch. +- ❌ Don't combine C1 and C2 into one branch or one PR (DECISIONS LOCKED #1). +- ❌ Don't add the real `prophet` package, `cmdstanpy`, Stan, or an `ml-prophet` extra — + this model is deliberately pure scikit-learn (DECISIONS LOCKED #2). +- ❌ Don't add a `forecast_enable_prophet_like` flag or a route gate — a pure-sklearn model + ships always-on, like `regression`. +- ❌ Don't add Fourier seasonal columns or any new feature-frame columns — the model + consumes the canonical 14-column frame unchanged (DECISIONS LOCKED #3); new columns are a + new leakage surface. +- ❌ Don't impute `X` by hand or call `SimpleImputer().fit_transform(X_future)` — keep the + imputer INSIDE the `Pipeline` so it learns medians on train `X` only (leakage). +- ❌ Don't compute `decompose()` on the raw NaN-containing `X` — it must use the trained + imputer's `transform(X)`, or the additive invariant breaks and NaN propagates. +- ❌ Don't use `LinearRegression` (unstable on the collinear frame) or `ElasticNet` (L1 + zeros curated columns; iterative) — use `Ridge(solver="cholesky")`. +- ❌ Don't use `Ridge(solver="sag"/"saga")` — they are stochastic and break determinism. +- ❌ Don't add `seasonality_mode`, Fourier-order, or changepoint fields to + `ProphetLikeModelConfig` — DECISIONS LOCKED #4 keeps it to `alpha`. +- ❌ Don't edit `train_model`/`predict`, `scenarios/service.py`, or `backtesting/service.py` + — they branch on `requires_features`. 
+- ❌ Don't write a new frame leakage test — the model reuses the pinned shared builders. +- ❌ Don't claim this is "Prophet" anywhere — it is "Prophet-like" / "additive linear". + +## Risks & Open Questions + +### Risks (document honestly in docstrings + docs) + +- **Not real Prophet.** A `Ridge`-over-features model genuinely cannot do what Prophet does: + - **No changepoint trend.** Prophet fits a piecewise-linear growth curve with automatic + rate changes; this model's "trend" is only what the lag/`days_since_launch` columns + encode — a long-horizon forecast trends roughly linearly. + - **No uncertainty intervals.** `Ridge` returns a point forecast only; Prophet returns + `yhat_lower`/`yhat_upper` via posterior simulation. Prediction intervals are an Open + Question (residual quantiles / conformal prediction / `BayesianRidge`). + - **No automatic seasonality discovery.** Seasonality is fixed at feature-engineering + time — only the periodicity already in the 14 columns is visible. + - **Strictly additive.** No multiplicative seasonality (`seasonality_mode`). +- **Extrapolation fragility.** Linear models extrapolate unboundedly; at long horizons the + lag columns are increasingly imputed (median fill), degrading accuracy. The tree models + and Prophet degrade more gracefully. +- **Component-grouping is a modelling choice.** Putting the lag columns under `trend` (vs a + separate `autoregressive` component) is a deliberate, documented simplification — the + additive invariant holds regardless, but the *labels* are an interpretation. + +### Open Questions — to resolve at PRP review + +- [ ] **Prediction/uncertainty intervals.** Should v1 expose `yhat_lower`/`yhat_upper` + (e.g. from training-residual quantiles)? Currently out of scope — point forecast only. +- [ ] **Fourier seasonal columns.** A continuous yearly cycle is not in the canonical + 14-column frame. 
Adding Fourier yearly terms would improve long-period seasonality but + requires new frame columns (new leakage surface). Deferred (DECISIONS LOCKED #3) — + confirm deferral or scope a follow-up. +- [ ] **Changepoint trend.** A piecewise-linear trend basis would close the biggest gap + vs real Prophet but is a substantial modelling addition. Deferred — flag if wanted. +- [ ] **Surfacing `decompose()`.** v1 keeps `decompose()` as a model method used by tests + and the example only. Exposing it via an API endpoint / agent tool / the + explainability slice is a natural MLZOO-D item — confirm it stays out of C2 scope. +- [ ] **`alpha` tuning.** v1 ships a fixed default `alpha=1.0` (caller-overridable). +Per-series `alpha` selection (e.g. `RidgeCV`) is deferred to a tuning-focused future PRP. + +## Confidence Score + +**8 / 10** for one-pass implementation success. + +Rationale: the consuming infrastructure is fully paid for — train, predict, scenarios, and +backtesting all branch on `requires_features`, and `RegressionForecaster` is a proven +pure-sklearn template, so the wiring is contained (one class, one config, one factory +branch, two jobs branches — and *fewer* touch-points than C1 because there is no +dependency/flag/metadata machinery). The −2 risk is concentrated in the genuinely *new* +design surface: (a) the `decompose()` additive math — the column-index mapping and +imputed-X discipline must be exact for the additive invariant to hold, but the invariant is +a precise, fast unit test that catches any error immediately; and (b) the imputer-leakage +discipline — keeping `SimpleImputer` inside the `Pipeline` is the one rule that, if broken, +silently leaks, and it too is pinned by a model-specific test. Both risks are caught at +Level 3. The "every existing test passes unedited" gate makes any regression hard to +miss.
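The additive invariant and the training-median imputation rule can be illustrated with a dependency-light sketch. It uses only NumPy: `np.nanmedian` stands in for `SimpleImputer(strategy="median")`, and a closed-form ridge on centered data stands in for `Ridge(solver="cholesky")` with its default `fit_intercept=True` (for dense input, scikit-learn centers `X` and `y` and solves the same normal equations). The class name `AdditiveRidgeSketch` and the flat `COLUMNS` order are hypothetical; the real code must look positions up via `canonical_feature_columns()`. The sketch assumes no training column is entirely `NaN`. By default `SimpleImputer` drops such a column at `transform` time unless `keep_empty_features=True`, which would shift every later coefficient index and silently break the `columns.index(...)` mapping in `decompose()`.

```python
import numpy as np

# Component grouping copied from _PROPHET_LIKE_COMPONENTS. The flat COLUMNS order
# is a hypothetical stand-in for canonical_feature_columns().
COMPONENTS: dict[str, tuple[str, ...]] = {
    "trend": ("lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch"),
    "seasonality": ("dow_sin", "dow_cos", "month_sin", "month_cos", "is_weekend", "is_month_end"),
    "holiday_regressor": ("price_factor", "promo_active", "is_holiday"),
}
COLUMNS: list[str] = [c for cols in COMPONENTS.values() for c in cols]


class AdditiveRidgeSketch:
    """NumPy stand-in for Pipeline([SimpleImputer(median), Ridge(solver="cholesky")])."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def _impute(self, X: np.ndarray) -> np.ndarray:
        # Fill NaN with the TRAINING medians learned in fit(), never the future rows' own.
        return np.where(np.isnan(X), self.medians_, X)

    def fit(self, y: np.ndarray, X: np.ndarray) -> "AdditiveRidgeSketch":
        self.medians_ = np.nanmedian(X, axis=0)  # learned on the training X only
        Xi = self._impute(X)
        x_mean, y_mean = Xi.mean(axis=0), float(y.mean())
        Xc, yc = Xi - x_mean, y - y_mean
        # Centered normal equations; the intercept is recovered afterwards, unpenalized.
        gram = Xc.T @ Xc + self.alpha * np.eye(Xi.shape[1])
        self.coef_ = np.linalg.solve(gram, Xc.T @ yc)
        self.intercept_ = y_mean - float(x_mean @ self.coef_)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.intercept_ + self._impute(X) @ self.coef_

    def decompose(self, X: np.ndarray) -> dict[str, np.ndarray]:
        Xi = self._impute(X)  # same imputed matrix as predict(), so the parts add up
        parts: dict[str, np.ndarray] = {}
        for name, cols in COMPONENTS.items():
            idx = [COLUMNS.index(c) for c in cols]
            parts[name] = Xi[:, idx] @ self.coef_[idx]  # sum of coef_i * x_i for this group
        return parts


rng = np.random.default_rng(0)
X_train = rng.normal(size=(120, 14))
y_train = 3.0 * X_train[:, 0] - 2.0 * X_train[:, 5] + rng.normal(size=120)
model = AdditiveRidgeSketch(alpha=1.0).fit(y_train, X_train)

X_future = rng.normal(size=(6, 14))
X_future[2, 0] = np.nan  # an unresolvable lag_1 cell in the future frame

parts = model.decompose(X_future)
total = model.intercept_ + parts["trend"] + parts["seasonality"] + parts["holiday_regressor"]
np.testing.assert_allclose(total, model.predict(X_future), rtol=1e-9, atol=1e-9)
print("additive invariant holds for", len(total), "forecast rows")
```

Because the three column groups partition all 14 columns, the component dot products are just `Xi @ coef_` split along that partition, which is why the invariant can be asserted with a tight tolerance.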
diff --git a/README.md b/README.md index 3466a525..12faaa6a 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,9 @@ docker-compose up -d ```bash uv sync --extra dev # or: pip install -e ".[dev]" +# LightGBM and XGBoost are opt-in advanced models — add the extra to enable each: +# uv sync --extra dev --extra ml-lightgbm (then set forecast_enable_lightgbm=true) +# uv sync --extra dev --extra ml-xgboost (then set forecast_enable_xgboost=true) ``` 4. **Run database migrations** @@ -338,7 +341,10 @@ curl -X POST http://localhost:8123/forecasting/predict \ - `naive` - Last observed value (simple baseline) - `seasonal_naive` - Same period from previous season - `moving_average` - Mean of last N observations -- `lightgbm` - LightGBM regressor (requires `forecast_enable_lightgbm=True`) +- `regression` - Gradient-boosted exogenous-feature regressor (feature-aware) +- `lightgbm` - LightGBM feature-aware regressor — opt-in: install the `ml-lightgbm` extra and set `forecast_enable_lightgbm=True` +- `xgboost` - XGBoost feature-aware regressor — opt-in: install the `ml-xgboost` extra and set `forecast_enable_xgboost=True` +- `prophet_like` - Prophet-like additive linear model (trend / seasonality / regressor decomposition); pure scikit-learn, always available, no extra to install See [examples/models/](examples/models/) for baseline model examples. @@ -390,6 +396,12 @@ curl -X POST http://localhost:8123/backtesting/run \ **Baseline Comparisons:** When `include_baselines=true`, automatically compares against naive and seasonal_naive models. +**Feature-Aware Models:** +`regression`, `lightgbm`, `xgboost`, and `prophet_like` models can be backtested too — set +`model_config_main.model_type` accordingly. Each fold builds a leakage-safe +per-fold feature matrix (`min_train_size >= 30` required); the result carries +`feature_aware: true` and `exogenous_policy: "observed"`. + See [examples/backtest/](examples/backtest/) for usage examples.
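Each fold rebuilds its future rows instead of slicing them from the historical matrix. The reason can be shown on a single `lag_1` column. This is a minimal sketch that assumes a toy series where `y[t] == t`. The helpers `lag_column` and `future_lag_column` are hypothetical, not the repository's `build_historical_feature_rows` / `build_future_feature_rows`. Leaving lags that point past the train window as `NaN` is an assumption, consistent with the imputer-based handling of future-frame `NaN` lag cells described in the PRP.

```python
import numpy as np


def lag_column(y: np.ndarray, lag: int) -> np.ndarray:
    """Historical lag feature (lag >= 1): row t reads y[t - lag]; NaN before the start."""
    out = np.full(len(y), np.nan)
    out[lag:] = y[:-lag]
    return out


def future_lag_column(y_train: np.ndarray, horizon: int, lag: int, gap: int = 0) -> np.ndarray:
    """Future lag feature for a test window starting `gap` days after the train window.

    Row h describes day T + gap + h (T = len(y_train)). It may read only targets that
    were observed inside the train window; anything later is left as NaN.
    """
    T = len(y_train)
    out = np.full(horizon, np.nan)
    for h in range(horizon):
        src = T + gap + h - lag  # position of the target this lag would read
        if 0 <= src < T:
            out[h] = y_train[src]
    return out


y = np.arange(10, dtype=float)  # toy series: y[t] == t
T, horizon = 7, 3  # train = days 0..6, test = days 7..9

# WRONG: slicing the full historical matrix at the test rows.
sliced = lag_column(y, 1)[T : T + horizon]  # holds 6.0, 7.0, 8.0; 7 and 8 are test targets
# RIGHT: rebuild the future rows from train-window targets only.
rebuilt = future_lag_column(y[:T], horizon, lag=1)  # holds 6.0, then NaN, NaN
print(sliced, rebuilt)
```

Only the first test day's `lag_1` is knowable at forecast time. A sliced matrix would quietly hand the model the true targets of days 7 and 8, which inflates backtest accuracy.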
### Model Registry diff --git a/app/core/config.py b/app/core/config.py index d3ac4a24..b30253d7 100644 --- a/app/core/config.py +++ b/app/core/config.py @@ -99,6 +99,7 @@ class Settings(BaseSettings): forecast_max_horizon: int = 90 forecast_model_artifacts_dir: str = "./artifacts/models" forecast_enable_lightgbm: bool = False + forecast_enable_xgboost: bool = False # Backtesting backtest_max_splits: int = 20 diff --git a/app/features/backtesting/schemas.py b/app/features/backtesting/schemas.py index 537809f0..747d25c2 100644 --- a/app/features/backtesting/schemas.py +++ b/app/features/backtesting/schemas.py @@ -173,6 +173,12 @@ class ModelBacktestResult(BaseModel): fold_results: Results for each fold. aggregated_metrics: Mean metrics across folds. metric_std: Standard deviation of metrics across folds. + feature_aware: True when the model consumed a per-fold feature matrix + (``requires_features``); False for target-only baseline models. + exogenous_policy: How the test-window exogenous columns were sourced. + ``"observed"`` (the recorded price/promotion plan) for a + feature-aware result; ``None`` for a target-only model. Recorded so + the metric is read honestly as "accuracy given a known plan". 
""" model_type: str @@ -180,6 +186,8 @@ class ModelBacktestResult(BaseModel): fold_results: list[FoldResult] aggregated_metrics: dict[str, float] metric_std: dict[str, float] + feature_aware: bool = False + exogenous_policy: Literal["observed"] | None = None # ============================================================================= diff --git a/app/features/backtesting/service.py b/app/features/backtesting/service.py index b4b8e8f4..209e9081 100644 --- a/app/features/backtesting/service.py +++ b/app/features/backtesting/service.py @@ -17,6 +17,7 @@ import uuid from dataclasses import dataclass, field from datetime import date as date_type +from datetime import timedelta from pathlib import Path from typing import TYPE_CHECKING, Any @@ -34,20 +35,56 @@ ModelBacktestResult, SplitBoundary, ) -from app.features.backtesting.splitter import TimeSeriesSplitter -from app.features.data_platform.models import SalesDaily +from app.features.backtesting.splitter import TimeSeriesSplit, TimeSeriesSplitter +from app.features.data_platform.models import Calendar, Product, Promotion, SalesDaily from app.features.forecasting.models import model_factory from app.features.forecasting.schemas import ( ModelConfig, NaiveModelConfig, SeasonalNaiveModelConfig, ) +from app.shared.feature_frames import ( + HISTORY_TAIL_DAYS, + build_future_feature_rows, + build_historical_feature_rows, +) if TYPE_CHECKING: pass logger = structlog.get_logger() +# Minimum observed train rows a feature-aware model needs per fold to resolve +# its lag features — mirrors ``forecasting.service._MIN_REGRESSION_TRAIN_ROWS``. +# A feature-aware backtest with a smaller ``min_train_size`` fails loud in +# ``_validate_config`` rather than producing all-NaN lag columns silently. +_MIN_FEATURE_AWARE_TRAIN_ROWS = 30 + + +@dataclass +class ExogenousFrame: + """Pre-loaded exogenous data for one series — resolved async, consumed sync. 
+ + A feature-aware backtest needs price / promotion / holiday / launch-date + data to build its per-fold feature matrices, but the fold loop is sync and + DB-free by design. ``run_backtest`` resolves all of it once into this pure + in-memory carrier; the fold builders read it without touching the database. + + Attributes: + prices: Observed unit prices, aligned index-for-index with + :attr:`SeriesData.dates`. + baseline_price: Median positive price (``>0``); fallback ``1.0``. + promo_dates: Days a promotion covered anywhere in the data window. + holiday_dates: Calendar holiday days in the data window. + launch_date: The product's launch date, or ``None``. + """ + + prices: list[float] + baseline_price: float + promo_dates: set[date_type] + holiday_dates: set[date_type] + launch_date: date_type | None + @dataclass class SeriesData: @@ -58,6 +95,8 @@ class SeriesData: values: Target values as numpy array. store_id: Store ID. product_id: Product ID. + exogenous: Pre-loaded exogenous data — present only for a feature-aware + backtest; ``None`` for a target-only run. n_observations: Number of observations. """ @@ -65,6 +104,7 @@ class SeriesData: values: np.ndarray[Any, np.dtype[np.floating[Any]]] store_id: int product_id: int + exogenous: ExogenousFrame | None = None n_observations: int = field(init=False) def __post_init__(self) -> None: @@ -126,6 +166,19 @@ def _validate_config(self, config: BacktestConfig) -> None: message="Using provided min_train_size below recommended default", ) + # Feature-aware models need enough train rows per fold to resolve their + # lag features. Build a cheap probe (no fit) and branch on the + # capability flag — never on a model_type string. Loud, not silent. 
+ probe = model_factory( + config.model_config_main, random_state=self.settings.forecast_random_seed + ) + if probe.requires_features and split_config.min_train_size < _MIN_FEATURE_AWARE_TRAIN_ROWS: + raise ValueError( + f"A feature-aware model ({config.model_config_main.model_type}) needs " + f"min_train_size of at least {_MIN_FEATURE_AWARE_TRAIN_ROWS} to resolve its " + f"lag features per fold; got {split_config.min_train_size}." + ) + def save_results( self, response: BacktestResponse, @@ -217,6 +270,21 @@ async def run_backtest( f"between {start_date} and {end_date}" ) + # Feature-aware models consume a per-fold feature matrix. Branch on the + # capability flag (not a model_type string) and resolve the exogenous + # data once, here in the async entry point — the fold loop stays sync + # and DB-free. Target-only models skip this entirely. + probe = model_factory( + config.model_config_main, random_state=self.settings.forecast_random_seed + ) + if probe.requires_features: + series_data.exogenous = await self._load_exogenous_frame( + db=db, + store_id=store_id, + product_id=product_id, + dates=series_data.dates, + ) + # Create splitter and validate splitter = TimeSeriesSplitter(config.split_config) @@ -284,30 +352,61 @@ def _run_model_backtest( ) -> ModelBacktestResult: """Run backtest for a single model configuration. + Branches on the model's ``requires_features`` capability flag (never a + ``model_type`` string). A target-only model takes the unchanged + target-only path; a feature-aware model builds the full historical + feature matrix once (a local — never instance state) and runs each fold + through :meth:`_run_feature_aware_fold` with a leakage-safe per-fold + ``X_train`` slice and a rebuilt ``X_future``. The method signature is + unchanged — ``gap`` is read from ``splitter.config.gap``. Sync and + DB-free: all exogenous I/O happened in :meth:`run_backtest`. + Args: - series_data: Loaded time series data. 
+ series_data: Loaded time series data (carries ``exogenous`` for a + feature-aware run). splitter: Time series splitter. model_config: Model configuration. store_fold_details: Whether to store per-fold details. Returns: - ModelBacktestResult with all fold results. + ModelBacktestResult with all fold results; ``feature_aware`` and + ``exogenous_policy`` are set for a feature-aware model. + + Raises: + ValueError: If a feature-aware model has no loaded ``ExogenousFrame``. """ fold_results: list[FoldResult] = [] fold_metrics: list[dict[str, float]] = [] + # Probe the capability flag, then build the historical matrix once for + # the whole run (feature-aware path only) — sliced, never rebuilt, for + # each fold's X_train. + probe = model_factory(model_config, random_state=self.settings.forecast_random_seed) + feature_aware: bool = probe.requires_features + historical_matrix: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None + if feature_aware: + historical_matrix = self._build_historical_matrix(series_data) + for split in splitter.split(series_data.dates, series_data.values): # Extract train and test data - y_train = series_data.values[split.train_indices] y_test = series_data.values[split.test_indices] - - # Create and fit model - model = model_factory(model_config, random_state=self.settings.forecast_random_seed) - model.fit(y_train) - - # Generate predictions horizon = len(split.test_indices) - predictions = model.predict(horizon) + + if historical_matrix is not None: + # Feature-aware path — per-fold leakage-safe X_train / X_future. + predictions = self._run_feature_aware_fold( + series_data=series_data, + split=split, + model_config=model_config, + historical_matrix=historical_matrix, + gap=splitter.config.gap, + ) + else: + # Target-only path — unchanged. 
+ y_train = series_data.values[split.train_indices] + model = model_factory(model_config, random_state=self.settings.forecast_random_seed) + model.fit(y_train) + predictions = model.predict(horizon) # Calculate metrics metrics = self.metrics_calculator.calculate_all( @@ -360,7 +459,113 @@ def _run_model_backtest( fold_results=fold_results, aggregated_metrics=aggregated_metrics, metric_std=metric_std, + feature_aware=feature_aware, + exogenous_policy="observed" if feature_aware else None, + ) + + def _build_historical_matrix( + self, series_data: SeriesData + ) -> np.ndarray[Any, np.dtype[np.floating[Any]]]: + """Build the full-series historical feature matrix for a backtest. + + Built once per :meth:`_run_model_backtest` call as a local — never + instance state. Each row is leakage-safe *as a training row*: its lag + columns read only strictly-earlier observed targets. Per-fold + ``X_train`` is a positional slice of this matrix; ``X_future`` is NEVER + sliced from it (that would leak an adjacent test-day target). + + Args: + series_data: Loaded time series data — must carry ``exogenous``. + + Returns: + Row-major feature matrix aligned with ``series_data.dates``. + + Raises: + ValueError: If ``series_data.exogenous`` is ``None`` — the + genuinely-unsupported path for a feature-aware backtest. 
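The `_build_historical_matrix` docstring says each row is "leakage-safe *as a training row*" and that per-fold `X_train` is "a positional slice". Here is a minimal sketch of why that works. It assumes only the four lag columns (`lag_1`, `lag_7`, `lag_14`, `lag_28`) and ignores the calendar and exogenous columns. `historical_lag_rows` is a hypothetical stand-in, not the PR's `build_historical_feature_rows`.

Row `i` reads only `y[i - k]` for `k >= 1`. As a result, the first `n` rows of the full-series matrix are identical to the rows built from the first `n` targets alone. That is why the matrix can be built once per run and sliced per fold.

```python
from __future__ import annotations

import math

LAGS: tuple[int, ...] = (1, 7, 14, 28)  # lag columns named in the canonical frame


def historical_lag_rows(y: list[float], lags: tuple[int, ...] = LAGS) -> list[list[float]]:
    """Hypothetical sketch: one row per day, lag cells only.

    Row i holds y[i - k] for each lag k, or NaN when i - k < 0. Every source
    index is strictly earlier than i, so row i never reads y[i] or later.
    """
    return [[y[i - k] if i - k >= 0 else math.nan for k in lags] for i in range(len(y))]


def rows_equal(a: list[list[float]], b: list[list[float]]) -> bool:
    """Row-wise equality that treats NaN == NaN."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        for x, z in zip(row_a, row_b):
            if math.isnan(x) and math.isnan(z):
                continue
            if x != z:
                return False
    return True


# Prefix property: slicing the full-series matrix equals rebuilding from the prefix,
# so appending later (test-window) targets never changes an earlier training row.
y = [float(v) for v in range(40)]
full = historical_lag_rows(y)
assert rows_equal(full[:30], historical_lag_rows(y[:30]))
```

The same property does not hold in reverse for forecast rows. The row for a test day sliced from this matrix would contain `lag_1` equal to the previous test day's observed target. For that reason the diff rebuilds `X_future` instead of slicing it.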
+ """ + exo = series_data.exogenous + if exo is None: + raise ValueError( + "feature-aware backtest requires a loaded ExogenousFrame on series_data; " + "run_backtest must resolve exogenous data before the fold loop" + ) + rows = build_historical_feature_rows( + dates=series_data.dates, + quantities=[float(v) for v in series_data.values], + prices=exo.prices, + baseline_price=exo.baseline_price, + promo_dates=exo.promo_dates, + holiday_dates=exo.holiday_dates, + launch_date=exo.launch_date, ) + return np.array(rows, dtype=np.float64) + + def _run_feature_aware_fold( + self, + *, + series_data: SeriesData, + split: TimeSeriesSplit, + model_config: ModelConfig, + historical_matrix: np.ndarray[Any, np.dtype[np.floating[Any]]], + gap: int, + ) -> np.ndarray[Any, np.dtype[np.floating[Any]]]: + """Fit + predict one fold of a feature-aware backtest — pure, sync. + + ``X_train`` is a positional slice of the full historical matrix (built + once, leakage-safe by position). ``X_future`` is rebuilt here per fold + via :func:`build_future_feature_rows`: its ``history_tail`` ends at the + fold origin ``T`` (the last train day, gap days excluded), so a lag + cell whose source day falls in the test window is ``NaN``. + + Args: + series_data: Loaded time series data — carries ``exogenous``. + split: The fold's train/test split. + model_config: Model configuration (feature-aware). + historical_matrix: The full-series historical feature matrix. + gap: Gap days between train end and test start. + + Returns: + Per-day predictions for the fold's test window. + + Raises: + ValueError: If ``series_data.exogenous`` is ``None``. + """ + exo = series_data.exogenous + if exo is None: # defensive — the caller guarantees this is non-None + raise ValueError("feature-aware backtest requires a loaded ExogenousFrame") + + # X_train — slice the historical matrix (leakage-safe by position). 
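The `X_future` rule in `_run_feature_aware_fold` has three parts: the history tail ends at the fold origin `T`, gap days are excluded, and a lag whose source day is not yet observed becomes `NaN`. It can be sketched for the lag columns as shown below.

This is a hypothetical helper, not the PR's `build_future_feature_rows`. It rests on two assumptions: test day `h` (1-based) falls on `T + gap + h`, and `history_tail[-1]` is the target on day `T`.

```python
from __future__ import annotations

import math

LAGS: tuple[int, ...] = (1, 7, 14, 28)


def future_lag_rows(
    history_tail: list[float], horizon: int, gap: int, lags: tuple[int, ...] = LAGS
) -> list[list[float]]:
    """Hypothetical sketch of per-fold X_future lag cells.

    history_tail[-1] is the target on the fold origin T (last train day).
    Test day h (1-based) is T + gap + h, so lag k reads day T + gap + h - k.
    """
    rows: list[list[float]] = []
    for h in range(1, horizon + 1):
        row: list[float] = []
        for k in lags:
            offset = gap + h - k  # source day relative to T; 0 means T itself
            idx = len(history_tail) - 1 + offset
            if offset > 0 or idx < 0:
                # Source lies in the gap or test window (not observed at T),
                # or before the start of the tail: leave the cell NaN.
                row.append(math.nan)
            else:
                row.append(history_tail[idx])
        rows.append(row)
    return rows


tail = [float(v) for v in range(100, 130)]  # 30 days; T's target is 129.0
print(future_lag_rows(tail, horizon=2, gap=0))
# → [[129.0, 123.0, 116.0, 102.0], [nan, 124.0, 117.0, 103.0]]
```

With `gap=0`, `lag_1` is available only on the first test day. With `gap > 0` it is `NaN` on every test day, because its source always falls inside the gap.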
+ x_train = historical_matrix[split.train_indices] + y_train = series_data.values[split.train_indices] + + # X_future — rebuilt per fold. history_tail ends at T = last train day + # and EXCLUDES the gap days (data "not yet available" at forecast time). + train_end_idx = int(split.train_indices[-1]) + 1 + history_tail = [float(v) for v in series_data.values[:train_end_idx][-HISTORY_TAIL_DAYS:]] + test_indices = [int(i) for i in split.test_indices] + test_prices = [exo.prices[i] for i in test_indices] + test_promo_dates = { + series_data.dates[i] for i in test_indices if series_data.dates[i] in exo.promo_dates + } + test_holiday_dates = {d for d in split.test_dates if d in exo.holiday_dates} + x_future = np.array( + build_future_feature_rows( + test_dates=split.test_dates, + history_tail=history_tail, + gap=gap, + test_prices=test_prices, + baseline_price=exo.baseline_price, + test_promo_dates=test_promo_dates, + test_holiday_dates=test_holiday_dates, + launch_date=exo.launch_date, + ), + dtype=np.float64, + ) + + model = model_factory(model_config, random_state=self.settings.forecast_random_seed) + model.fit(y_train, x_train) + return model.predict(len(test_indices), x_future) def _run_baseline_comparisons( self, @@ -543,3 +748,105 @@ async def _load_series_data( store_id=store_id, product_id=product_id, ) + + async def _load_exogenous_frame( + self, + db: AsyncSession, + store_id: int, + product_id: int, + dates: list[date_type], + ) -> ExogenousFrame: + """Load exogenous data for a feature-aware backtest — async, once. + + Mirrors ``ForecastingService._build_regression_features``: resolves the + recorded unit price per date, the promotion-covered days, the calendar + holidays, and the product launch date into a pure :class:`ExogenousFrame` + the sync fold loop consumes. The only ``y``-free reads — never a target. + + Args: + db: Database session. + store_id: Store ID. + product_id: Product ID. 
+ dates: The series dates (from :meth:`_load_series_data`) the prices + must align with index-for-index. + + Returns: + The resolved :class:`ExogenousFrame`. + """ + start_date = dates[0] + end_date = dates[-1] + + # Recorded unit price per date — aligned with the series dates. Every + # series date came from the same SalesDaily window, so each resolves. + price_rows = ( + await db.execute( + select(SalesDaily.date, SalesDaily.unit_price).where( + (SalesDaily.store_id == store_id) + & (SalesDaily.product_id == product_id) + & (SalesDaily.date >= start_date) + & (SalesDaily.date <= end_date) + ) + ) + ).all() + price_by_date = {row.date: float(row.unit_price) for row in price_rows} + prices = [price_by_date.get(day, 0.0) for day in dates] + + # Baseline price = median of the positive prices, so price_factor is + # ~1.0 on a typical day and < 1.0 on a markdown/promo day. + positive_prices = sorted(price for price in prices if price > 0.0) + baseline_price = positive_prices[len(positive_prices) // 2] if positive_prices else 1.0 + + holiday_dates: set[date_type] = set( + ( + await db.execute( + select(Calendar.date).where( + Calendar.date >= start_date, + Calendar.date <= end_date, + Calendar.is_holiday.is_(True), + ) + ) + ) + .scalars() + .all() + ) + + # Promotion-active days: store-specific OR chain-wide rows overlapping + # the data window, expanded to the set of dates they cover. 
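The baseline-price and promotion-window steps in `_load_exogenous_frame` reduce to two small pure functions, sketched below with hypothetical names. Two details of the diff are worth noting:

- Indexing the sorted positive prices at `len // 2` returns the true median for an odd count. For an even count it returns the upper of the two middle values, with no averaging.
- Each promotion window is clipped to the data window before it is expanded day by day.

```python
from __future__ import annotations

from datetime import date, timedelta


def baseline_price(prices: list[float]) -> float:
    """Hypothetical mirror of the baseline-price rule: median of positive prices, else 1.0."""
    positive = sorted(p for p in prices if p > 0.0)
    # len // 2 is the middle element for an odd count and the UPPER of the
    # two middle elements for an even count (no averaging).
    return positive[len(positive) // 2] if positive else 1.0


def expand_promo_days(
    windows: list[tuple[date, date]], start: date, end: date
) -> set[date]:
    """Clip each (start, end) promotion window to [start, end] and collect its days."""
    days: set[date] = set()
    for promo_start, promo_end in windows:
        day = max(promo_start, start)
        last = min(promo_end, end)
        while day <= last:  # an empty clip (day > last) adds nothing
            days.add(day)
            day += timedelta(days=1)
    return days


print(baseline_price([0.0, 8.0, 10.0, 9.0]))  # → 9.0 (the zero price is ignored)
print(baseline_price([8.0, 9.0, 10.0, 11.0]))  # → 10.0, not 9.5
```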
+ promo_rows = ( + await db.execute( + select(Promotion.start_date, Promotion.end_date).where( + Promotion.product_id == product_id, + (Promotion.store_id == store_id) | (Promotion.store_id.is_(None)), + Promotion.start_date <= end_date, + Promotion.end_date >= start_date, + ) + ) + ).all() + promo_dates: set[date_type] = set() + for promo in promo_rows: + day = max(promo.start_date, start_date) + last = min(promo.end_date, end_date) + while day <= last: + promo_dates.add(day) + day += timedelta(days=1) + + launch_date: date_type | None = await db.scalar( + select(Product.launch_date).where(Product.id == product_id) + ) + + logger.info( + "backtesting.exogenous_frame_loaded", + store_id=store_id, + product_id=product_id, + n_dates=len(dates), + n_holidays=len(holiday_dates), + n_promo_days=len(promo_dates), + ) + + return ExogenousFrame( + prices=prices, + baseline_price=baseline_price, + promo_dates=promo_dates, + holiday_dates=holiday_dates, + launch_date=launch_date, + ) diff --git a/app/features/backtesting/tests/test_feature_aware_backtest.py b/app/features/backtesting/tests/test_feature_aware_backtest.py new file mode 100644 index 00000000..0ad216ca --- /dev/null +++ b/app/features/backtesting/tests/test_feature_aware_backtest.py @@ -0,0 +1,296 @@ +"""Unit tests for feature-aware backtesting (MLZOO-B.2). + +Pure, DB-free tests of the per-fold feature-aware path wired into +``BacktestingService._run_model_backtest``: the historical matrix build, the +per-fold ``X_train`` slice / ``X_future`` rebuild, the ``feature_aware`` / +``exogenous_policy`` result fields, the gap > 0 fold, and the loud failure when +a feature-aware model reaches the fold loop with no ``ExogenousFrame`` loaded. + +The leakage invariants of the row builders themselves live in the load-bearing +``app/shared/feature_frames/tests/test_leakage.py``; this file pins the +backtesting *integration* of those builders. 
+""" + +from __future__ import annotations + +from datetime import date + +import numpy as np +import pytest + +from app.features.backtesting.schemas import SplitConfig +from app.features.backtesting.service import ( + BacktestingService, + ExogenousFrame, + SeriesData, +) +from app.features.backtesting.splitter import TimeSeriesSplitter +from app.features.forecasting.schemas import ( + NaiveModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, + XGBoostModelConfig, +) +from app.shared.feature_frames import canonical_feature_columns + +_N_FEATURES = len(canonical_feature_columns()) # 14 — 4 lags + 6 calendar + 4 exogenous + + +def _exogenous(n: int) -> ExogenousFrame: + """A flat, no-promo, no-holiday ExogenousFrame aligned with an n-day series.""" + return ExogenousFrame( + prices=[9.99] * n, + baseline_price=9.99, + promo_dates=set(), + holiday_dates=set(), + launch_date=None, + ) + + +def _series(dates: list[date], values: np.ndarray, *, with_exogenous: bool) -> SeriesData: + """Build SeriesData, optionally carrying a loaded ExogenousFrame.""" + return SeriesData( + dates=dates, + values=values, + store_id=1, + product_id=1, + exogenous=_exogenous(len(dates)) if with_exogenous else None, + ) + + +def test_canonical_feature_set_is_fourteen_columns() -> None: + """The feature-aware matrices use exactly the 14-column canonical set.""" + assert _N_FEATURES == 14 + + +def test_build_historical_matrix_shape_matches_series_and_columns( + sample_dates_120: list[date], + sample_values_120: np.ndarray, +) -> None: + """The historical matrix has one row per series day, canonical column width.""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + + matrix = service._build_historical_matrix(series) + + assert matrix.shape == (120, _N_FEATURES) + + +def test_build_historical_matrix_without_exogenous_fails_loud( + sample_dates_120: list[date], + sample_values_120: np.ndarray, +) -> None: + """No 
ExogenousFrame -> loud ValueError, never a silent all-NaN matrix.""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=False) + + with pytest.raises(ValueError, match="ExogenousFrame"): + service._build_historical_matrix(series) + + +def test_feature_aware_fold_predicts_one_value_per_test_day( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A single feature-aware fold yields exactly horizon predictions.""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + historical_matrix = service._build_historical_matrix(series) + split = next(splitter.split(series.dates, series.values)) + + predictions = service._run_feature_aware_fold( + series_data=series, + split=split, + model_config=RegressionModelConfig(), + historical_matrix=historical_matrix, + gap=sample_split_config_expanding.gap, + ) + + assert predictions.shape == (len(split.test_indices),) + assert np.all(np.isfinite(predictions)) + + +def test_feature_aware_backtest_produces_per_fold_metrics( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A regression backtest runs end-to-end and yields per-fold metrics. + + Repurposed positive assertion — feature-aware models are backtestable now + (supersedes the PRP-29 interim loud-fail contract). 
+ """ + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=RegressionModelConfig(), + store_fold_details=True, + ) + + assert result.model_type == "regression" + assert len(result.fold_results) > 0 + assert "mae" in result.aggregated_metrics + for fold in result.fold_results: + assert "mae" in fold.metrics + + +def test_feature_aware_backtest_runs_with_xgboost_model( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, + monkeypatch: pytest.MonkeyPatch, +) -> None: + """An XGBoost backtest runs end-to-end and yields per-fold metrics. + + Mirrors ``test_feature_aware_backtest_produces_per_fold_metrics`` for the + XGBoost feature-aware model (PRP-MLZOO-C1) — proving the B.2 + ``requires_features`` probe needs no per-model backtesting-service wiring. + SKIPs when the optional ``ml-xgboost`` dependency is absent; the + ``forecast_enable_xgboost`` flag is enabled so ``model_factory`` dispatches. 
+ """ + pytest.importorskip("xgboost") + from app.core.config import get_settings + + monkeypatch.setattr(get_settings(), "forecast_enable_xgboost", True) + + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=XGBoostModelConfig(), + store_fold_details=True, + ) + + assert result.model_type == "xgboost" + assert result.feature_aware is True + assert len(result.fold_results) > 0 + assert "mae" in result.aggregated_metrics + for fold in result.fold_results: + assert "mae" in fold.metrics + + +def test_prophet_like_feature_aware_backtest_produces_per_fold_metrics( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A prophet_like backtest runs end-to-end and yields per-fold metrics. + + The Prophet-like additive model is feature-aware (pure scikit-learn, no + flag), so it routes through the SAME per-fold feature-aware path as the + regression model — satisfying INITIAL-MLZOO-B's "backtesting integration + test comparing baseline and advanced model path". 
+ """ + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=ProphetLikeModelConfig(), + store_fold_details=True, + ) + + assert result.model_type == "prophet_like" + assert result.feature_aware is True + assert len(result.fold_results) > 0 + assert "mae" in result.aggregated_metrics + for fold in result.fold_results: + assert "mae" in fold.metrics + assert np.isfinite(fold.metrics["mae"]) + + +def test_feature_aware_result_records_observed_policy( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A feature-aware result is flagged and records the v1 exogenous policy.""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=RegressionModelConfig(), + store_fold_details=True, + ) + + assert result.feature_aware is True + assert result.exogenous_policy == "observed" + + +def test_target_only_result_is_not_flagged_feature_aware( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A target-only baseline keeps feature_aware False and no exogenous policy.""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=False) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=NaiveModelConfig(), + store_fold_details=True, + ) + + assert result.feature_aware is False + assert result.exogenous_policy is None + + +def 
test_feature_aware_backtest_without_exogenous_fails_loud( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, +) -> None: + """A feature-aware model reaching the fold loop with no ExogenousFrame + must fail LOUD — the genuinely-unsupported path (DECISIONS LOCKED #8).""" + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=False) + splitter = TimeSeriesSplitter(sample_split_config_expanding) + + with pytest.raises(ValueError, match="ExogenousFrame"): + service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=RegressionModelConfig(), + store_fold_details=True, + ) + + +def test_feature_aware_backtest_runs_with_a_gap_fold( + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_with_gap: SplitConfig, +) -> None: + """A gap > 0 fold runs — the lag columns drop the gap lead-in correctly.""" + assert sample_split_config_with_gap.gap > 0 + service = BacktestingService() + series = _series(sample_dates_120, sample_values_120, with_exogenous=True) + splitter = TimeSeriesSplitter(sample_split_config_with_gap) + + result = service._run_model_backtest( + series_data=series, + splitter=splitter, + model_config=RegressionModelConfig(), + store_fold_details=True, + ) + + assert result.feature_aware is True + assert len(result.fold_results) > 0 + for fold in result.fold_results: + assert np.isfinite(fold.metrics["mae"]) diff --git a/app/features/backtesting/tests/test_routes_integration.py b/app/features/backtesting/tests/test_routes_integration.py index efe2af33..a0533146 100644 --- a/app/features/backtesting/tests/test_routes_integration.py +++ b/app/features/backtesting/tests/test_routes_integration.py @@ -393,3 +393,85 @@ async def test_response_contains_all_expected_fields( assert "test_end" in split assert "train_size" in split assert "test_size" in split + + +@pytest.mark.integration +@pytest.mark.asyncio 
+class TestBacktestingRouteFeatureAwareIntegration: + """Integration tests for POST /backtesting/run with a feature-aware model.""" + + async def test_run_backtest_regression_model( + self, + client: AsyncClient, + sample_store: Store, + sample_product: Product, + sample_sales_120: list[SalesDaily], + ) -> None: + """A regression backtest returns 200 with per-fold metrics + baselines.""" + response = await client.post( + "/backtesting/run", + json={ + "store_id": sample_store.id, + "product_id": sample_product.id, + "start_date": "2024-01-01", + "end_date": "2024-04-29", + "config": { + "split_config": { + "strategy": "expanding", + "n_splits": 3, + "min_train_size": 30, + "gap": 0, + "horizon": 14, + }, + "model_config_main": {"model_type": "regression"}, + "include_baselines": True, + "store_fold_details": True, + }, + }, + ) + + assert response.status_code == 200 + data = response.json() + + main = data["main_model_results"] + assert main["model_type"] == "regression" + assert main["feature_aware"] is True + assert main["exogenous_policy"] == "observed" + assert len(main["fold_results"]) > 0 + assert data["leakage_check_passed"] is True + + baseline_types = {b["model_type"] for b in data["baseline_results"]} + assert baseline_types == {"naive", "seasonal_naive"} + + async def test_run_backtest_regression_rejects_small_min_train_size( + self, + client: AsyncClient, + sample_store: Store, + sample_product: Product, + sample_sales_120: list[SalesDaily], + ) -> None: + """A regression backtest with min_train_size < 30 returns RFC 7807 400.""" + response = await client.post( + "/backtesting/run", + json={ + "store_id": sample_store.id, + "product_id": sample_product.id, + "start_date": "2024-01-01", + "end_date": "2024-04-29", + "config": { + "split_config": { + "strategy": "expanding", + "n_splits": 3, + "min_train_size": 20, + "gap": 0, + "horizon": 14, + }, + "model_config_main": {"model_type": "regression"}, + "include_baselines": False, + "store_fold_details": 
True, + }, + }, + ) + + assert response.status_code == 400 + assert "at least 30" in response.text diff --git a/app/features/backtesting/tests/test_schemas.py b/app/features/backtesting/tests/test_schemas.py index 31eec119..c071fae9 100644 --- a/app/features/backtesting/tests/test_schemas.py +++ b/app/features/backtesting/tests/test_schemas.py @@ -225,6 +225,48 @@ def test_model_backtest_result_creation(self): assert result.model_type == "naive" assert result.aggregated_metrics["mae"] == 5.0 + def test_feature_aware_fields_default_to_target_only(self): + """The MLZOO-B.2 fields default to a target-only result — every existing + construction site (which omits them) stays valid.""" + result = ModelBacktestResult( + model_type="naive", + config_hash="abc123", + fold_results=[], + aggregated_metrics={"mae": 5.0}, + metric_std={"mae_stability": 10.0}, + ) + + assert result.feature_aware is False + assert result.exogenous_policy is None + + def test_feature_aware_fields_accept_explicit_values(self): + """A feature-aware result carries feature_aware=True + the exogenous policy.""" + result = ModelBacktestResult( + model_type="regression", + config_hash="def456", + fold_results=[], + aggregated_metrics={"mae": 3.0}, + metric_std={"mae_stability": 8.0}, + feature_aware=True, + exogenous_policy="observed", + ) + + assert result.feature_aware is True + assert result.exogenous_policy == "observed" + + def test_exogenous_policy_rejects_an_unknown_value(self): + """exogenous_policy is a closed Literal — only 'observed' in v1.""" + with pytest.raises(ValidationError): + ModelBacktestResult( + model_type="regression", + config_hash="def456", + fold_results=[], + aggregated_metrics={"mae": 3.0}, + metric_std={"mae_stability": 8.0}, + feature_aware=True, + exogenous_policy="assumptions", # type: ignore[arg-type] + ) + class TestBacktestRequest: """Tests for BacktestRequest schema.""" diff --git a/app/features/backtesting/tests/test_service.py 
b/app/features/backtesting/tests/test_service.py index 61dbb220..992ccb80 100644 --- a/app/features/backtesting/tests/test_service.py +++ b/app/features/backtesting/tests/test_service.py @@ -117,6 +117,45 @@ def test_run_model_backtest_without_fold_details( # But metrics should still be present assert fold.metrics is not None + def test_feature_aware_model_fails_loud_in_backtest( + self, + sample_dates_120: list[date], + sample_values_120: np.ndarray, + sample_split_config_expanding: SplitConfig, + ) -> None: + """A feature-aware model must fail LOUD in a backtest, never run silently. + + Repurposed by PRP-MLZOO-B.2 (DECISIONS LOCKED #8): feature-aware + backtesting is now wired, so a ``regression`` backtest SUCCEEDS once + ``run_backtest`` has resolved the exogenous data — the positive + assertions live in ``test_feature_aware_backtest.py``. PRP-29 DECISIONS + LOCKED #7 and PRP-30 DECISIONS LOCKED #6 are therefore superseded. + + What stays loud — and is pinned here — is the genuinely-unsupported + path: a ``requires_features`` model reaching the fold loop with no + ``ExogenousFrame`` loaded on ``series_data`` raises ``ValueError`` + rather than degrading to a silent all-NaN matrix. + """ + from app.features.backtesting.splitter import TimeSeriesSplitter + from app.features.forecasting.schemas import RegressionModelConfig + + service = BacktestingService() + series_data = SeriesData( + dates=sample_dates_120, + values=sample_values_120, + store_id=1, + product_id=1, + ) # NOTE: no ExogenousFrame loaded — the unsupported path. 
+ splitter = TimeSeriesSplitter(sample_split_config_expanding) + + with pytest.raises(ValueError, match="ExogenousFrame"): + service._run_model_backtest( + series_data=series_data, + splitter=splitter, + model_config=RegressionModelConfig(), + store_fold_details=True, + ) + class TestBacktestingServiceBaselineComparisons: """Tests for baseline comparison functionality.""" diff --git a/app/features/backtesting/tests/test_service_integration.py b/app/features/backtesting/tests/test_service_integration.py index d1b0fbd7..81e661ca 100644 --- a/app/features/backtesting/tests/test_service_integration.py +++ b/app/features/backtesting/tests/test_service_integration.py @@ -295,3 +295,99 @@ async def test_backtest_with_gap_produces_correct_splits( actual_gap = (test_start - train_end).days # Gap should be at least gap_days (could be more if data is sparse) assert actual_gap >= gap_days, f"Expected gap >= {gap_days}, got {actual_gap}" + + +@pytest.mark.integration +@pytest.mark.asyncio +class TestBacktestingServiceFeatureAwareIntegration: + """Integration tests for feature-aware backtesting (MLZOO-B.2). + + A ``regression`` model is evaluated end-to-end against the real database: + ``run_backtest`` resolves the exogenous frame, the fold loop builds the + per-fold leakage-safe ``X_train`` / ``X_future``, and the result is + compared head-to-head with the naive / seasonal baselines. 
+ """ + + async def test_regression_backtest_runs_with_baseline_comparison( + self, + db_session: AsyncSession, + sample_store: Store, + sample_product: Product, + sample_sales_120: list[SalesDaily], + ) -> None: + """A regression backtest yields per-fold metrics + a baseline comparison.""" + from app.features.forecasting.schemas import RegressionModelConfig + + service = BacktestingService() + config = BacktestConfig( + split_config=SplitConfig( + strategy="expanding", + n_splits=3, + min_train_size=30, + gap=0, + horizon=14, + ), + model_config_main=RegressionModelConfig(), + include_baselines=True, + store_fold_details=True, + ) + + response = await service.run_backtest( + db=db_session, + store_id=sample_store.id, + product_id=sample_product.id, + start_date=date(2024, 1, 1), + end_date=date(2024, 4, 29), + config=config, + ) + + main = response.main_model_results + assert main.model_type == "regression" + assert main.feature_aware is True + assert main.exogenous_policy == "observed" + assert len(main.fold_results) > 0 + assert "mae" in main.aggregated_metrics + assert response.leakage_check_passed is True + + # Baselines stay target-only and unflagged. 
+ assert response.baseline_results is not None + baseline_types = {b.model_type for b in response.baseline_results} + assert baseline_types == {"naive", "seasonal_naive"} + for baseline in response.baseline_results: + assert baseline.feature_aware is False + assert baseline.exogenous_policy is None + assert response.comparison_summary is not None + + async def test_regression_backtest_rejects_small_min_train_size( + self, + db_session: AsyncSession, + sample_store: Store, + sample_product: Product, + sample_sales_120: list[SalesDaily], + ) -> None: + """A feature-aware backtest with min_train_size < 30 fails loud.""" + from app.features.forecasting.schemas import RegressionModelConfig + + service = BacktestingService() + config = BacktestConfig( + split_config=SplitConfig( + strategy="expanding", + n_splits=3, + min_train_size=20, + gap=0, + horizon=14, + ), + model_config_main=RegressionModelConfig(), + include_baselines=False, + store_fold_details=True, + ) + + with pytest.raises(ValueError, match="at least 30"): + await service.run_backtest( + db=db_session, + store_id=sample_store.id, + product_id=sample_product.id, + start_date=date(2024, 1, 1), + end_date=date(2024, 4, 29), + config=config, + ) diff --git a/app/features/forecasting/models.py b/app/features/forecasting/models.py index ecb510b8..1c43ea32 100644 --- a/app/features/forecasting/models.py +++ b/app/features/forecasting/models.py @@ -14,17 +14,62 @@ from abc import ABC, abstractmethod from dataclasses import dataclass, field from datetime import date as date_type -from typing import TYPE_CHECKING, Any, Literal +from typing import TYPE_CHECKING, Any, ClassVar, Literal import numpy as np from sklearn.ensemble import ( # type: ignore[import-untyped] HistGradientBoostingRegressor, ) +from sklearn.impute import SimpleImputer # type: ignore[import-untyped] +from sklearn.linear_model import Ridge # type: ignore[import-untyped] +from sklearn.pipeline import Pipeline # type: ignore[import-untyped] if 
TYPE_CHECKING: from app.features.forecasting.schemas import ModelConfig +# Canonical 14-column feature frame partitioned into the three Prophet-style +# additive components. Together the three column tuples cover all 14 canonical +# columns exactly — which is what makes the additive invariant hold (the +# component contributions partition the full coef_ · x sum). See +# ``canonical_feature_columns()`` in ``app/shared/feature_frames``. +_PROPHET_LIKE_COMPONENTS: dict[str, tuple[str, ...]] = { + "trend": ("lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch"), + "seasonality": ( + "dow_sin", + "dow_cos", + "month_sin", + "month_cos", + "is_weekend", + "is_month_end", + ), + "holiday_regressor": ("price_factor", "promo_active", "is_holiday"), +} + + +@dataclass +class ForecastDecomposition: + """Additive component breakdown of a Prophet-like forecast. + + Invariant: ``intercept + trend + seasonality + holiday_regressor`` equals + ``predict(...)`` for the same ``X`` (within float tolerance), element-wise. + Each component array has shape ``[n_rows]`` — one value per forecast row. + + Attributes: + intercept: The fitted Ridge intercept (a scalar, broadcast over rows). + trend: Per-row contribution of the trend columns (autoregressive lags + + ``days_since_launch``). + seasonality: Per-row contribution of the calendar/seasonal columns. + holiday_regressor: Per-row contribution of the holiday + extra-regressor + columns (price, promotion, holiday flag). + """ + + intercept: float + trend: np.ndarray[Any, np.dtype[np.floating[Any]]] + seasonality: np.ndarray[Any, np.dtype[np.floating[Any]]] + holiday_regressor: np.ndarray[Any, np.dtype[np.floating[Any]]] + + @dataclass class FitResult: """Result of model fitting. @@ -57,6 +102,16 @@ class BaseForecaster(ABC): Attributes: random_state: Random seed for reproducibility. + requires_features: True when ``fit``/``predict`` require a non-None + ``X`` feature frame; baseline (target-only) models leave it False. 
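The additive invariant stated for `ForecastDecomposition` holds for any linear model whose column groups partition the features. The prediction is `intercept + coef · x`, and splitting that dot product over disjoint groups that together cover every column yields the component sums.

The sketch below demonstrates this with a closed-form ridge fit in NumPy. It uses the centered solution, which is what scikit-learn's `Ridge` computes with `fit_intercept=True` on dense input. The 6-column `X` and the group indices are hypothetical. The sketch also assumes a NaN-free `X`. The diff imports `SimpleImputer` and `Pipeline`, so the real decomposition presumably runs on imputed features.

```python
from __future__ import annotations

import numpy as np


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> tuple[np.ndarray, float]:
    """Closed-form ridge with an unpenalized intercept (centered solution)."""
    x_mean = X.mean(axis=0)
    y_mean = float(y.mean())
    Xc = X - x_mean
    yc = y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - float(x_mean @ coef)
    return coef, intercept


def decompose(
    X: np.ndarray, coef: np.ndarray, components: dict[str, list[int]]
) -> dict[str, np.ndarray]:
    """Per-row contribution of each column group: X[:, cols] @ coef[cols]."""
    return {name: X[:, cols] @ coef[cols] for name, cols in components.items()}


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5, 3.0]) + 4.0 + rng.normal(scale=0.1, size=60)

# Hypothetical partition: disjoint groups that together cover all six columns.
components = {"trend": [0, 1], "seasonality": [2, 3, 4], "holiday_regressor": [5]}
coef, intercept = fit_ridge(X, y)
parts = decompose(X, coef, components)
prediction = X @ coef + intercept
assert np.allclose(intercept + sum(parts.values()), prediction)
```

If any column is missing from every group, or appears in two groups, the equality breaks. This is why the three tuples must cover all 14 canonical columns exactly once.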
+ """ + + requires_features: ClassVar[bool] = False + """True when ``fit()``/``predict()`` REQUIRE a non-None ``X`` feature frame. + + Baseline (target-only) models leave this ``False``; feature-aware models + override it to ``True``. ``ForecastingService`` branches on this flag + rather than an ``isinstance`` check or a ``model_type`` string comparison. """ def __init__(self, random_state: int = 42) -> None: @@ -445,6 +500,9 @@ class RegressionForecaster(BaseForecaster): max_depth: Maximum depth of each tree. """ + requires_features: ClassVar[bool] = True + """A feature-aware model — ``fit``/``predict`` REQUIRE a non-None ``X``.""" + def __init__( self, *, @@ -564,8 +622,503 @@ def set_params(self, **params: Any) -> RegressionForecaster: # noqa: ANN401 return self +class LightGBMForecaster(BaseForecaster): + """Feature-aware forecaster wrapping ``lightgbm.LGBMRegressor``. + + The first ADVANCED feature-aware model (MLZOO-B). Like + ``RegressionForecaster`` it REQUIRES a non-``None`` exogenous ``X`` for both + ``fit`` and ``predict``; unlike it, the estimator is gradient-boosted + leaf-wise trees from the optional ``lightgbm`` package. + + ``lightgbm`` is imported LAZILY inside ``fit`` — never at module scope and + never in ``__init__`` — so importing this module (which every forecasting + code path does, baseline models included) never requires the optional + ``ml-lightgbm`` dependency. + + Determinism: ``LGBMRegressor`` is bit-reproducible only with ``n_jobs=1`` + AND ``deterministic=True`` AND ``force_col_wise=True`` AND a fixed + ``random_state`` — all four are pinned in ``fit``. LightGBM also tolerates + ``NaN`` natively, which matters because the future feature frame leaves lag + cells ``NaN`` when their source target lies in the un-observed horizon. + + Attributes: + n_estimators: Number of boosting rounds. + learning_rate: Gradient-boosting learning rate. + max_depth: Maximum depth of each tree. 
+ """ + + requires_features: ClassVar[bool] = True + """A feature-aware model — ``fit``/``predict`` REQUIRE a non-None ``X``.""" + + def __init__( + self, + *, + n_estimators: int = 100, + learning_rate: float = 0.1, + max_depth: int = 6, + random_state: int = 42, + ) -> None: + """Initialize the LightGBM forecaster. + + Args: + n_estimators: Number of boosting rounds. + learning_rate: Gradient-boosting learning rate. + max_depth: Maximum depth of each tree. + random_state: Random seed for reproducibility (determinism). + """ + super().__init__(random_state) + self.n_estimators = n_estimators + self.learning_rate = learning_rate + self.max_depth = max_depth + self._estimator: Any = None + + def fit( + self, + y: np.ndarray[Any, np.dtype[np.floating[Any]]], + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> LightGBMForecaster: + """Fit the gradient-boosted regressor on historical features. + + Args: + y: Target values (1D array of shape ``[n_samples]``). + X: Exogenous features (2D array of shape ``[n_samples, n_features]``). + REQUIRED — unlike the baseline forecasters. + + Returns: + self (for method chaining). + + Raises: + ValueError: If ``X`` is ``None``, ``y`` is empty, or the row counts + of ``X`` and ``y`` do not match. + """ + if X is None: + raise ValueError("LightGBMForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + # LAZY import — the optional ``ml-lightgbm`` dependency is only needed + # the first time a LightGBM model is actually fitted. 
+ import lightgbm as lgb + + estimator: Any = lgb.LGBMRegressor( + n_estimators=self.n_estimators, + learning_rate=self.learning_rate, + max_depth=self.max_depth, + random_state=self.random_state, + n_jobs=1, # \ + deterministic=True, # } all four required for a bit-reproducible fit + force_col_wise=True, # / + verbosity=-1, # silence LightGBM's training chatter + ) + estimator.fit(X, y) + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + + def predict( + self, + horizon: int, + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> np.ndarray[Any, np.dtype[np.floating[Any]]]: + """Generate forecasts from a future feature frame. + + Args: + horizon: Number of steps to forecast. + X: Exogenous features for the forecast period, shape + ``[horizon, n_features]``. REQUIRED. + + Returns: + Array of forecasts with shape ``[horizon]``. + + Raises: + RuntimeError: If the model has not been fitted. + ValueError: If ``X`` is ``None`` or its row count is not ``horizon``. + """ + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before predict") + if X is None: + raise ValueError("LightGBMForecaster requires exogenous features X for predict()") + if X.shape[0] != horizon: + raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon} — they must match") + predictions = self._estimator.predict(X) + result: np.ndarray[Any, np.dtype[np.floating[Any]]] = np.asarray( + predictions, dtype=np.float64 + ) + return result + + def get_params(self) -> dict[str, Any]: + """Get model parameters. + + Returns: + Dictionary with n_estimators, learning_rate, max_depth, random_state. + """ + return { + "n_estimators": self.n_estimators, + "learning_rate": self.learning_rate, + "max_depth": self.max_depth, + "random_state": self.random_state, + } + + def set_params(self, **params: Any) -> LightGBMForecaster: # noqa: ANN401 + """Set model parameters. 
+ + Args: + **params: Parameter names and values to set. + + Returns: + self (for method chaining). + """ + for key, value in params.items(): + setattr(self, key, value) + return self + + +class XGBoostForecaster(BaseForecaster): + """Feature-aware forecaster wrapping ``xgboost.XGBRegressor``. + + The second ADVANCED feature-aware tree model (MLZOO-C1). Structurally a + twin of ``LightGBMForecaster``: it REQUIRES a non-``None`` exogenous ``X`` + for both ``fit`` and ``predict``; the estimator is gradient-boosted trees + from the optional ``xgboost`` package. + + ``xgboost`` is imported LAZILY inside ``fit`` — never at module scope and + never in ``__init__`` — so importing this module (which every forecasting + code path does, baseline models included) never requires the optional + ``ml-xgboost`` dependency. + + Determinism: ``XGBRegressor`` has no ``deterministic`` switch (unlike + LightGBM). Bit-reproducibility comes from ``n_jobs=1`` + ``tree_method="hist"`` + + a fixed ``random_state`` + the conservative config leaving ``subsample`` / + ``colsample_bytree`` at their ``1.0`` defaults (no stochastic sampling) — + all pinned in ``fit``. XGBoost tolerates ``NaN`` natively (``missing=np.nan``), + which matters because the future feature frame leaves lag cells ``NaN`` + when their source target lies in the un-observed horizon. + + Attributes: + n_estimators: Number of boosting rounds. + learning_rate: Gradient-boosting learning rate. + max_depth: Maximum depth of each tree. + """ + + requires_features: ClassVar[bool] = True + """A feature-aware model — ``fit``/``predict`` REQUIRE a non-None ``X``.""" + + def __init__( + self, + *, + n_estimators: int = 100, + learning_rate: float = 0.1, + max_depth: int = 6, + random_state: int = 42, + ) -> None: + """Initialize the XGBoost forecaster. + + Args: + n_estimators: Number of boosting rounds. + learning_rate: Gradient-boosting learning rate. + max_depth: Maximum depth of each tree. 
+ random_state: Random seed for reproducibility (determinism). + """ + super().__init__(random_state) + self.n_estimators = n_estimators + self.learning_rate = learning_rate + self.max_depth = max_depth + self._estimator: Any = None + + def fit( + self, + y: np.ndarray[Any, np.dtype[np.floating[Any]]], + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> XGBoostForecaster: + """Fit the gradient-boosted regressor on historical features. + + Args: + y: Target values (1D array of shape ``[n_samples]``). + X: Exogenous features (2D array of shape ``[n_samples, n_features]``). + REQUIRED — unlike the baseline forecasters. + + Returns: + self (for method chaining). + + Raises: + ValueError: If ``X`` is ``None``, ``y`` is empty, or the row counts + of ``X`` and ``y`` do not match. + """ + if X is None: + raise ValueError("XGBoostForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + # LAZY import — the optional ``ml-xgboost`` dependency is only needed + # the first time an XGBoost model is actually fitted. + import xgboost as xgb + + estimator: Any = xgb.XGBRegressor( + n_estimators=self.n_estimators, + learning_rate=self.learning_rate, + max_depth=self.max_depth, + random_state=self.random_state, + n_jobs=1, # single-threaded — removes float-summation non-determinism + tree_method="hist", # explicit; the default, and the reproducible path + verbosity=0, # silence XGBoost's training chatter + ) + estimator.fit(X, y) + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + + def predict( + self, + horizon: int, + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> np.ndarray[Any, np.dtype[np.floating[Any]]]: + """Generate forecasts from a future feature frame. 
+ + Args: + horizon: Number of steps to forecast. + X: Exogenous features for the forecast period, shape + ``[horizon, n_features]``. REQUIRED. + + Returns: + Array of forecasts with shape ``[horizon]``. + + Raises: + RuntimeError: If the model has not been fitted. + ValueError: If ``X`` is ``None`` or its row count is not ``horizon``. + """ + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before predict") + if X is None: + raise ValueError("XGBoostForecaster requires exogenous features X for predict()") + if X.shape[0] != horizon: + raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon} — they must match") + predictions = self._estimator.predict(X) + result: np.ndarray[Any, np.dtype[np.floating[Any]]] = np.asarray( + predictions, dtype=np.float64 + ) + return result + + def get_params(self) -> dict[str, Any]: + """Get model parameters. + + Returns: + Dictionary with n_estimators, learning_rate, max_depth, random_state. + """ + return { + "n_estimators": self.n_estimators, + "learning_rate": self.learning_rate, + "max_depth": self.max_depth, + "random_state": self.random_state, + } + + def set_params(self, **params: Any) -> XGBoostForecaster: # noqa: ANN401 + """Set model parameters. + + Args: + **params: Parameter names and values to set. + + Returns: + self (for method chaining). + """ + for key, value in params.items(): + setattr(self, key, value) + return self + + +class ProphetLikeForecaster(BaseForecaster): + """Feature-aware ADDITIVE forecaster — Ridge over the canonical frame. + + Prophet-LIKE, not Prophet: it approximates Prophet's additive trend + + seasonality + holiday/regressor decomposition with a regularized linear + model over the already-engineered 14-column feature frame. It REQUIRES a + non-``None`` exogenous ``X`` for both ``fit`` and ``predict``. 
+ + The fitted estimator is a scikit-learn ``Pipeline`` of two deterministic + steps: a ``SimpleImputer(strategy="median")`` that fills the ``NaN`` lag + cells the future feature frame emits (a bare ``Ridge`` raises + ``ValueError: Input contains NaN``), followed by a + ``Ridge(solver="cholesky")`` whose closed-form L2-regularized fit is + robust to the collinear engineered columns. Folding the imputer INSIDE the + pipeline keeps the no-leakage invariant: it learns its medians on the + training ``X`` only and re-applies them at predict time. + + ``decompose()`` returns the per-component additive contributions of a + forecast — the literal ``y_hat = intercept + trend + seasonality + + holiday_regressor`` split, computed on the IMPUTED ``X``. + + NOT modelled (deliberately — see PRP-MLZOO-C2 Risks): changepoint trend, + posterior uncertainty intervals, automatic seasonality discovery, + multiplicative seasonality. This is an additive linear approximation, not + the real ``prophet`` package. + + Attributes: + alpha: Ridge L2 regularization strength (0.0 degenerates to OLS). + """ + + requires_features: ClassVar[bool] = True + """A feature-aware model — ``fit``/``predict`` REQUIRE a non-None ``X``.""" + + def __init__(self, *, alpha: float = 1.0, random_state: int = 42) -> None: + """Initialize the Prophet-like additive forecaster. + + Args: + alpha: Ridge L2 regularization strength. The default 1.0 keeps + coefficients robust to the collinear engineered-feature frame. + random_state: Kept for interface parity with the other forecasters; + ``Ridge(solver="cholesky")`` is closed-form and needs no seed. + """ + super().__init__(random_state) + self.alpha = alpha + self._estimator: Any = None + + def fit( + self, + y: np.ndarray[Any, np.dtype[np.floating[Any]]], + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> ProphetLikeForecaster: + """Fit the additive Ridge pipeline on historical features. 
+ + Args: + y: Target values (1D array of shape ``[n_samples]``). + X: Exogenous features (2D array of shape ``[n_samples, n_features]``). + REQUIRED — unlike the baseline forecasters. + + Returns: + self (for method chaining). + + Raises: + ValueError: If ``X`` is ``None``, ``y`` is empty, or the row counts + of ``X`` and ``y`` do not match. + """ + if X is None: + raise ValueError("ProphetLikeForecaster requires exogenous features X for fit()") + if len(y) == 0: + raise ValueError("Cannot fit on empty array") + if X.shape[0] != len(y): + raise ValueError( + f"X has {X.shape[0]} rows but y has {len(y)} — feature/target rows must match" + ) + # The imputer learns its per-column medians on THIS training X only; + # the Ridge solver is deterministic and closed-form. + # ``keep_empty_features=True`` keeps an all-NaN training column (e.g. + # ``days_since_launch`` when the product has no launch date) as zeros + # instead of dropping it, so the column indices ``decompose()`` uses + # stay aligned with ``canonical_feature_columns()``. + estimator: Any = Pipeline( + [ + ("impute", SimpleImputer(strategy="median", keep_empty_features=True)), + ("ridge", Ridge(alpha=self.alpha, solver="cholesky")), + ] + ) + estimator.fit(X, y) + self._estimator = estimator + self._last_values = np.asarray(y[-1:], dtype=np.float64) + self._is_fitted = True + return self + + def predict( + self, + horizon: int, + X: np.ndarray[Any, np.dtype[np.floating[Any]]] | None = None, + ) -> np.ndarray[Any, np.dtype[np.floating[Any]]]: + """Generate forecasts from a future feature frame. + + Args: + horizon: Number of steps to forecast. + X: Exogenous features for the forecast period, shape + ``[horizon, n_features]``. REQUIRED. + + Returns: + Array of forecasts with shape ``[horizon]``. + + Raises: + RuntimeError: If the model has not been fitted. + ValueError: If ``X`` is ``None`` or its row count is not ``horizon``.
+ """ + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before predict") + if X is None: + raise ValueError("ProphetLikeForecaster requires exogenous features X for predict()") + if X.shape[0] != horizon: + raise ValueError(f"X has {X.shape[0]} rows but horizon is {horizon} — they must match") + # The Pipeline imputes the NaN lag cells, then the Ridge predicts. + predictions = self._estimator.predict(X) + result: np.ndarray[Any, np.dtype[np.floating[Any]]] = np.asarray( + predictions, dtype=np.float64 + ) + return result + + def decompose(self, X: np.ndarray[Any, np.dtype[np.floating[Any]]]) -> ForecastDecomposition: + """Split a forecast into its additive trend / seasonality / regressor parts. + + Operates on the IMPUTED ``X`` — the trained imputer's ``transform`` — + so the per-component contributions sum (to float precision) to ``predict(...)``: any + ``NaN`` cell is filled with the TRAINING-window median, never a + predict-time median (no leakage). Each component contribution is the + partial sum ``Σ_{i ∈ component} coef_i · x_i``; together the three + component column-sets partition all 14 canonical columns, so + ``intercept + trend + seasonality + holiday_regressor == predict()``. + + Args: + X: Feature matrix of shape ``[n_rows, n_features]`` (the same frame + a ``predict`` call would consume). May contain ``NaN`` cells. + + Returns: + A :class:`ForecastDecomposition` with the four-way breakdown. + + Raises: + RuntimeError: If the model has not been fitted.
+ """ + from app.shared.feature_frames import canonical_feature_columns + + if not self._is_fitted or self._estimator is None: + raise RuntimeError("Model must be fitted before decompose") + imputer = self._estimator.named_steps["impute"] + ridge = self._estimator.named_steps["ridge"] + x_imputed = imputer.transform(X) + columns = canonical_feature_columns() + coef = np.asarray(ridge.coef_, dtype=np.float64) + contributions: dict[str, np.ndarray[Any, np.dtype[np.floating[Any]]]] = {} + for component, comp_cols in _PROPHET_LIKE_COMPONENTS.items(): + idx = [columns.index(c) for c in comp_cols] + contributions[component] = np.asarray(x_imputed[:, idx] @ coef[idx], dtype=np.float64) + return ForecastDecomposition( + intercept=float(ridge.intercept_), + trend=contributions["trend"], + seasonality=contributions["seasonality"], + holiday_regressor=contributions["holiday_regressor"], + ) + + def get_params(self) -> dict[str, Any]: + """Get model parameters. + + Returns: + Dictionary with alpha and random_state. + """ + return {"alpha": self.alpha, "random_state": self.random_state} + + def set_params(self, **params: Any) -> ProphetLikeForecaster: # noqa: ANN401 + """Set model parameters. + + Args: + **params: Parameter names and values to set. + + Returns: + self (for method chaining). + """ + for key, value in params.items(): + setattr(self, key, value) + return self + + # Type alias for model type literals -ModelType = Literal["naive", "seasonal_naive", "moving_average", "lightgbm", "regression"] +ModelType = Literal[ + "naive", "seasonal_naive", "moving_average", "xgboost", "lightgbm", "regression", "prophet_like" +] def model_factory(config: ModelConfig, random_state: int = 42) -> BaseForecaster: @@ -612,8 +1165,31 @@ def model_factory(config: ModelConfig, random_state: int = 42) -> BaseForecaster raise ValueError( "LightGBM is not enabled. Set forecast_enable_lightgbm=True in settings." 
) - # LightGBM implementation would go here when feature-flagged - raise NotImplementedError("LightGBM forecaster not yet implemented") + from app.features.forecasting.schemas import LightGBMModelConfig + + if isinstance(config, LightGBMModelConfig): + return LightGBMForecaster( + n_estimators=config.n_estimators, + learning_rate=config.learning_rate, + max_depth=config.max_depth, + random_state=random_state, + ) + raise ValueError("Invalid config type for lightgbm") + elif model_type == "xgboost": + if not settings.forecast_enable_xgboost: + raise ValueError( + "XGBoost is not enabled. Set forecast_enable_xgboost=True in settings." + ) + from app.features.forecasting.schemas import XGBoostModelConfig + + if isinstance(config, XGBoostModelConfig): + return XGBoostForecaster( + n_estimators=config.n_estimators, + learning_rate=config.learning_rate, + max_depth=config.max_depth, + random_state=random_state, + ) + raise ValueError("Invalid config type for xgboost") elif model_type == "regression": from app.features.forecasting.schemas import RegressionModelConfig @@ -625,5 +1201,13 @@ def model_factory(config: ModelConfig, random_state: int = 42) -> BaseForecaster random_state=random_state, ) raise ValueError("Invalid config type for regression") + elif model_type == "prophet_like": + # No flag gate — the Prophet-like model is pure scikit-learn and ships + # always-enabled, exactly like ``regression``. 
+ from app.features.forecasting.schemas import ProphetLikeModelConfig + + if isinstance(config, ProphetLikeModelConfig): + return ProphetLikeForecaster(alpha=config.alpha, random_state=random_state) + raise ValueError("Invalid config type for prophet_like") else: raise ValueError(f"Unknown model type: {model_type}") diff --git a/app/features/forecasting/persistence.py b/app/features/forecasting/persistence.py index 9f7a9563..e5b055e1 100644 --- a/app/features/forecasting/persistence.py +++ b/app/features/forecasting/persistence.py @@ -40,6 +40,10 @@ class ModelBundle: created_at: Timestamp when bundle was created. python_version: Python version used when saving. sklearn_version: Scikit-learn version used when saving. + lightgbm_version: LightGBM version used when saving, ``None`` when the + optional ``ml-lightgbm`` dependency was not installed. + xgboost_version: XGBoost version used when saving, ``None`` when the + optional ``ml-xgboost`` dependency was not installed. bundle_hash: Deterministic hash of bundle contents. """ @@ -51,6 +55,8 @@ class ModelBundle: created_at: datetime | None = None python_version: str | None = None sklearn_version: str | None = None + lightgbm_version: str | None = None + xgboost_version: str | None = None bundle_hash: str | None = None def compute_hash(self) -> str: @@ -95,6 +101,22 @@ def save_model_bundle(bundle: ModelBundle, path: str | Path) -> Path: bundle.created_at = datetime.now(UTC) bundle.python_version = sys.version bundle.sklearn_version = sklearn.__version__ + # Best-effort: LightGBM is an optional dependency, so a baseline-only + # install legitimately has no version to record. + try: + import lightgbm + + bundle.lightgbm_version = str(lightgbm.__version__) + except ImportError: + bundle.lightgbm_version = None + # Best-effort: XGBoost is an optional dependency, so a baseline-only + # install legitimately has no version to record. 
+ try: + import xgboost + + bundle.xgboost_version = str(xgboost.__version__) + except ImportError: + bundle.xgboost_version = None bundle.bundle_hash = bundle.compute_hash() # Save with compression @@ -171,6 +193,38 @@ def load_model_bundle(path: str | Path, base_dir: str | Path | None = None) -> M current_sklearn=sklearn.__version__, ) + # LightGBM is optional — only warn when the bundle recorded a version AND + # the optional dependency is importable here AND the two differ. + if bundle.lightgbm_version: + try: + import lightgbm + + current_lightgbm: str | None = str(lightgbm.__version__) + except ImportError: + current_lightgbm = None + if current_lightgbm is not None and bundle.lightgbm_version != current_lightgbm: + logger.warning( + "forecasting.lightgbm_version_mismatch", + saved_lightgbm=bundle.lightgbm_version, + current_lightgbm=current_lightgbm, + ) + + # XGBoost is optional — only warn when the bundle recorded a version AND + # the optional dependency is importable here AND the two differ. + if bundle.xgboost_version: + try: + import xgboost + + current_xgboost: str | None = str(xgboost.__version__) + except ImportError: + current_xgboost = None + if current_xgboost is not None and bundle.xgboost_version != current_xgboost: + logger.warning( + "forecasting.xgboost_version_mismatch", + saved_xgboost=bundle.xgboost_version, + current_xgboost=current_xgboost, + ) + logger.info( "forecasting.model_bundle_loaded", path=str(path), diff --git a/app/features/forecasting/routes.py b/app/features/forecasting/routes.py index f9fbf007..9d84d003 100644 --- a/app/features/forecasting/routes.py +++ b/app/features/forecasting/routes.py @@ -71,6 +71,13 @@ async def train_model( detail="LightGBM is disabled. Set forecast_enable_lightgbm=True in settings.", ) + # Check if XGBoost is enabled + if request.config.model_type == "xgboost" and not settings.forecast_enable_xgboost: + raise HTTPException( + status_code=status.HTTP_400_BAD_REQUEST, + detail="XGBoost is disabled. 
Set forecast_enable_xgboost=True in settings.", + ) + logger.info( "forecasting.train_request_received", store_id=request.store_id, diff --git a/app/features/forecasting/schemas.py b/app/features/forecasting/schemas.py index b019529c..9219ab21 100644 --- a/app/features/forecasting/schemas.py +++ b/app/features/forecasting/schemas.py @@ -144,6 +144,49 @@ class LightGBMModelConfig(ModelConfigBase): ) +class XGBoostModelConfig(ModelConfigBase): + """Configuration for the XGBoost regressor (feature-flagged). + + XGBoost is an advanced, feature-aware gradient-boosted-tree model. Like + ``LightGBMModelConfig`` the field set is deliberately conservative — + ``n_estimators`` / ``max_depth`` / ``learning_rate`` only — so the schema + surface stays small and training stays deterministic (no stochastic + subsampling). + + CRITICAL: Only available when forecast_enable_xgboost=True in settings. + + Attributes: + n_estimators: Number of boosting rounds. + max_depth: Maximum depth of trees. + learning_rate: Learning rate for gradient boosting. + feature_config_hash: Hash of FeatureSetConfig used for training. + """ + + model_type: Literal["xgboost"] = "xgboost" + n_estimators: int = Field( + default=100, + ge=10, + le=1000, + description="Number of boosting rounds", + ) + max_depth: int = Field( + default=6, + ge=1, + le=20, + description="Maximum depth of trees", + ) + learning_rate: float = Field( + default=0.1, + ge=0.001, + le=1.0, + description="Learning rate for gradient boosting", + ) + feature_config_hash: str | None = Field( + default=None, + description="Hash of FeatureSetConfig used for training", + ) + + class RegressionModelConfig(ModelConfigBase): """Configuration for the exogenous-regressor forecaster (PRP-27). @@ -189,13 +232,46 @@ class RegressionModelConfig(ModelConfigBase): ) +class ProphetLikeModelConfig(ModelConfigBase): + """Configuration for the Prophet-like additive forecaster (MLZOO-C2). 
+ + A deterministic, regularized ADDITIVE linear model — a ``Ridge`` regressor + over the canonical 14-column feature frame — that decomposes demand into + trend / seasonality / holiday-regressor components. It approximates + Prophet's additive shape WITHOUT the real ``prophet``/Stan dependency: it + does not model changepoint trend, posterior uncertainty, or automatic + seasonality discovery. Pure scikit-learn — no optional dependency, no + feature flag, always available (like ``RegressionModelConfig``). + + Attributes: + alpha: Ridge L2 regularization strength. 0.0 degenerates to ordinary + least squares; the default 1.0 keeps coefficients robust to the + collinear engineered-feature frame. + feature_config_hash: Optional hash of the feature contract used. + """ + + model_type: Literal["prophet_like"] = "prophet_like" + alpha: float = Field( + default=1.0, + ge=0.0, + le=10000.0, + description="Ridge L2 regularization strength", + ) + feature_config_hash: str | None = Field( + default=None, + description="Hash of the feature contract used for training", + ) + + # Union type for all model configs ModelConfig = ( NaiveModelConfig | SeasonalNaiveModelConfig | MovingAverageModelConfig | LightGBMModelConfig + | XGBoostModelConfig | RegressionModelConfig + | ProphetLikeModelConfig ) diff --git a/app/features/forecasting/service.py b/app/features/forecasting/service.py index 839f1475..ca706803 100644 --- a/app/features/forecasting/service.py +++ b/app/features/forecasting/service.py @@ -11,7 +11,6 @@ from __future__ import annotations -import math import time import uuid from dataclasses import dataclass, field @@ -39,6 +38,11 @@ PredictResponse, TrainResponse, ) +from app.shared.feature_frames import ( + HISTORY_TAIL_DAYS, + build_historical_feature_rows, + canonical_feature_columns, +) if TYPE_CHECKING: pass @@ -72,31 +76,12 @@ def __post_init__(self) -> None: # Minimum observed rows required to train a regression model — enough to # resolve the lag features and still 
leave training signal (PRP-27 GOTCHA #14). _MIN_REGRESSION_TRAIN_ROWS = 30 -# Observed-target tail persisted in the bundle so the scenario future-frame -# generator can resolve long lags (PRP-27 DECISIONS LOCKED #11 — 90 days). -_REGRESSION_HISTORY_TAIL_DAYS = 90 -# Target lag offsets — PRP-27 DECISIONS LOCKED #10 (EXOGENOUS_LAGS). -_REGRESSION_LAGS: tuple[int, ...] = (1, 7, 14, 28) -# Canonical regression feature columns — a PAIRED CONTRACT with -# ``app/features/scenarios/feature_frame.canonical_feature_columns()``. The -# scenarios slice owns the future-frame generator; this slice owns training. -# A cross-slice import is forbidden (AGENTS.md § Architecture, PRP-27 -# DECISIONS LOCKED #3), so the column names and order are replicated here and -# kept in lock-step by the scenarios integration test (a column mismatch -# surfaces as a non-zero delta on an empty-assumption simulation). -_REGRESSION_FEATURE_COLUMNS: list[str] = [ - *(f"lag_{lag}" for lag in _REGRESSION_LAGS), - "dow_sin", - "dow_cos", - "month_sin", - "month_cos", - "is_weekend", - "is_month_end", - "price_factor", - "promo_active", - "is_holiday", - "days_since_launch", -] +# The regression feature-frame contract — the lag offsets (``EXOGENOUS_LAGS``), +# the observed-target tail length (``HISTORY_TAIL_DAYS``), and the canonical +# column set and order (``canonical_feature_columns()``) — is the single source +# of truth in ``app/shared/feature_frames`` (MLZOO-A). This slice imports it +# rather than re-typing it, so a column-order mismatch with the scenarios +# slice's future-frame generator is structurally impossible. @dataclass @@ -107,8 +92,8 @@ class RegressionFeatureMatrix: X: Feature matrix, shape ``[n_observations, n_features]`` (NaN allowed). y: Target values, shape ``[n_observations]``. feature_columns: Column order — persisted so the future frame matches. - history_tail: The last ``_REGRESSION_HISTORY_TAIL_DAYS`` observed - targets, ending at the forecast origin ``T``. 
+ history_tail: The last ``HISTORY_TAIL_DAYS`` observed targets, ending + at the forecast origin ``T``. history_tail_dates: ISO dates aligned with ``history_tail``. launch_date_iso: The product launch date (ISO) or ``None``. n_observations: Number of training rows. @@ -123,6 +108,59 @@ class RegressionFeatureMatrix: n_observations: int +def _assemble_regression_rows( + *, + dates: list[date_type], + quantities: list[float], + prices: list[float], + baseline_price: float, + promo_dates: set[date_type], + holiday_dates: set[date_type], + launch_date: date_type | None, +) -> list[list[float]]: + """Assemble the historical regression feature matrix — pure, leakage-safe. + + Time-safe by construction: every lag column at row ``i`` reads only the + observed target at ``i - lag`` (a strictly earlier day); calendar columns + are pure functions of the date; ``price_factor`` / ``promo_active`` / + ``is_holiday`` / ``days_since_launch`` read the same-day exogenous + attributes. No row reads a future observation. + + Column order is ``canonical_feature_columns()`` exactly: the target lags, + then the calendar columns, then ``price_factor``, ``promo_active``, + ``is_holiday``, ``days_since_launch``. + + Delegating shim (MLZOO-B.2): the row-assembly body was promoted verbatim + to :func:`app.shared.feature_frames.build_historical_feature_rows` so the + ``backtesting`` slice can reuse it without a forbidden cross-slice import. + This wrapper keeps the name and signature byte-stable, so the load-bearing + leakage spec (``test_regression_features_leakage.py``) imports it unchanged. + + Args: + dates: Observed days in chronological order. + quantities: Observed target values aligned with ``dates``. + prices: Observed unit prices aligned with ``dates``. + baseline_price: The typical price; ``price_factor`` is the ratio to it. + promo_dates: Days a promotion covered. + holiday_dates: Calendar holiday days. + launch_date: The product's launch date, or ``None``. 
+ + Returns: + Row-major feature matrix ``[n_observations][n_features]``; ``NaN`` marks + a lag whose source day precedes the series, and ``days_since_launch`` + when the product has no launch date. + """ + return build_historical_feature_rows( + dates=dates, + quantities=quantities, + prices=prices, + baseline_price=baseline_price, + promo_dates=promo_dates, + holiday_dates=holiday_dates, + launch_date=launch_date, + ) + + class ForecastingService: """Service for training and predicting with forecasting models. @@ -176,11 +214,13 @@ async def train_model( config_hash=config.config_hash(), ) - # Build the model + bundle metadata. The regression path consumes a - # historical feature matrix; every other model trains on the raw - # target series exactly as before. + # Build the model first (cheap — no fit), then branch on its capability + # rather than on a ``model_type`` string. A feature-aware model + # (``requires_features``) consumes a historical feature matrix; every + # target-only model trains on the raw target series exactly as before. 
+ model = model_factory(config, random_state=self.settings.forecast_random_seed) extra_metadata: dict[str, object] = {} - if config.model_type == "regression": + if model.requires_features: features = await self._build_regression_features( db=db, store_id=store_id, @@ -188,7 +228,6 @@ async def train_model( start_date=train_start_date, end_date=train_end_date, ) - model = model_factory(config, random_state=self.settings.forecast_random_seed) model.fit(features.y, features.X) n_observations = features.n_observations extra_metadata = { @@ -210,7 +249,6 @@ async def train_model( f"No training data found for store={store_id}, product={product_id} " f"between {train_start_date} and {train_end_date}" ) - model = model_factory(config, random_state=self.settings.forecast_random_seed) model.fit(training_data.y) n_observations = training_data.n_observations @@ -342,14 +380,16 @@ async def predict( f"but prediction requested for product={product_id}" ) - # Regression models need an exogenous feature frame to forecast — that - # is built (from scenario assumptions) by POST /scenarios/simulate. The - # plain predict endpoint cannot supply one, so it rejects them cleanly. - if bundle.config.model_type == "regression": + # Feature-aware models need an exogenous feature frame to forecast — + # that is built (from scenario assumptions) by POST /scenarios/simulate. + # The plain predict endpoint cannot supply one, so it rejects them + # cleanly. Branching on ``requires_features`` (not a ``model_type`` + # string) keeps this future-proof as the model zoo grows. + if bundle.model.requires_features: raise ValueError( - "Regression models forecast through POST /scenarios/simulate, " + "Feature-aware models forecast through POST /scenarios/simulate, " "which supplies the exogenous feature frame. POST /forecasting/" - "predict does not support model_type='regression'." + "predict does not support them." 
) # Generate forecasts @@ -467,8 +507,11 @@ async def _build_regression_features( ``promo_active`` / ``is_holiday`` / ``days_since_launch`` read the same-day exogenous attributes. No row reads a future observation. - The column set is the paired contract with the scenarios slice's - future-frame generator (see ``_REGRESSION_FEATURE_COLUMNS``). + The column set and order are ``canonical_feature_columns()`` from + ``app/shared/feature_frames`` — the single source of truth shared with + the scenarios slice's future-frame generator. The pure row assembly is + factored into :func:`_assemble_regression_rows` (unit-tested for + leakage without a database). Args: db: Database session. @@ -550,44 +593,32 @@ async def _build_regression_features( select(Product.launch_date).where(Product.id == product_id) ) - feature_rows: list[list[float]] = [] - for index, day in enumerate(dates): - row_values: list[float] = [] - # Target long-lag columns — read only strictly-earlier observations. - for lag in _REGRESSION_LAGS: - row_values.append(quantities[index - lag] if index >= lag else math.nan) - # Calendar columns — pure functions of the date. - dow = day.weekday() - row_values.append(math.sin(2.0 * math.pi * dow / 7.0)) - row_values.append(math.cos(2.0 * math.pi * dow / 7.0)) - row_values.append(math.sin(2.0 * math.pi * day.month / 12.0)) - row_values.append(math.cos(2.0 * math.pi * day.month / 12.0)) - row_values.append(1.0 if dow >= 5 else 0.0) - row_values.append(1.0 if (day + timedelta(days=1)).month != day.month else 0.0) - # Exogenous columns — same-day attributes. 
- row_values.append(prices[index] / baseline_price) - row_values.append(1.0 if day in promo_dates else 0.0) - row_values.append(1.0 if day in holiday_dates else 0.0) - row_values.append( - float((day - launch_date).days) if launch_date is not None else math.nan - ) - feature_rows.append(row_values) + feature_columns = canonical_feature_columns() + feature_rows = _assemble_regression_rows( + dates=dates, + quantities=quantities, + prices=prices, + baseline_price=baseline_price, + promo_dates=promo_dates, + holiday_dates=holiday_dates, + launch_date=launch_date, + ) - tail = quantities[-_REGRESSION_HISTORY_TAIL_DAYS:] - tail_dates = [day.isoformat() for day in dates[-_REGRESSION_HISTORY_TAIL_DAYS:]] + tail = quantities[-HISTORY_TAIL_DAYS:] + tail_dates = [day.isoformat() for day in dates[-HISTORY_TAIL_DAYS:]] logger.info( "forecasting.regression_features_built", store_id=store_id, product_id=product_id, n_observations=len(dates), - n_features=len(_REGRESSION_FEATURE_COLUMNS), + n_features=len(feature_columns), ) return RegressionFeatureMatrix( X=np.array(feature_rows, dtype=np.float64), y=np.array(quantities, dtype=np.float64), - feature_columns=list(_REGRESSION_FEATURE_COLUMNS), + feature_columns=feature_columns, history_tail=[float(value) for value in tail], history_tail_dates=tail_dates, launch_date_iso=launch_date.isoformat() if launch_date is not None else None, diff --git a/app/features/forecasting/tests/test_lightgbm_forecaster.py b/app/features/forecasting/tests/test_lightgbm_forecaster.py new file mode 100644 index 00000000..e915b394 --- /dev/null +++ b/app/features/forecasting/tests/test_lightgbm_forecaster.py @@ -0,0 +1,140 @@ +"""Unit tests for ``LightGBMForecaster`` (PRP-30 / MLZOO-B). + +The LightGBM forecaster is the first ADVANCED feature-aware model. 
Like +``RegressionForecaster`` it *consumes* the exogenous ``X`` argument, so these +tests mirror that contract: ``X`` is required, its shape is validated, fits are +deterministic, and ``NaN`` features are tolerated (LightGBM handles missing +values natively). + +The whole module SKIPs (never ERRORs) when the optional ``ml-lightgbm`` +dependency is absent — ``pytest.importorskip``. Importing ``LightGBMForecaster`` +itself is leak-free (``lightgbm`` is imported lazily inside ``fit``), so the +class import sits with the other module imports; the ``importorskip`` guard +fires only because every test below actually fits a model. +""" + +from __future__ import annotations + +from typing import Any +from unittest.mock import MagicMock, patch + +import numpy as np +import pytest + +from app.features.forecasting.models import LightGBMForecaster, model_factory +from app.features.forecasting.schemas import LightGBMModelConfig + +pytest.importorskip("lightgbm") + +FloatArray = np.ndarray[Any, np.dtype[np.floating[Any]]] + + +def _synthetic_data( + n: int = 120, n_features: int = 6, seed: int = 0 +) -> tuple[FloatArray, FloatArray]: + """Build a synthetic feature matrix and a target that depends on it.""" + rng = np.random.default_rng(seed) + features = rng.normal(size=(n, n_features)) + target = 50.0 + 5.0 * features[:, 0] - 3.0 * features[:, 1] + rng.normal(scale=0.5, size=n) + return features.astype(np.float64), target.astype(np.float64) + + +def test_fit_predict_roundtrip() -> None: + """A fitted LightGBM model produces a finite forecast of horizon length.""" + features, target = _synthetic_data() + model = LightGBMForecaster() + model.fit(target, features) + assert model.is_fitted + + horizon = 10 + predictions = model.predict(horizon, features[:horizon]) + assert predictions.shape == (horizon,) + assert bool(np.all(np.isfinite(predictions))) + + +def test_fit_rejects_none_features() -> None: + """``fit`` raises when no exogenous features are supplied.""" + _, target = 
_synthetic_data() + with pytest.raises(ValueError, match="requires exogenous features"): + LightGBMForecaster().fit(target, None) + + +def test_fit_rejects_mismatched_rows() -> None: + """``fit`` raises when feature and target row counts differ.""" + features, target = _synthetic_data() + with pytest.raises(ValueError, match="rows must match"): + LightGBMForecaster().fit(target, features[:-5]) + + +def test_predict_rejects_none_features() -> None: + """``predict`` raises when no exogenous features are supplied.""" + features, target = _synthetic_data() + model = LightGBMForecaster().fit(target, features) + with pytest.raises(ValueError, match="requires exogenous features"): + model.predict(5, None) + + +def test_predict_rejects_wrong_shape_features() -> None: + """``predict`` raises when the feature row count is not the horizon.""" + features, target = _synthetic_data() + model = LightGBMForecaster().fit(target, features) + with pytest.raises(ValueError, match="horizon"): + model.predict(5, features[:8]) + + +def test_predict_before_fit_raises() -> None: + """``predict`` raises a RuntimeError before the model is fitted.""" + model = LightGBMForecaster() + with pytest.raises(RuntimeError, match="fitted"): + model.predict(5, np.zeros((5, 3), dtype=np.float64)) + + +def test_determinism_same_random_state() -> None: + """Two fits with the same random_state yield identical forecasts. + + LightGBM is bit-reproducible only with ``n_jobs=1`` + ``deterministic=True`` + + ``force_col_wise=True`` — all pinned in ``fit`` — so an EXACT + ``assert_array_equal`` is the correct gate. 
+ """ + features, target = _synthetic_data() + future = features[:12] + first = LightGBMForecaster(random_state=7).fit(target, features) + second = LightGBMForecaster(random_state=7).fit(target, features) + np.testing.assert_array_equal(first.predict(12, future), second.predict(12, future)) + + +def test_handles_nan_features() -> None: + """``LGBMRegressor`` tolerates NaN feature cells natively.""" + features, target = _synthetic_data() + model = LightGBMForecaster().fit(target, features) + future = features[:6].copy() + future[2, 0] = np.nan # the future frame emits NaN for un-resolvable lags + predictions = model.predict(6, future) + assert bool(np.all(np.isfinite(predictions))) + + +def test_get_and_set_params() -> None: + """``get_params`` reflects construction; ``set_params`` mutates in place.""" + model = LightGBMForecaster(n_estimators=150, learning_rate=0.03, max_depth=4) + params = model.get_params() + assert params["n_estimators"] == 150 + assert params["learning_rate"] == 0.03 + assert params["max_depth"] == 4 + model.set_params(max_depth=9) + assert model.max_depth == 9 + + +def test_requires_features_is_true() -> None: + """LightGBM is a feature-aware model — the ClassVar is True.""" + assert LightGBMForecaster.requires_features is True + + +def test_model_factory_creates_lightgbm_forecaster() -> None: + """``model_factory`` dispatches a LightGBMModelConfig when the flag is on.""" + enabled = MagicMock() + enabled.forecast_enable_lightgbm = True + with patch("app.core.config.get_settings", return_value=enabled): + model = model_factory(LightGBMModelConfig(n_estimators=120), random_state=42) + assert isinstance(model, LightGBMForecaster) + assert model.n_estimators == 120 + assert model.random_state == 42 diff --git a/app/features/forecasting/tests/test_persistence.py b/app/features/forecasting/tests/test_persistence.py index 7614e45f..9dbae07a 100644 --- a/app/features/forecasting/tests/test_persistence.py +++ 
b/app/features/forecasting/tests/test_persistence.py @@ -118,6 +118,52 @@ def test_save_adds_metadata(self, sample_naive_config, sample_time_series, tmp_m assert bundle.sklearn_version is not None assert bundle.bundle_hash is not None + def test_lightgbm_version_recorded( + self, sample_naive_config, sample_time_series, tmp_model_path + ): + """save_model_bundle records the LightGBM version best-effort (PRP-30). + + The version is captured regardless of model type whenever the optional + ``ml-lightgbm`` dependency is importable — here a baseline naive bundle. + """ + pytest.importorskip("lightgbm") + model = NaiveForecaster() + model.fit(sample_time_series) + + bundle = ModelBundle( + model=model, + config=sample_naive_config, + metadata={}, + ) + + save_model_bundle(bundle, tmp_model_path) + + assert isinstance(bundle.lightgbm_version, str) + assert bundle.lightgbm_version + + def test_xgboost_version_recorded( + self, sample_naive_config, sample_time_series, tmp_model_path + ): + """save_model_bundle records the XGBoost version best-effort (PRP-MLZOO-C1). + + The version is captured regardless of model type whenever the optional + ``ml-xgboost`` dependency is importable — here a baseline naive bundle. 
+ """ + pytest.importorskip("xgboost") + model = NaiveForecaster() + model.fit(sample_time_series) + + bundle = ModelBundle( + model=model, + config=sample_naive_config, + metadata={}, + ) + + save_model_bundle(bundle, tmp_model_path) + + assert isinstance(bundle.xgboost_version, str) + assert bundle.xgboost_version + def test_save_creates_directory(self, sample_naive_config, sample_time_series): """Test that save creates parent directories if needed.""" with TemporaryDirectory() as tmpdir: diff --git a/app/features/forecasting/tests/test_prophet_like_forecaster.py b/app/features/forecasting/tests/test_prophet_like_forecaster.py new file mode 100644 index 00000000..16f58805 --- /dev/null +++ b/app/features/forecasting/tests/test_prophet_like_forecaster.py @@ -0,0 +1,218 @@ +"""Unit tests for ``ProphetLikeForecaster`` (PRP-MLZOO-C2). + +The Prophet-like forecaster is a deterministic, regularized ADDITIVE linear +model — a ``Ridge`` over the canonical 14-column feature frame, fronted by a +``SimpleImputer`` so the ``NaN`` lag cells the future frame emits do not raise. + +These tests cover the shared feature-aware contract (``X`` required, shape +validated, deterministic fits) PLUS the model-specific invariants the tree +models do not have: the additive decomposition invariant, NaN tolerance via +the imputer, and the imputer's leakage-safety (medians learned on train ``X`` +only). Pure scikit-learn — no ``importorskip``, this file always runs. +""" + +from __future__ import annotations + +from typing import Any + +import numpy as np +import pytest + +from app.features.forecasting.models import ProphetLikeForecaster, model_factory +from app.features.forecasting.schemas import ProphetLikeModelConfig +from app.shared.feature_frames import canonical_feature_columns + +FloatArray = np.ndarray[Any, np.dtype[np.floating[Any]]] + +# The canonical contract is exactly 14 wide — the decompose() component +# grouping partitions these 14 columns, so the synthetic frame must match. 
+_N_FEATURES = len(canonical_feature_columns()) # 14 + + +def _synthetic_data( + n: int = 120, n_features: int = _N_FEATURES, seed: int = 0 +) -> tuple[FloatArray, FloatArray]: + """Build a synthetic feature matrix and a target that depends on it. + + Defaults to the canonical 14-column width so the decomposition tests line + up with ``canonical_feature_columns()``. + """ + rng = np.random.default_rng(seed) + features = rng.normal(size=(n, n_features)) + target = 50.0 + 5.0 * features[:, 0] - 3.0 * features[:, 1] + rng.normal(scale=0.5, size=n) + return features.astype(np.float64), target.astype(np.float64) + + +# --------------------------------------------------------------------------- +# Shared feature-aware contract tests +# --------------------------------------------------------------------------- + + +def test_fit_predict_roundtrip() -> None: + """A fitted Prophet-like model produces a finite forecast of horizon length.""" + features, target = _synthetic_data() + model = ProphetLikeForecaster() + model.fit(target, features) + assert model.is_fitted + + horizon = 10 + predictions = model.predict(horizon, features[:horizon]) + assert predictions.shape == (horizon,) + assert bool(np.all(np.isfinite(predictions))) + + +def test_fit_rejects_none_features() -> None: + """``fit`` raises when no exogenous features are supplied.""" + _, target = _synthetic_data() + with pytest.raises(ValueError, match="requires exogenous features"): + ProphetLikeForecaster().fit(target, None) + + +def test_fit_rejects_mismatched_rows() -> None: + """``fit`` raises when feature and target row counts differ.""" + features, target = _synthetic_data() + with pytest.raises(ValueError, match="rows must match"): + ProphetLikeForecaster().fit(target, features[:-5]) + + +def test_predict_rejects_none_features() -> None: + """``predict`` raises when no exogenous features are supplied.""" + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + with 
pytest.raises(ValueError, match="requires exogenous features"): + model.predict(5, None) + + +def test_predict_rejects_wrong_shape_features() -> None: + """``predict`` raises when the feature row count is not the horizon.""" + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + with pytest.raises(ValueError, match="horizon"): + model.predict(5, features[:8]) + + +def test_predict_before_fit_raises() -> None: + """``predict`` raises a RuntimeError before the model is fitted.""" + model = ProphetLikeForecaster() + with pytest.raises(RuntimeError, match="fitted"): + model.predict(5, np.zeros((5, _N_FEATURES), dtype=np.float64)) + + +def test_determinism_same_data() -> None: + """Two fits on the same data yield identical forecasts. + + ``Ridge(solver="cholesky")`` is closed-form and ``SimpleImputer(median)`` + is deterministic, so the whole pipeline is bit-reproducible. + """ + features, target = _synthetic_data() + future = features[:12] + first = ProphetLikeForecaster(alpha=1.0).fit(target, features) + second = ProphetLikeForecaster(alpha=1.0).fit(target, features) + np.testing.assert_array_equal(first.predict(12, future), second.predict(12, future)) + + +def test_get_and_set_params() -> None: + """``get_params`` reflects construction; ``set_params`` mutates in place.""" + model = ProphetLikeForecaster(alpha=2.5) + params = model.get_params() + assert params["alpha"] == 2.5 + assert params["random_state"] == 42 + model.set_params(alpha=0.1) + assert model.alpha == 0.1 + + +def test_requires_features_is_true() -> None: + """The Prophet-like model is feature-aware — the ClassVar is True.""" + assert ProphetLikeForecaster.requires_features is True + + +def test_model_factory_creates_prophet_like_forecaster() -> None: + """``model_factory`` dispatches a ProphetLikeModelConfig with NO flag gate.""" + model = model_factory(ProphetLikeModelConfig(alpha=3.0), random_state=42) + assert isinstance(model, ProphetLikeForecaster) + 
assert model.alpha == 3.0 + + +# --------------------------------------------------------------------------- +# Model-specific tests — additive decomposition, NaN tolerance, leakage +# --------------------------------------------------------------------------- + + +def test_handles_nan_features() -> None: + """A future frame with NaN lag cells predicts finite values. + + The ``SimpleImputer`` fills the NaN cells — a bare ``Ridge`` would raise + ``ValueError: Input contains NaN``. + """ + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + future = features[:6].copy() + future[2, 0] = np.nan # the future frame emits NaN for un-resolvable lags + predictions = model.predict(6, future) + assert bool(np.all(np.isfinite(predictions))) + + +def test_additive_invariant() -> None: + """``decompose()``'s four parts sum (rtol 1e-9) to ``predict()``. + + This is what makes the model "Prophet-like": the forecast genuinely IS the + sum of its trend / seasonality / holiday-regressor components. + """ + features, target = _synthetic_data() + model = ProphetLikeForecaster(alpha=1.0).fit(target, features) + horizon = 12 + future = features[:horizon] + d = model.decompose(future) + reconstructed = d.intercept + d.trend + d.seasonality + d.holiday_regressor + np.testing.assert_allclose(reconstructed, model.predict(horizon, future), rtol=1e-9) + + +def test_decompose_components_have_horizon_length() -> None: + """Each decomposition component array has shape (len(X),).""" + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + horizon = 9 + d = model.decompose(features[:horizon]) + assert d.trend.shape == (horizon,) + assert d.seasonality.shape == (horizon,) + assert d.holiday_regressor.shape == (horizon,) + assert isinstance(d.intercept, float) + + +def test_decompose_uses_trained_imputer_statistics() -> None: + """``decompose()`` imputes future NaN with the TRAINING-window median. 
+ + The imputed X must equal the trained imputer's ``transform`` of the future + frame — never a fresh imputer fitted on the future window (which would + leak future statistics). + """ + features, target = _synthetic_data() + model = ProphetLikeForecaster().fit(target, features) + future = features[:6].copy() + future[2, 0] = np.nan + + imputer = model._estimator.named_steps["impute"] + expected_imputed = imputer.transform(future) + ridge = model._estimator.named_steps["ridge"] + coef = np.asarray(ridge.coef_, dtype=np.float64) + columns = canonical_feature_columns() + + # The trend component includes lag_1 (column 0) — the NaN cell. Recompute + # its contribution from the trained-imputer transform and assert decompose + # produced exactly that (i.e. it used the trained medians, not new ones). + trend_idx = [ + columns.index(c) for c in ("lag_1", "lag_7", "lag_14", "lag_28", "days_since_launch") + ] + expected_trend = expected_imputed[:, trend_idx] @ coef[trend_idx] + + d = model.decompose(future) + np.testing.assert_allclose(d.trend, expected_trend, rtol=1e-12) + # And the imputed lag_1 cell is the training median, not NaN. + assert np.isfinite(expected_imputed[2, 0]) + + +def test_decompose_before_fit_raises() -> None: + """``decompose()`` raises a RuntimeError before the model is fitted.""" + model = ProphetLikeForecaster() + with pytest.raises(RuntimeError, match="fitted"): + model.decompose(np.zeros((5, _N_FEATURES), dtype=np.float64)) diff --git a/app/features/forecasting/tests/test_regression_features_leakage.py b/app/features/forecasting/tests/test_regression_features_leakage.py new file mode 100644 index 00000000..8bffca19 --- /dev/null +++ b/app/features/forecasting/tests/test_regression_features_leakage.py @@ -0,0 +1,94 @@ +"""Leakage spec for the historical regression feature builder — LOAD-BEARING. 
+ +This file IS the spec, mirroring ``app/features/featuresets/tests/test_leakage.py`` +and ``app/shared/feature_frames/tests/test_leakage.py``: it must NEVER be +weakened to make a feature pass (AGENTS.md § Safety). + +It pins the time-safety of :func:`_assemble_regression_rows` — the pure row +assembler behind ``ForecastingService._build_regression_features``. Sequential +target values (1, 2, 3, …) are used so any leakage is mathematically +detectable: with that input the lag-``k`` cell at row ``i`` MUST equal +``quantity[i-k]`` and MUST be strictly less than ``quantity[i]``. A lag cell +equal to or greater than the current row's target proves the builder read +current-or-future data. + +The cutoff itself (``date <= end_date``) is enforced upstream by the SQL window +in ``_build_regression_features``; the row assembler only ever sees the +already-bounded ``dates`` list, and emits exactly one row per supplied date. +""" + +from __future__ import annotations + +import math +from datetime import date, timedelta + +from app.features.forecasting.service import _assemble_regression_rows +from app.shared.feature_frames import EXOGENOUS_LAGS, canonical_feature_columns + +_N = 60 +_DATES = [date(2026, 1, 1) + timedelta(days=offset) for offset in range(_N)] +# Sequential targets 1.0 … 60.0 — quantity[i] == i + 1, so leakage is detectable. +_QUANTITIES = [float(offset + 1) for offset in range(_N)] +_PRICES = [10.0] * _N +_BASELINE_PRICE = 10.0 + + +def _build_rows() -> list[list[float]]: + """Assemble the regression feature matrix from the sequential fixture.""" + return _assemble_regression_rows( + dates=_DATES, + quantities=_QUANTITIES, + prices=_PRICES, + baseline_price=_BASELINE_PRICE, + promo_dates=set(), + holiday_dates=set(), + launch_date=None, + ) + + +def test_lag_columns_read_only_strictly_earlier_observations() -> None: + """CRITICAL: every lag cell reads a strictly-earlier observation, or NaN. 
+ + With sequential targets the lag-``k`` cell at row ``i`` (for ``i >= k``) + must equal ``quantity[i-k]`` exactly and be strictly below the current + row's target ``quantity[i]``. Any cell ``>= quantity[i]`` is future + leakage. Rows before the lag offset have no source day → ``NaN``. + """ + columns = canonical_feature_columns() + rows = _build_rows() + + for lag in EXOGENOUS_LAGS: + col_index = columns.index(f"lag_{lag}") + for i in range(_N): + cell = rows[i][col_index] + if i < lag: + assert math.isnan(cell), ( + f"row {i}: lag_{lag} has no source day yet — expected NaN, got {cell}" + ) + continue + expected = _QUANTITIES[i - lag] + assert cell == expected, ( + f"LEAKAGE DETECTED at row {i}: lag_{lag}={cell} != expected={expected}. " + "Lag feature is not correctly shifted." + ) + assert cell < _QUANTITIES[i], ( + f"LEAKAGE DETECTED at row {i}: lag_{lag}={cell} >= current=" + f"{_QUANTITIES[i]}. Lag feature is using current or future data!" + ) + + +def test_assembled_matrix_shape_matches_canonical_columns() -> None: + """One row per supplied date; every row matches the canonical column width. + + The assembler emits exactly ``len(dates)`` rows and never invents a row + beyond the (cutoff-bounded) ``dates`` it was given. 
+ """ + columns = canonical_feature_columns() + rows = _build_rows() + assert len(rows) == _N + assert all(len(row) == len(columns) for row in rows) + + +def test_assemble_regression_rows_is_deterministic() -> None: + """Identical inputs produce an identical feature matrix — no hidden state.""" + assert _build_rows() == _build_rows() diff --git a/app/features/forecasting/tests/test_routes.py b/app/features/forecasting/tests/test_routes.py index 328d6f9a..42939637 100644 --- a/app/features/forecasting/tests/test_routes.py +++ b/app/features/forecasting/tests/test_routes.py @@ -60,6 +60,44 @@ async def test_train_accepts_iso_string_dates(client: AsyncClient) -> None: _assert_no_date_type_422(response) +@pytest.mark.integration +async def test_train_lightgbm_rejected_when_disabled(client: AsyncClient) -> None: + """LightGBM training is refused with 400 while the feature flag is off. + + ``forecast_enable_lightgbm`` defaults to ``False``; the route gate returns a + 400 before any DB or model work (PRP-30 / MLZOO-B). + """ + payload = { + "store_id": 1, + "product_id": 2, + "train_start_date": "2024-01-01", + "train_end_date": "2024-01-31", + "config": {"model_type": "lightgbm"}, + } + response = await client.post("/forecasting/train", json=payload) + assert response.status_code == 400 + assert "lightgbm" in response.text.lower() + + +@pytest.mark.integration +async def test_train_xgboost_rejected_when_disabled(client: AsyncClient) -> None: + """XGBoost training is refused with 400 while the feature flag is off. + + ``forecast_enable_xgboost`` defaults to ``False``; the route gate returns a + 400 before any DB or model work (PRP-MLZOO-C1). 
+ """ + payload = { + "store_id": 1, + "product_id": 2, + "train_start_date": "2024-01-01", + "train_end_date": "2024-01-31", + "config": {"model_type": "xgboost"}, + } + response = await client.post("/forecasting/train", json=payload) + assert response.status_code == 400 + assert "xgboost" in response.text.lower() + + @pytest.mark.integration async def test_predict_accepts_request(client: AsyncClient) -> None: # PredictRequest has no date fields; this is a smoke test for completeness diff --git a/app/features/forecasting/tests/test_service.py b/app/features/forecasting/tests/test_service.py index 6b476686..1f387bb9 100644 --- a/app/features/forecasting/tests/test_service.py +++ b/app/features/forecasting/tests/test_service.py @@ -344,3 +344,99 @@ async def test_train_returns_model_path(self): assert Path(response.model_path).exists() assert response.n_observations == 30 assert response.model_type == "naive" + + +class TestFeatureAwareContract: + """Tests for the feature-aware model contract (MLZOO-A / PRP-29).""" + + def test_requires_features_flag(self): + """Baseline forecasters require no features; feature-aware ones do.""" + from app.features.forecasting.models import LightGBMForecaster, XGBoostForecaster + from app.features.forecasting.schemas import ( + ProphetLikeModelConfig, + RegressionModelConfig, + ) + + assert model_factory(NaiveModelConfig()).requires_features is False + assert model_factory(SeasonalNaiveModelConfig()).requires_features is False + assert model_factory(MovingAverageModelConfig()).requires_features is False + assert model_factory(RegressionModelConfig()).requires_features is True + # The Prophet-like model is feature-aware too — pure scikit-learn, so + # the factory needs no flag and no optional dependency. + assert model_factory(ProphetLikeModelConfig()).requires_features is True + # LightGBM is feature-aware too — assert the ClassVar directly so this + # needs neither the factory flag nor the optional lightgbm dependency. 
+ assert LightGBMForecaster.requires_features is True + # XGBoost is feature-aware too — same import-free ClassVar assertion. + assert XGBoostForecaster.requires_features is True + + def test_lightgbm_factory_respects_flag(self): + """model_factory gates LightGBM behind forecast_enable_lightgbm. + + Construction is flag-gated but import-free (``lightgbm`` is imported + lazily inside ``fit``), so neither branch needs the optional extra. + """ + from app.features.forecasting.models import LightGBMForecaster + from app.features.forecasting.schemas import LightGBMModelConfig + + disabled = MagicMock() + disabled.forecast_enable_lightgbm = False + with ( + patch("app.core.config.get_settings", return_value=disabled), + pytest.raises(ValueError, match="not enabled"), + ): + model_factory(LightGBMModelConfig()) + + enabled = MagicMock() + enabled.forecast_enable_lightgbm = True + with patch("app.core.config.get_settings", return_value=enabled): + model = model_factory(LightGBMModelConfig()) + assert isinstance(model, LightGBMForecaster) + + def test_xgboost_factory_respects_flag(self): + """model_factory gates XGBoost behind forecast_enable_xgboost. + + Construction is flag-gated but import-free (``xgboost`` is imported + lazily inside ``fit``), so neither branch needs the optional extra. 
+ """ + from app.features.forecasting.models import XGBoostForecaster + from app.features.forecasting.schemas import XGBoostModelConfig + + disabled = MagicMock() + disabled.forecast_enable_xgboost = False + with ( + patch("app.core.config.get_settings", return_value=disabled), + pytest.raises(ValueError, match="not enabled"), + ): + model_factory(XGBoostModelConfig()) + + enabled = MagicMock() + enabled.forecast_enable_xgboost = True + with patch("app.core.config.get_settings", return_value=enabled): + model = model_factory(XGBoostModelConfig()) + assert isinstance(model, XGBoostForecaster) + + def test_canonical_columns_match_regression_contract(self): + """The canonical column set is the exact 14-name regression contract. + + Pins the contract after the local duplicated column-list constant + was deleted in favour of the shared single source of truth. + """ + from app.shared.feature_frames import canonical_feature_columns + + assert canonical_feature_columns() == [ + "lag_1", + "lag_7", + "lag_14", + "lag_28", + "dow_sin", + "dow_cos", + "month_sin", + "month_cos", + "is_weekend", + "is_month_end", + "price_factor", + "promo_active", + "is_holiday", + "days_since_launch", + ] diff --git a/app/features/forecasting/tests/test_xgboost_forecaster.py b/app/features/forecasting/tests/test_xgboost_forecaster.py new file mode 100644 index 00000000..2f4e5fb1 --- /dev/null +++ b/app/features/forecasting/tests/test_xgboost_forecaster.py @@ -0,0 +1,143 @@ +"""Unit tests for ``XGBoostForecaster`` (PRP-MLZOO-C1). + +The XGBoost forecaster is the second ADVANCED feature-aware tree model and a +structural twin of ``LightGBMForecaster``. Like ``RegressionForecaster`` it +*consumes* the exogenous ``X`` argument, so these tests mirror that contract: +``X`` is required, its shape is validated, fits are deterministic, and ``NaN`` +features are tolerated (XGBoost handles missing values natively via +``missing=np.nan``). 
+ +The whole module SKIPs (never ERRORs) when the optional ``ml-xgboost`` +dependency is absent — ``pytest.importorskip``. Importing ``XGBoostForecaster`` +itself is leak-free (``xgboost`` is imported lazily inside ``fit``), so the +class import sits with the other module imports; the ``importorskip`` guard +fires only because every test below actually fits a model. +""" + +from __future__ import annotations + +from typing import Any +from unittest.mock import MagicMock, patch + +import numpy as np +import pytest + +from app.features.forecasting.models import XGBoostForecaster, model_factory +from app.features.forecasting.schemas import XGBoostModelConfig + +pytest.importorskip("xgboost") + +FloatArray = np.ndarray[Any, np.dtype[np.floating[Any]]] + + +def _synthetic_data( + n: int = 120, n_features: int = 6, seed: int = 0 +) -> tuple[FloatArray, FloatArray]: + """Build a synthetic feature matrix and a target that depends on it.""" + rng = np.random.default_rng(seed) + features = rng.normal(size=(n, n_features)) + target = 50.0 + 5.0 * features[:, 0] - 3.0 * features[:, 1] + rng.normal(scale=0.5, size=n) + return features.astype(np.float64), target.astype(np.float64) + + +def test_fit_predict_roundtrip() -> None: + """A fitted XGBoost model produces a finite forecast of horizon length.""" + features, target = _synthetic_data() + model = XGBoostForecaster() + model.fit(target, features) + assert model.is_fitted + + horizon = 10 + predictions = model.predict(horizon, features[:horizon]) + assert predictions.shape == (horizon,) + assert bool(np.all(np.isfinite(predictions))) + + +def test_fit_rejects_none_features() -> None: + """``fit`` raises when no exogenous features are supplied.""" + _, target = _synthetic_data() + with pytest.raises(ValueError, match="requires exogenous features"): + XGBoostForecaster().fit(target, None) + + +def test_fit_rejects_mismatched_rows() -> None: + """``fit`` raises when feature and target row counts differ.""" + features, target = 
_synthetic_data() + with pytest.raises(ValueError, match="rows must match"): + XGBoostForecaster().fit(target, features[:-5]) + + +def test_predict_rejects_none_features() -> None: + """``predict`` raises when no exogenous features are supplied.""" + features, target = _synthetic_data() + model = XGBoostForecaster().fit(target, features) + with pytest.raises(ValueError, match="requires exogenous features"): + model.predict(5, None) + + +def test_predict_rejects_wrong_shape_features() -> None: + """``predict`` raises when the feature row count is not the horizon.""" + features, target = _synthetic_data() + model = XGBoostForecaster().fit(target, features) + with pytest.raises(ValueError, match="horizon"): + model.predict(5, features[:8]) + + +def test_predict_before_fit_raises() -> None: + """``predict`` raises a RuntimeError before the model is fitted.""" + model = XGBoostForecaster() + with pytest.raises(RuntimeError, match="fitted"): + model.predict(5, np.zeros((5, 3), dtype=np.float64)) + + +def test_determinism_same_random_state() -> None: + """Two fits with the same random_state yield identical forecasts. + + XGBoost has no ``deterministic`` switch (unlike LightGBM). Bit- + reproducibility comes from ``n_jobs=1`` + ``tree_method="hist"`` + a fixed + ``random_state`` + the conservative config leaving ``subsample`` / + ``colsample_bytree`` at their ``1.0`` defaults — all pinned in ``fit`` — so + an EXACT ``assert_array_equal`` within one environment is the correct gate. 
+ """ + features, target = _synthetic_data() + future = features[:12] + first = XGBoostForecaster(random_state=7).fit(target, features) + second = XGBoostForecaster(random_state=7).fit(target, features) + np.testing.assert_array_equal(first.predict(12, future), second.predict(12, future)) + + +def test_handles_nan_features() -> None: + """``XGBRegressor`` tolerates NaN feature cells natively.""" + features, target = _synthetic_data() + model = XGBoostForecaster().fit(target, features) + future = features[:6].copy() + future[2, 0] = np.nan # the future frame emits NaN for un-resolvable lags + predictions = model.predict(6, future) + assert bool(np.all(np.isfinite(predictions))) + + +def test_get_and_set_params() -> None: + """``get_params`` reflects construction; ``set_params`` mutates in place.""" + model = XGBoostForecaster(n_estimators=150, learning_rate=0.03, max_depth=4) + params = model.get_params() + assert params["n_estimators"] == 150 + assert params["learning_rate"] == 0.03 + assert params["max_depth"] == 4 + model.set_params(max_depth=9) + assert model.max_depth == 9 + + +def test_requires_features_is_true() -> None: + """XGBoost is a feature-aware model — the ClassVar is True.""" + assert XGBoostForecaster.requires_features is True + + +def test_model_factory_creates_xgboost_forecaster() -> None: + """``model_factory`` dispatches an XGBoostModelConfig when the flag is on.""" + enabled = MagicMock() + enabled.forecast_enable_xgboost = True + with patch("app.core.config.get_settings", return_value=enabled): + model = model_factory(XGBoostModelConfig(n_estimators=120), random_state=42) + assert isinstance(model, XGBoostForecaster) + assert model.n_estimators == 120 + assert model.random_state == 42 diff --git a/app/features/jobs/schemas.py b/app/features/jobs/schemas.py index 0f411dfa..b7e9663d 100644 --- a/app/features/jobs/schemas.py +++ b/app/features/jobs/schemas.py @@ -25,7 +25,7 @@ class JobCreate(BaseModel): **Job Types and Required Params**: - 
**train**: Train a forecasting model - - `model_type`: Required - 'naive', 'seasonal_naive', 'linear_regression', etc. + - `model_type`: Required - 'naive', 'seasonal_naive', 'moving_average', 'regression', 'lightgbm', 'xgboost', 'prophet_like'. - `store_id`: Required - Store ID from /dimensions/stores - `product_id`: Required - Product ID from /dimensions/products - `start_date`: Required - Training data start (YYYY-MM-DD) diff --git a/app/features/jobs/service.py b/app/features/jobs/service.py index 528bb20e..f1af696b 100644 --- a/app/features/jobs/service.py +++ b/app/features/jobs/service.py @@ -424,9 +424,13 @@ async def _execute_train( from datetime import date as date_type from app.features.forecasting.schemas import ( + LightGBMModelConfig, MovingAverageModelConfig, NaiveModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, SeasonalNaiveModelConfig, + XGBoostModelConfig, ) from app.features.forecasting.service import ForecastingService @@ -457,6 +461,31 @@ async def _execute_train( elif model_type == "moving_average": window_size = params.get("window_size", 7) config = MovingAverageModelConfig(window_size=window_size) + elif model_type == "regression": + config = RegressionModelConfig( + max_iter=params.get("max_iter", 200), + learning_rate=params.get("learning_rate", 0.05), + max_depth=params.get("max_depth", 6), + ) + elif model_type == "lightgbm": + # The forecast_enable_lightgbm gate lives in model_factory — a + # lightgbm job with the flag off fails loud at train time. + config = LightGBMModelConfig( + n_estimators=params.get("n_estimators", 100), + learning_rate=params.get("learning_rate", 0.1), + max_depth=params.get("max_depth", 6), + ) + elif model_type == "xgboost": + # The forecast_enable_xgboost gate lives in model_factory — an + # xgboost job with the flag off fails loud at train time.
+ config = XGBoostModelConfig( + n_estimators=params.get("n_estimators", 100), + learning_rate=params.get("learning_rate", 0.1), + max_depth=params.get("max_depth", 6), + ) + elif model_type == "prophet_like": + # Pure scikit-learn additive model — no flag, always available. + config = ProphetLikeModelConfig(alpha=params.get("alpha", 1.0)) else: msg = f"Unsupported model_type: {model_type}" raise ValueError(msg) @@ -593,9 +622,13 @@ async def _execute_backtest( from app.features.backtesting.schemas import BacktestConfig, SplitConfig from app.features.backtesting.service import BacktestingService from app.features.forecasting.schemas import ( + LightGBMModelConfig, MovingAverageModelConfig, NaiveModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, SeasonalNaiveModelConfig, + XGBoostModelConfig, ) service = BacktestingService() @@ -628,6 +661,20 @@ async def _execute_backtest( elif model_type == "moving_average": window_size = params.get("window_size", 7) model_config = MovingAverageModelConfig(window_size=window_size) + elif model_type == "regression": + # Feature-aware — the backtest builds per-fold leakage-safe X. + model_config = RegressionModelConfig() + elif model_type == "lightgbm": + # Feature-aware — still gated by forecast_enable_lightgbm inside + # model_factory; a disabled flag surfaces as a loud failed job. + model_config = LightGBMModelConfig() + elif model_type == "xgboost": + # Feature-aware — still gated by forecast_enable_xgboost inside + # model_factory; a disabled flag surfaces as a loud failed job. + model_config = XGBoostModelConfig() + elif model_type == "prophet_like": + # Feature-aware — the backtest builds per-fold leakage-safe X. 
+ model_config = ProphetLikeModelConfig() else: msg = f"Unsupported model_type: {model_type}" raise ValueError(msg) diff --git a/app/features/jobs/tests/test_service.py b/app/features/jobs/tests/test_service.py index cd05540d..f377033a 100644 --- a/app/features/jobs/tests/test_service.py +++ b/app/features/jobs/tests/test_service.py @@ -8,6 +8,11 @@ import math from datetime import date +from typing import Any, cast +from unittest.mock import AsyncMock, patch + +import pytest +from sqlalchemy.ext.asyncio import AsyncSession from app.features.backtesting.schemas import ( BacktestResponse, @@ -16,7 +21,16 @@ SplitBoundary, SplitConfig, ) -from app.features.jobs.service import _finite, _shape_backtest_result +from app.features.backtesting.service import BacktestingService +from app.features.forecasting.schemas import ( + LightGBMModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, + TrainResponse, + XGBoostModelConfig, +) +from app.features.forecasting.service import ForecastingService +from app.features.jobs.service import JobService, _finite, _shape_backtest_result def _fold(idx: int, mae: float, smape: float, wape: float, bias: float) -> FoldResult: @@ -158,3 +172,225 @@ def test_finite_coerces_non_finite_values() -> None: assert _finite(math.nan) == 0.0 assert _finite(math.inf) == 0.0 assert _finite(-math.inf) == 0.0 + + +# ============================================================================= +# _execute_train regression-model support (#229) +# ============================================================================= + + +def _fake_train_response(model_type: str) -> TrainResponse: + """Build a TrainResponse stub for mocking ForecastingService.train_model.""" + return TrainResponse( + store_id=1, + product_id=1, + model_type=model_type, + model_path="/data/artifacts/model_abc123def456.joblib", + config_hash="cfg-hash", + n_observations=400, + train_start_date=date(2024, 1, 1), + train_end_date=date(2024, 12, 31), + duration_ms=12.0, + ) + + 
+_REGRESSION_PARAMS: dict[str, Any] = { + "model_type": "regression", + "store_id": 1, + "product_id": 1, + "start_date": "2024-01-01", + "end_date": "2024-12-31", +} + + +async def test_execute_train_builds_regression_config() -> None: + """A train job with model_type='regression' builds a RegressionModelConfig (#229).""" + fake = _fake_train_response("regression") + with patch.object( + ForecastingService, "train_model", new=AsyncMock(return_value=fake) + ) as mock_train: + result = await JobService()._execute_train( + db=cast(AsyncSession, AsyncMock()), + params=_REGRESSION_PARAMS, + ) + assert mock_train.call_args is not None + config = mock_train.call_args.kwargs["config"] + assert isinstance(config, RegressionModelConfig) + assert result["model_type"] == "regression" + # run_id is parsed from the model_abc123def456.joblib artifact path. + assert result["run_id"] == "abc123def456" + + +async def test_execute_train_builds_lightgbm_config() -> None: + """A train job with model_type='lightgbm' builds a LightGBMModelConfig (#242). + + ``train_model`` is mocked, so ``model_factory`` (and its feature-flag gate) + is never reached and ``LightGBMModelConfig`` is a pure Pydantic schema — + this test needs neither the flag nor the optional lightgbm dependency. + """ + fake = _fake_train_response("lightgbm") + with patch.object( + ForecastingService, "train_model", new=AsyncMock(return_value=fake) + ) as mock_train: + result = await JobService()._execute_train( + db=cast(AsyncSession, AsyncMock()), + params={**_REGRESSION_PARAMS, "model_type": "lightgbm"}, + ) + assert mock_train.call_args is not None + config = mock_train.call_args.kwargs["config"] + assert isinstance(config, LightGBMModelConfig) + assert result["model_type"] == "lightgbm" + + +async def test_execute_train_builds_xgboost_config() -> None: + """A train job with model_type='xgboost' builds an XGBoostModelConfig (#247). 
+ + ``train_model`` is mocked, so ``model_factory`` (and its feature-flag gate) + is never reached and ``XGBoostModelConfig`` is a pure Pydantic schema — + this test needs neither the flag nor the optional xgboost dependency. + """ + fake = _fake_train_response("xgboost") + with patch.object( + ForecastingService, "train_model", new=AsyncMock(return_value=fake) + ) as mock_train: + result = await JobService()._execute_train( + db=cast(AsyncSession, AsyncMock()), + params={**_REGRESSION_PARAMS, "model_type": "xgboost"}, + ) + assert mock_train.call_args is not None + config = mock_train.call_args.kwargs["config"] + assert isinstance(config, XGBoostModelConfig) + assert result["model_type"] == "xgboost" + + +async def test_execute_train_builds_prophet_like_config() -> None: + """A train job with model_type='prophet_like' builds a ProphetLikeModelConfig (#248). + + ``train_model`` is mocked, so the test is pure (no DB). The Prophet-like + model is pure scikit-learn — no feature flag, no optional dependency. + """ + fake = _fake_train_response("prophet_like") + with patch.object( + ForecastingService, "train_model", new=AsyncMock(return_value=fake) + ) as mock_train: + result = await JobService()._execute_train( + db=cast(AsyncSession, AsyncMock()), + params={**_REGRESSION_PARAMS, "model_type": "prophet_like"}, + ) + assert mock_train.call_args is not None + config = mock_train.call_args.kwargs["config"] + assert isinstance(config, ProphetLikeModelConfig) + assert result["model_type"] == "prophet_like" + + +async def test_execute_train_rejects_unsupported_model_type() -> None: + """_execute_train still rejects a genuinely unsupported model_type.""" + with pytest.raises(ValueError, match="Unsupported model_type"): + await JobService()._execute_train( + db=cast(AsyncSession, AsyncMock()), + params={**_REGRESSION_PARAMS, "model_type": "arima"}, + ) + + +# Parameters for a backtest job — _execute_backtest reads these keys. 
+_BACKTEST_PARAMS: dict[str, Any] = { + "model_type": "regression", + "store_id": 1, + "product_id": 1, + "start_date": "2024-01-01", + "end_date": "2024-12-01", + "n_splits": 3, +} + + +async def test_execute_backtest_builds_regression_config() -> None: + """A backtest job with model_type='regression' builds a RegressionModelConfig. + + ``run_backtest`` is mocked, so the test is pure (no DB): it pins that + ``_execute_backtest`` widened its allow-list and shaped the result. + """ + response = _make_response() + with patch.object( + BacktestingService, "run_backtest", new=AsyncMock(return_value=response) + ) as mock_run: + result = await JobService()._execute_backtest( + db=cast(AsyncSession, AsyncMock()), + params=_BACKTEST_PARAMS, + ) + assert mock_run.call_args is not None + config = mock_run.call_args.kwargs["config"] + assert isinstance(config.model_config_main, RegressionModelConfig) + assert result["model_type"] == "regression" + # The frontend job-result contract is still shaped (byte-stable keys). + assert "fold_metrics" in result + assert "aggregated_metrics" in result + + +async def test_execute_backtest_builds_lightgbm_config() -> None: + """A backtest job with model_type='lightgbm' builds a LightGBMModelConfig. + + ``run_backtest`` is mocked, so ``model_factory``'s feature-flag gate is + never reached and the optional lightgbm dependency is not required. 
+ """ + response = _make_response() + with patch.object( + BacktestingService, "run_backtest", new=AsyncMock(return_value=response) + ) as mock_run: + result = await JobService()._execute_backtest( + db=cast(AsyncSession, AsyncMock()), + params={**_BACKTEST_PARAMS, "model_type": "lightgbm"}, + ) + assert mock_run.call_args is not None + config = mock_run.call_args.kwargs["config"] + assert isinstance(config.model_config_main, LightGBMModelConfig) + assert result["model_type"] == "lightgbm" + + +async def test_execute_backtest_builds_xgboost_config() -> None: + """A backtest job with model_type='xgboost' builds an XGBoostModelConfig. + + ``run_backtest`` is mocked, so ``model_factory``'s feature-flag gate is + never reached and the optional xgboost dependency is not required. + """ + response = _make_response() + with patch.object( + BacktestingService, "run_backtest", new=AsyncMock(return_value=response) + ) as mock_run: + result = await JobService()._execute_backtest( + db=cast(AsyncSession, AsyncMock()), + params={**_BACKTEST_PARAMS, "model_type": "xgboost"}, + ) + assert mock_run.call_args is not None + config = mock_run.call_args.kwargs["config"] + assert isinstance(config.model_config_main, XGBoostModelConfig) + assert result["model_type"] == "xgboost" + + +async def test_execute_backtest_builds_prophet_like_config() -> None: + """A backtest job with model_type='prophet_like' builds a ProphetLikeModelConfig. + + ``run_backtest`` is mocked, so the test is pure (no DB): it pins that + ``_execute_backtest`` widened its allow-list to the pure-sklearn additive + model and shaped the result. 
+ """ + response = _make_response() + with patch.object( + BacktestingService, "run_backtest", new=AsyncMock(return_value=response) + ) as mock_run: + result = await JobService()._execute_backtest( + db=cast(AsyncSession, AsyncMock()), + params={**_BACKTEST_PARAMS, "model_type": "prophet_like"}, + ) + assert mock_run.call_args is not None + config = mock_run.call_args.kwargs["config"] + assert isinstance(config.model_config_main, ProphetLikeModelConfig) + assert result["model_type"] == "prophet_like" + + +async def test_execute_backtest_rejects_unsupported_model_type() -> None: + """_execute_backtest still rejects a genuinely unsupported model_type.""" + with pytest.raises(ValueError, match="Unsupported model_type"): + await JobService()._execute_backtest( + db=cast(AsyncSession, AsyncMock()), + params={**_BACKTEST_PARAMS, "model_type": "arima"}, + ) diff --git a/app/features/registry/service.py b/app/features/registry/service.py index 076910c1..1170d3af 100644 --- a/app/features/registry/service.py +++ b/app/features/registry/service.py @@ -120,6 +120,22 @@ def _capture_runtime_info(self) -> dict[str, Any]: except ImportError: pass + # LightGBM is an optional dependency — only recorded when installed. + try: + import lightgbm + + runtime_info["lightgbm_version"] = lightgbm.__version__ + except ImportError: + pass + + # XGBoost is an optional dependency — only recorded when installed. 
+ try: + import xgboost + + runtime_info["xgboost_version"] = xgboost.__version__ + except ImportError: + pass + return runtime_info def _compute_config_hash(self, config: dict[str, Any]) -> str: diff --git a/app/features/registry/tests/test_service.py b/app/features/registry/tests/test_service.py index 5a5fde28..684ec0b4 100644 --- a/app/features/registry/tests/test_service.py +++ b/app/features/registry/tests/test_service.py @@ -91,6 +91,22 @@ def test_capture_runtime_info_has_package_versions(self) -> None: assert "numpy_version" in info assert "pandas_version" in info + def test_capture_runtime_info_has_lightgbm_version(self) -> None: + """Captures the LightGBM version when the optional dep is installed (PRP-30).""" + pytest.importorskip("lightgbm") + service = RegistryService() + info = service._capture_runtime_info() + + assert "lightgbm_version" in info + + def test_capture_runtime_info_has_xgboost_version(self) -> None: + """Captures the XGBoost version when the optional dep is installed (PRP-MLZOO-C1).""" + pytest.importorskip("xgboost") + service = RegistryService() + info = service._capture_runtime_info() + + assert "xgboost_version" in info + class TestRegistryServiceConfigHashDuplicate: """Tests for config hash and duplicate detection.""" diff --git a/app/features/scenarios/feature_frame.py b/app/features/scenarios/feature_frame.py index 0c9c2635..f8307288 100644 --- a/app/features/scenarios/feature_frame.py +++ b/app/features/scenarios/feature_frame.py @@ -17,33 +17,35 @@ It may NEVER read an observed target at a horizon day. ``app/features/scenarios/tests/test_future_frame_leakage.py`` is the -load-bearing spec for that rule — it must never be weakened (AGENTS.md -§ Safety), mirroring ``app/features/featuresets/tests/test_leakage.py``. +load-bearing spec for the assumption-driven columns and the assembled frame; the +shared pure builders are spec'd by +``app/shared/feature_frames/tests/test_leakage.py``. 
Neither may be weakened +(AGENTS.md § Safety), mirroring ``app/features/featuresets/tests/test_leakage.py``. DECISIONS LOCKED (PRP-27): * #3 — no cross-slice ``service.py`` import. This module imports only the - ``data_platform`` ORM (a sanctioned read-only ORM import) and same-slice - schema value-objects; it replicates the small slice of leakage-safe - lag/calendar logic it needs rather than importing - ``FeatureEngineeringService``. + ``data_platform`` ORM (a sanctioned read-only ORM import), the shared + feature-frame contract (``app/shared/feature_frames`` — a leaf-level + package, the allowed ``app/features -> app/shared`` direction), and + same-slice schema value-objects. * #4 — long-lag + calendar + assumption-driven columns ONLY; no recursion. A target lag value for horizon day ``T+j`` is the observed ``y[T+j-k]``; when ``T+j-k > T`` (a future target) the cell is ``NaN`` — the model (``HistGradientBoostingRegressor``) handles ``NaN`` natively. No recursion ever fills those gaps in v1. -* #10/#11/#12 — the PINNED constants ``EXOGENOUS_LAGS``, - ``HISTORY_TAIL_DAYS`` and ``MAX_COMPARE_SCENARIOS`` live here. - -Feature-column contract: ``canonical_feature_columns()`` is the single source -of truth for the regression feature set and column order. The Phase B training -path persists exactly this list in the bundle metadata, and the future frame -reproduces it column-for-column, so a model trained today re-forecasts cleanly. +* #12 — ``MAX_COMPARE_SCENARIOS`` (the Phase-C comparison cap) lives here. + +Feature-column contract: ``app/shared/feature_frames`` is the single source of +truth for the regression feature set, its column order, the pinned constants +(``EXOGENOUS_LAGS``, ``HISTORY_TAIL_DAYS``), and the leakage-safe pure builders +(``build_calendar_columns``, ``build_long_lag_columns``). This module imports — +and re-exports, for back-compat — those names; it owns only the +assumption-driven, DB-touching parts of the future frame. 
""" from __future__ import annotations import math -from dataclasses import dataclass from datetime import date, timedelta from typing import TYPE_CHECKING @@ -51,6 +53,16 @@ from app.core.logging import get_logger from app.features.data_platform.models import Calendar +from app.shared.feature_frames import ( + CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, + EXOGENOUS_LAGS, + HISTORY_TAIL_DAYS, + FutureFeatureFrame, + build_calendar_columns, + build_long_lag_columns, + canonical_feature_columns, +) if TYPE_CHECKING: from sqlalchemy.ext.asyncio import AsyncSession @@ -59,73 +71,31 @@ logger = get_logger(__name__) -# ── PINNED modelling constants (PRP-27 DECISIONS LOCKED #10/#11/#12) ── -# Lag offsets (days) for the target long-lag columns: daily, weekly, -# fortnightly, and a four-week lag covering the dominant retail seasonality. -EXOGENOUS_LAGS: tuple[int, ...] = (1, 7, 14, 28) -# Observed-target tail (days, ending at the forecast origin T) fed to the -# generator — 90 comfortably exceeds the largest lag offset (28). -HISTORY_TAIL_DAYS: int = 90 +# Public surface of this module. The first block is the future-frame contract +# re-exported from ``app/shared/feature_frames`` so existing importers of this +# module keep resolving (back-compat); listing them in ``__all__`` marks the +# re-export as intentional for both ruff (F401) and pyright (reportUnusedImport). +__all__ = [ + "CALENDAR_COLUMNS", + "EXOGENOUS_COLUMNS", + "EXOGENOUS_LAGS", + "HISTORY_TAIL_DAYS", + "MAX_COMPARE_SCENARIOS", + "FutureFeatureFrame", + "assemble_future_frame", + "build_calendar_columns", + "build_exogenous_columns", + "build_future_frame", + "build_long_lag_columns", + "canonical_feature_columns", +] + # Upper bound on the multi-scenario comparison (Phase C) so the chart stays -# legible; defined here as the slice's single modelling-constants home. +# legible; defined here as the scenarios slice's single modelling-constants +# home (PRP-27 DECISIONS LOCKED #12). 
NOT a feature-frame concept, so it stays +# in this slice rather than moving to ``app/shared/feature_frames``. MAX_COMPARE_SCENARIOS: int = 5 -# Fixed calendar columns — each a pure function of the date, never a leak. -CALENDAR_COLUMNS: tuple[str, ...] = ( - "dow_sin", - "dow_cos", - "month_sin", - "month_cos", - "is_weekend", - "is_month_end", -) -# Fixed current-day exogenous columns — driven by the scenario assumptions -# (the planner's posited future inputs) and by timeless attributes (the -# calendar, the product launch date). Every value is knowable at origin T. -EXOGENOUS_COLUMNS: tuple[str, ...] = ( - "price_factor", - "promo_active", - "is_holiday", - "days_since_launch", -) - - -@dataclass -class FutureFeatureFrame: - """A horizon-length feature matrix for one ``(store, product)`` series. - - Attributes: - dates: The horizon days ``T+1 … T+horizon`` (chronological). - feature_columns: Column order — matches the trained bundle exactly. - matrix: Row-major ``[horizon][n_features]``; ``NaN`` is allowed and - expected (a long-lag cell whose source target lies in the future, - or ``days_since_launch`` when the product has no launch date). - """ - - dates: list[date] - feature_columns: list[str] - matrix: list[list[float]] - - -def canonical_feature_columns(lags: tuple[int, ...] = EXOGENOUS_LAGS) -> list[str]: - """Return the fixed, ordered regression feature-column list. - - This is the single source of truth for the regression feature set. The - Phase B training path persists exactly this list in the model bundle's - metadata; the future frame reproduces it column-for-column. The column - set is deliberately *fixed* (not horizon-dependent): for a long horizon - some target-lag columns are mostly ``NaN``, which the NaN-tolerant - estimator handles — far safer than a horizon-varying column set. - - Args: - lags: Target long-lag offsets (defaults to the pinned ``EXOGENOUS_LAGS``). - - Returns: - Ordered column names: target lags, then calendar, then exogenous. 
- """ - target_lags = [f"lag_{k}" for k in lags] - return [*target_lags, *CALENDAR_COLUMNS, *EXOGENOUS_COLUMNS] - def _in_window(point_date: date, start: date, end: date) -> bool: """True when ``point_date`` is inside the inclusive ``[start, end]`` window. @@ -138,84 +108,6 @@ def _in_window(point_date: date, start: date, end: date) -> bool: return lo <= point_date <= hi -def _is_month_end(point_date: date) -> bool: - """True when ``point_date`` is the last day of its month.""" - return (point_date + timedelta(days=1)).month != point_date.month - - -def build_calendar_columns(dates: list[date]) -> dict[str, list[float]]: - """Build the calendar feature columns — a pure function of each date. - - Calendar features carry zero leakage risk: they read only the date - itself, never the target series. Day-of-week and month use cyclical - (sin/cos) encoding so the estimator sees their periodic structure. - - Args: - dates: The horizon days. - - Returns: - A mapping of every name in :data:`CALENDAR_COLUMNS` to its per-day - values. - """ - columns: dict[str, list[float]] = {name: [] for name in CALENDAR_COLUMNS} - for point_date in dates: - dow = point_date.weekday() # 0 = Monday … 6 = Sunday - month = point_date.month - columns["dow_sin"].append(math.sin(2.0 * math.pi * dow / 7.0)) - columns["dow_cos"].append(math.cos(2.0 * math.pi * dow / 7.0)) - columns["month_sin"].append(math.sin(2.0 * math.pi * month / 12.0)) - columns["month_cos"].append(math.cos(2.0 * math.pi * month / 12.0)) - columns["is_weekend"].append(1.0 if dow >= 5 else 0.0) - columns["is_month_end"].append(1.0 if _is_month_end(point_date) else 0.0) - return columns - - -def build_long_lag_columns( - history_tail: list[float], - horizon: int, - lags: tuple[int, ...] = EXOGENOUS_LAGS, -) -> dict[str, list[float]]: - """Build the target long-lag columns — the leakage-critical helper. 
- - ``history_tail`` is the observed target series ending at the forecast - origin ``T``: ``history_tail[-1] == y[T]``, ``history_tail[-2] == y[T-1]``, - and so on. The lag-``k`` column at horizon day ``T+j`` (``j`` in - ``1 … horizon``) is the observed target ``y[T+j-k]``. - - SAFETY (PRP-27 DECISIONS LOCKED #4): the source index into - ``history_tail`` is ``idx = (j - 1) - k``. The cell is populated **only - when ``idx < 0``** — i.e. the source day ``T+j-k`` lies at or before the - origin ``T`` and therefore inside ``history_tail``. When ``idx >= 0`` the - source day is a *future* horizon day with no observed target, so the cell - is ``NaN`` — never a recursive prediction, never a fabricated value. This - function structurally **cannot** read a future target: its only data - input is ``history_tail`` (entirely ``<= T``). - - Args: - history_tail: Observed target values ending at the origin ``T``. - horizon: Number of horizon days. - lags: Lag offsets (defaults to the pinned ``EXOGENOUS_LAGS``). - - Returns: - A mapping ``"lag_{k}" -> [horizon values]``; out-of-range cells are - ``NaN``. - """ - tail_len = len(history_tail) - columns: dict[str, list[float]] = {} - for lag in lags: - column: list[float] = [] - for j in range(1, horizon + 1): - # Negative index from the end of history_tail. idx < 0 means the - # source day T+j-k is at/before the origin T — safe to read. 
- idx = (j - 1) - lag - if idx < 0 and -tail_len <= idx: - column.append(float(history_tail[idx])) - else: - column.append(math.nan) - columns[f"lag_{lag}"] = column - return columns - - def build_exogenous_columns( dates: list[date], assumptions: ScenarioAssumptions, diff --git a/app/features/scenarios/service.py b/app/features/scenarios/service.py index 4fcdb308..fc038d9a 100644 --- a/app/features/scenarios/service.py +++ b/app/features/scenarios/service.py @@ -106,10 +106,12 @@ async def simulate( store_id = int(str(store_id_raw)) product_id = int(str(product_id_raw)) - # A regression baseline answers the what-if by genuinely re-forecasting - # through the future feature frame; every other model type uses the - # deterministic heuristic multiplier below (PRP-27 DECISIONS LOCKED #1). - if bundle.config.model_type == "regression": + # A feature-aware baseline (regression, lightgbm) answers the what-if by + # genuinely re-forecasting through the future feature frame; every + # target-only model uses the deterministic heuristic multiplier below + # (PRP-27 DECISIONS LOCKED #1). The branch is capability-based — + # ``requires_features`` — exactly like ``ForecastingService``. + if bundle.model.requires_features: return await self._simulate_model_exogenous(db, request, bundle, store_id, product_id) # Replicate the ForecastingService.predict body (DECISIONS LOCKED #2). @@ -187,7 +189,7 @@ async def _simulate_model_exogenous( store_id: int, product_id: int, ) -> ScenarioComparison: - """Re-forecast a regression baseline through the future feature frame. + """Re-forecast a feature-aware baseline through the future feature frame. Builds two leakage-safe future frames — one carrying the scenario assumptions, one with none — feeds both to the model, and compares the @@ -197,7 +199,7 @@ async def _simulate_model_exogenous( Args: db: Database session. request: The baseline ``run_id``, horizon, and assumptions. - bundle: The already-loaded regression model bundle. 
+ bundle: The already-loaded feature-aware model bundle. store_id: Store the baseline model targets. product_id: Product the baseline model targets. @@ -212,7 +214,7 @@ async def _simulate_model_exogenous( history_tail_raw = bundle.metadata.get("history_tail") if not isinstance(feature_columns_raw, list) or not isinstance(history_tail_raw, list): raise ValueError( - f"Model artifact for run_id '{request.run_id}' is a regression " + f"Model artifact for run_id '{request.run_id}' is a feature-aware " "model without the feature metadata a scenario forecast needs — " "retrain it with the current pipeline." ) diff --git a/app/features/scenarios/tests/conftest.py b/app/features/scenarios/tests/conftest.py index b1cc6eeb..c5ebc731 100644 --- a/app/features/scenarios/tests/conftest.py +++ b/app/features/scenarios/tests/conftest.py @@ -22,12 +22,24 @@ from app.core.config import get_settings from app.core.database import get_db -from app.features.forecasting.models import NaiveForecaster, RegressionForecaster +from app.features.forecasting.models import ( + LightGBMForecaster, + NaiveForecaster, + ProphetLikeForecaster, + RegressionForecaster, + XGBoostForecaster, +) from app.features.forecasting.persistence import ModelBundle, save_model_bundle -from app.features.forecasting.schemas import NaiveModelConfig, RegressionModelConfig -from app.features.scenarios.feature_frame import canonical_feature_columns +from app.features.forecasting.schemas import ( + LightGBMModelConfig, + NaiveModelConfig, + ProphetLikeModelConfig, + RegressionModelConfig, + XGBoostModelConfig, +) from app.features.scenarios.models import ScenarioPlan from app.main import app +from app.shared.feature_frames import canonical_feature_columns # Store / product the test bundle is trained for. High IDs that no seeder uses, # so the revenue calc deterministically hits the unit-price fallback. 
@@ -101,6 +113,106 @@ def trained_model() -> Generator[str, None, None]: (artifacts_dir / f"model_{run_id}.joblib").unlink(missing_ok=True) +@pytest.fixture +def trained_lightgbm_model() -> Generator[str, None, None]: + """Save a real fitted ``LightGBMForecaster`` bundle on disk; yield run_id. + + SKIPs when the optional ``ml-lightgbm`` dependency is absent. The bundle + carries the full PRP-27 feature metadata so the model-exogenous simulate + path can build a future feature frame and genuinely re-forecast — exactly + as it does for a regression bundle (PRP-30 / MLZOO-B). + """ + pytest.importorskip("lightgbm") + + settings = get_settings() + artifacts_dir = Path(settings.forecast_model_artifacts_dir) + artifacts_dir.mkdir(parents=True, exist_ok=True) + + run_id = uuid.uuid4().hex[:12] + columns = canonical_feature_columns() + rng = np.random.default_rng(7) + n_rows = 200 + features = rng.normal(size=(n_rows, len(columns))) + price_index = columns.index("price_factor") + target = 40.0 - 20.0 * features[:, price_index] + rng.normal(scale=0.5, size=n_rows) + + model = LightGBMForecaster(random_state=7) + model.fit(target.astype(np.float64), features.astype(np.float64)) + + history_start = date(2026, 4, 1) + bundle = ModelBundle( + model=model, + config=LightGBMModelConfig(), + metadata={ + "store_id": TEST_STORE_ID, + "product_id": TEST_PRODUCT_ID, + "train_end_date": TEST_TRAIN_END_DATE, + "n_observations": n_rows, + "feature_columns": columns, + "history_tail": [12.0] * 90, + "history_tail_dates": [ + (history_start + timedelta(days=offset)).isoformat() for offset in range(90) + ], + "launch_date": "2025-01-01", + }, + ) + save_model_bundle(bundle, artifacts_dir / f"model_{run_id}") + + yield run_id + + (artifacts_dir / f"model_{run_id}.joblib").unlink(missing_ok=True) + + +@pytest.fixture +def trained_xgboost_model() -> Generator[str, None, None]: + """Save a real fitted ``XGBoostForecaster`` bundle on disk; yield run_id. 
+ + SKIPs when the optional ``ml-xgboost`` dependency is absent. The bundle + carries the full PRP-27 feature metadata so the model-exogenous simulate + path can build a future feature frame and genuinely re-forecast — exactly + as it does for a regression bundle (PRP-MLZOO-C1). + """ + pytest.importorskip("xgboost") + + settings = get_settings() + artifacts_dir = Path(settings.forecast_model_artifacts_dir) + artifacts_dir.mkdir(parents=True, exist_ok=True) + + run_id = uuid.uuid4().hex[:12] + columns = canonical_feature_columns() + rng = np.random.default_rng(7) + n_rows = 200 + features = rng.normal(size=(n_rows, len(columns))) + price_index = columns.index("price_factor") + target = 40.0 - 20.0 * features[:, price_index] + rng.normal(scale=0.5, size=n_rows) + + model = XGBoostForecaster(random_state=7) + model.fit(target.astype(np.float64), features.astype(np.float64)) + + history_start = date(2026, 4, 1) + bundle = ModelBundle( + model=model, + config=XGBoostModelConfig(), + metadata={ + "store_id": TEST_STORE_ID, + "product_id": TEST_PRODUCT_ID, + "train_end_date": TEST_TRAIN_END_DATE, + "n_observations": n_rows, + "feature_columns": columns, + "history_tail": [12.0] * 90, + "history_tail_dates": [ + (history_start + timedelta(days=offset)).isoformat() for offset in range(90) + ], + "launch_date": "2025-01-01", + }, + ) + save_model_bundle(bundle, artifacts_dir / f"model_{run_id}") + + yield run_id + + (artifacts_dir / f"model_{run_id}.joblib").unlink(missing_ok=True) + + @pytest.fixture def trained_regression_model() -> Generator[str, None, None]: """Save a real fitted ``RegressionForecaster`` bundle on disk; yield run_id. @@ -150,3 +262,53 @@ def trained_regression_model() -> Generator[str, None, None]: yield run_id (artifacts_dir / f"model_{run_id}.joblib").unlink(missing_ok=True) + + +@pytest.fixture +def trained_prophet_like_model() -> Generator[str, None, None]: + """Save a real fitted ``ProphetLikeForecaster`` bundle on disk; yield run_id. 
+ + The Prophet-like additive model is feature-aware (pure scikit-learn — no + flag, no optional dependency), so the bundle carries the full PRP-27 + feature metadata and the model-exogenous simulate path can build a future + feature frame and genuinely re-forecast — exactly as it does for a + regression or LightGBM bundle. Demand is wired to respond negatively to + ``price_factor`` so a price cut lifts the forecast. + """ + settings = get_settings() + artifacts_dir = Path(settings.forecast_model_artifacts_dir) + artifacts_dir.mkdir(parents=True, exist_ok=True) + + run_id = uuid.uuid4().hex[:12] + columns = canonical_feature_columns() + rng = np.random.default_rng(7) + n_rows = 200 + features = rng.normal(size=(n_rows, len(columns))) + price_index = columns.index("price_factor") + target = 40.0 - 20.0 * features[:, price_index] + rng.normal(scale=0.5, size=n_rows) + + model = ProphetLikeForecaster(random_state=7) + model.fit(target.astype(np.float64), features.astype(np.float64)) + + history_start = date(2026, 4, 1) + bundle = ModelBundle( + model=model, + config=ProphetLikeModelConfig(), + metadata={ + "store_id": TEST_STORE_ID, + "product_id": TEST_PRODUCT_ID, + "train_end_date": TEST_TRAIN_END_DATE, + "n_observations": n_rows, + "feature_columns": columns, + "history_tail": [12.0] * 90, + "history_tail_dates": [ + (history_start + timedelta(days=offset)).isoformat() for offset in range(90) + ], + "launch_date": "2025-01-01", + }, + ) + save_model_bundle(bundle, artifacts_dir / f"model_{run_id}") + + yield run_id + + (artifacts_dir / f"model_{run_id}.joblib").unlink(missing_ok=True) diff --git a/app/features/scenarios/tests/test_feature_frame.py b/app/features/scenarios/tests/test_feature_frame.py index 22306eb1..3f14b09b 100644 --- a/app/features/scenarios/tests/test_feature_frame.py +++ b/app/features/scenarios/tests/test_feature_frame.py @@ -12,16 +12,9 @@ from datetime import date, timedelta from app.features.scenarios.feature_frame import ( - 
CALENDAR_COLUMNS, - EXOGENOUS_COLUMNS, - EXOGENOUS_LAGS, - HISTORY_TAIL_DAYS, MAX_COMPARE_SCENARIOS, assemble_future_frame, - build_calendar_columns, build_exogenous_columns, - build_long_lag_columns, - canonical_feature_columns, ) from app.features.scenarios.schemas import ( HolidayAssumption, @@ -29,6 +22,15 @@ PromotionAssumption, ScenarioAssumptions, ) +from app.shared.feature_frames import ( + CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, + EXOGENOUS_LAGS, + HISTORY_TAIL_DAYS, + build_calendar_columns, + build_long_lag_columns, + canonical_feature_columns, +) _ORIGIN = date(2026, 6, 30) _HORIZON = 14 diff --git a/app/features/scenarios/tests/test_future_frame_leakage.py b/app/features/scenarios/tests/test_future_frame_leakage.py index 4a4659de..32e51afa 100644 --- a/app/features/scenarios/tests/test_future_frame_leakage.py +++ b/app/features/scenarios/tests/test_future_frame_leakage.py @@ -1,9 +1,16 @@ -"""Leakage spec for the future feature frame — LOAD-BEARING (PRP-27 Phase A). +"""Leakage spec for the scenarios future frame — LOAD-BEARING (PRP-27 Phase A). This file IS the spec, mirroring ``app/features/featuresets/tests/test_leakage.py`` and ``app/features/scenarios/tests/test_leakage.py``: it must NEVER be weakened to make a feature pass (AGENTS.md § Safety). +Its scope is the parts of the future frame the **scenarios slice** owns: the +assumption-driven exogenous columns (``build_exogenous_columns``) and the +end-to-end assembled frame (``assemble_future_frame``). The shared pure builders +(``build_calendar_columns``, ``build_long_lag_columns``) moved to +``app/shared/feature_frames`` in MLZOO-A and are spec'd by the load-bearing +``app/shared/feature_frames/tests/test_leakage.py``. + The model-driven scenario path re-forecasts demand through a feature-consuming regressor, which means it builds a *future feature frame*. A horizon day has no observed target, so the invariant is: @@ -16,15 +23,9 @@ Concretely this spec asserts: -1. 
``build_long_lag_columns`` returns only values drawn from ``history_tail`` - (entirely ``<= T``) or ``NaN`` — never a value from the future target - series. -2. A lag cell whose source day lies at or after the first horizon day is - ``NaN`` — the generator never fabricates or recursively predicts it. -3. Calendar columns are independent of the target series entirely. -4. An assumption window that falls before the forecast origin contributes +1. An assumption window that falls before the forecast origin contributes nothing — every horizon day lies strictly after ``T``. -5. Every non-``NaN`` ``lag_*`` cell in an assembled frame is a member of +2. Every non-``NaN`` ``lag_*`` cell in an assembled frame is a member of ``history_tail``. """ @@ -34,14 +35,11 @@ from datetime import date, timedelta from app.features.scenarios.feature_frame import ( - EXOGENOUS_LAGS, assemble_future_frame, - build_calendar_columns, build_exogenous_columns, - build_long_lag_columns, - canonical_feature_columns, ) from app.features.scenarios.schemas import PriceAssumption, ScenarioAssumptions +from app.shared.feature_frames import EXOGENOUS_LAGS, canonical_feature_columns # The forecast origin T is the last observed day; the horizon runs T+1 … T+H. _ORIGIN = date(2026, 6, 30) @@ -56,70 +54,6 @@ _FUTURE_TARGETS = {9000.0 + float(i) for i in range(_HORIZON)} -def test_long_lag_columns_never_emit_a_future_target() -> None: - """Every non-NaN long-lag cell is drawn from the observed history. - - ``build_long_lag_columns`` takes ONLY ``history_tail`` as data input — it - is structurally incapable of reading the future target series. This spec - pins that: no value disjoint from ``history_tail`` may ever appear. 
- """ - history_values = set(_HISTORY_TAIL) - columns = build_long_lag_columns(_HISTORY_TAIL, _HORIZON) - - for name, values in columns.items(): - for cell in values: - if math.isnan(cell): - continue - assert cell in history_values, ( - f"{name} emitted {cell}, which is not an observed history value" - ) - assert cell not in _FUTURE_TARGETS, f"{name} leaked a future target value {cell}" - - -def test_long_lag_source_index_is_never_at_or_after_the_horizon() -> None: - """A lag cell is populated only when its source day lies at/before ``T``. - - For lag ``k`` and horizon day ``j`` the source index into ``history_tail`` - is ``(j-1)-k``. A non-NaN cell REQUIRES that index to be negative — i.e. - the source target lies at or before the origin ``T``. A non-negative index - would point at a future horizon day and MUST yield ``NaN``. - """ - columns = build_long_lag_columns(_HISTORY_TAIL, _HORIZON) - for lag in EXOGENOUS_LAGS: - column = columns[f"lag_{lag}"] - for j in range(1, _HORIZON + 1): - source_index = (j - 1) - lag - cell = column[j - 1] - if source_index >= 0: - assert math.isnan(cell), ( - f"lag_{lag} day {j}: source index {source_index} is in the " - "future but the cell is not NaN" - ) - else: - assert not math.isnan(cell), ( - f"lag_{lag} day {j}: source index {source_index} is in " - "history but the cell is NaN" - ) - - -def test_calendar_columns_are_independent_of_the_target_series() -> None: - """Calendar columns read only the dates — they cannot leak the target. - - ``build_calendar_columns`` does not accept the target series at all; this - spec pins that structural fact by asserting its output is identical no - matter what history precedes it. - """ - calendar_a = build_calendar_columns(_HORIZON_DATES) - calendar_b = build_calendar_columns(_HORIZON_DATES) - assert calendar_a == calendar_b - # No calendar value coincides with a history or future target value. 
- history_values = set(_HISTORY_TAIL) - for values in calendar_a.values(): - for cell in values: - assert cell not in history_values - assert cell not in _FUTURE_TARGETS - - def test_assumption_window_before_origin_has_no_effect() -> None: """A price window entirely before the forecast origin contributes nothing. diff --git a/app/features/scenarios/tests/test_routes_integration.py b/app/features/scenarios/tests/test_routes_integration.py index f1d16976..79092115 100644 --- a/app/features/scenarios/tests/test_routes_integration.py +++ b/app/features/scenarios/tests/test_routes_integration.py @@ -163,12 +163,88 @@ async def test_regression_baseline_returns_model_exogenous( data = response.json() assert data["method"] == "model_exogenous" + assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" + assert len(data["points"]) == 14 + # A price cut moves the re-forecast — the deltas are model-driven, not + # a fixed multiplier, and the modelled price response lifts demand. + assert data["units_delta"] > 0.0 + + async def test_lightgbm_baseline_returns_model_exogenous( + self, client: AsyncClient, trained_lightgbm_model: str + ) -> None: + """A LightGBM baseline is feature-aware — it re-forecasts (PRP-30). + + Pins the capability-based dispatch in ``ScenarioService.simulate`` — + the branch is ``bundle.model.requires_features``, not a hard-coded + ``model_type == "regression"`` string. + """ + response = await client.post( + "/scenarios/simulate", + json={ + "run_id": trained_lightgbm_model, + "horizon": 14, + "assumptions": _PRICE_ASSUMPTION, + }, + ) + assert response.status_code == 200 + data = response.json() + + assert data["method"] == "model_exogenous" assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" assert len(data["points"]) == 14 # A price cut moves the re-forecast — the deltas are model-driven, not
assert data["units_delta"] > 0.0 + async def test_xgboost_baseline_returns_model_exogenous( + self, client: AsyncClient, trained_xgboost_model: str + ) -> None: + """An XGBoost baseline is feature-aware — it re-forecasts (PRP-MLZOO-C1). + + Pins the capability-based dispatch in ``ScenarioService.simulate`` — + the branch is ``bundle.model.requires_features``, not a hard-coded + ``model_type == "regression"`` string. + """ + response = await client.post( + "/scenarios/simulate", + json={ + "run_id": trained_xgboost_model, + "horizon": 14, + "assumptions": _PRICE_ASSUMPTION, + }, + ) + assert response.status_code == 200 + data = response.json() + + assert data["method"] == "model_exogenous" + assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" + assert len(data["points"]) == 14 + + async def test_prophet_like_baseline_returns_model_exogenous( + self, client: AsyncClient, trained_prophet_like_model: str + ) -> None: + """A prophet_like baseline is feature-aware — it re-forecasts (MLZOO-C2). + + Pins that the capability-based dispatch in ``ScenarioService.simulate`` + (the ``bundle.model.requires_features`` branch) routes the pure-sklearn + additive model through the genuine re-forecast path with zero scenarios + changes — no flag, no optional dependency. 
+ """ + response = await client.post( + "/scenarios/simulate", + json={ + "run_id": trained_prophet_like_model, + "horizon": 14, + "assumptions": _PRICE_ASSUMPTION, + }, + ) + assert response.status_code == 200 + data = response.json() + + assert data["method"] == "model_exogenous" + assert data["disclaimer"], "every comparison must carry a non-empty disclaimer" + assert len(data["points"]) == 14 + async def test_regression_empty_assumptions_equals_baseline( self, client: AsyncClient, trained_regression_model: str ) -> None: diff --git a/app/main.py b/app/main.py index 3cb36c8e..0558ec6a 100644 --- a/app/main.py +++ b/app/main.py @@ -116,7 +116,9 @@ def create_app() -> FastAPI: r"localhost|127\.0\.0\.1|" r"10\.\d+\.\d+\.\d+|" r"192\.168\.\d+\.\d+|" - r"172\.(1[6-9]|2\d|3[0-1])\.\d+\.\d+" + r"172\.(1[6-9]|2\d|3[0-1])\.\d+\.\d+|" + # Tailscale CGNAT range (100.64.0.0/10) — dev-only remote access + r"100\.(6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.\d+\.\d+" r")(:\d+)?$" if settings.is_development else None diff --git a/app/shared/feature_frames/__init__.py b/app/shared/feature_frames/__init__.py new file mode 100644 index 00000000..df0568b4 --- /dev/null +++ b/app/shared/feature_frames/__init__.py @@ -0,0 +1,45 @@ +"""Shared feature-frame contract for feature-aware forecasting (MLZOO-A). + +The single, cross-cutting home for the regression feature-frame contract — the +pinned constants, the canonical column set, the :class:`FutureFeatureFrame` +carrier, the leakage-safe pure builders, and the :class:`FeatureSafety` +taxonomy. Both the ``forecasting`` slice (historical training frame) and the +``scenarios`` slice (future prediction frame) import from here, so the contract +is defined exactly once. + +This package is leaf-level: it imports nothing from ``app/features/**``. 
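The Tailscale alternative added to the development CORS origin regex in `app/main.py` is meant to accept exactly the second octets 64 to 127, i.e. the 100.64.0.0/10 block. A minimal standalone check, assuming only the host sub-pattern matters here (the surrounding scheme and `(:\d+)?$` port parts of the real regex are left out), compares the pattern against Python's `ipaddress` module. The helper names below are hypothetical and exist only for this check:

```python
import ipaddress
import re

# Only the Tailscale host alternative, anchored on its own for this check;
# the full CORS regex in app/main.py also wraps scheme and port parts.
TAILSCALE_HOST = re.compile(r"^100\.(6[4-9]|[7-9]\d|1[01]\d|12[0-7])\.\d+\.\d+$")
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def octets_accepted_by_regex() -> set[int]:
    """Second octets X for which the regex accepts the host 100.X.1.1."""
    return {x for x in range(256) if TAILSCALE_HOST.match(f"100.{x}.1.1")}


def octets_inside_cgnat() -> set[int]:
    """Second octets X for which 100.X.1.1 lies inside 100.64.0.0/10."""
    return {x for x in range(256) if ipaddress.ip_address(f"100.{x}.1.1") in CGNAT}
```

Both sets should equal `set(range(64, 128))`. Like the existing private-range alternatives, the trailing `\d+` parts do not bound the last two octets to 0 to 255; that looseness is acceptable for a dev-only allow-list.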
+""" + +from app.shared.feature_frames.contract import ( + CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, + EXOGENOUS_LAGS, + FEATURE_CLASS, + HISTORY_TAIL_DAYS, + FeatureSafety, + FutureFeatureFrame, + build_calendar_columns, + build_long_lag_columns, + canonical_feature_columns, + feature_safety, +) +from app.shared.feature_frames.rows import ( + build_future_feature_rows, + build_historical_feature_rows, +) + +__all__ = [ + "CALENDAR_COLUMNS", + "EXOGENOUS_COLUMNS", + "EXOGENOUS_LAGS", + "FEATURE_CLASS", + "HISTORY_TAIL_DAYS", + "FeatureSafety", + "FutureFeatureFrame", + "build_calendar_columns", + "build_future_feature_rows", + "build_historical_feature_rows", + "build_long_lag_columns", + "canonical_feature_columns", + "feature_safety", +] diff --git a/app/shared/feature_frames/contract.py b/app/shared/feature_frames/contract.py new file mode 100644 index 00000000..5102a389 --- /dev/null +++ b/app/shared/feature_frames/contract.py @@ -0,0 +1,240 @@ +"""Shared feature-frame contract for feature-aware forecasting (MLZOO-A). + +This module is the **single source of truth** for the regression feature-frame +contract: the pinned modelling constants, the canonical feature-column set and +its order, the :class:`FutureFeatureFrame` carrier, the leakage-safe pure +column builders, and the :class:`FeatureSafety` leakage taxonomy. + +Before MLZOO-A the contract was duplicated across two vertical slices — the +``forecasting`` slice (the historical training frame) and the ``scenarios`` +slice (the future prediction frame) — because a cross-slice import is forbidden +(AGENTS.md § Architecture). A cross-cutting package under ``app/shared/`` is the +sanctioned home: both slices now *import* this one definition rather than +re-typing it, so a silent column-order mismatch is structurally impossible. + +LEAF-LEVEL: ``app/shared/**`` may NEVER import from ``app/features/**``. 
Every +builder here is pure (stdlib ``math`` / ``datetime`` / ``dataclasses`` only) so +that invariant holds; ``tests/test_contract.py`` enforces it with an AST walk. + +The leakage rule the future-frame builders obey (mirrors PRP-27 and the +load-bearing ``tests/test_leakage.py`` alongside this module): + + A future feature value for horizon day ``D`` may use ONLY information + knowable at the forecast origin ``T``: the observed history up to and + including ``T``, or the calendar (a pure function of the date). It may + NEVER read an observed target at a horizon day. +""" + +from __future__ import annotations + +import math +from dataclasses import dataclass +from datetime import date, timedelta +from enum import Enum + +# ── PINNED modelling constants (PRP-27 DECISIONS LOCKED #10/#11) ── +# Lag offsets (days) for the target long-lag columns: daily, weekly, +# fortnightly, and a four-week lag covering the dominant retail seasonality. +EXOGENOUS_LAGS: tuple[int, ...] = (1, 7, 14, 28) +# Observed-target tail (days, ending at the forecast origin T) fed to the +# generator — 90 comfortably exceeds the largest lag offset (28). +HISTORY_TAIL_DAYS: int = 90 + +# Fixed calendar columns — each a pure function of the date, never a leak. +CALENDAR_COLUMNS: tuple[str, ...] = ( + "dow_sin", + "dow_cos", + "month_sin", + "month_cos", + "is_weekend", + "is_month_end", +) +# Fixed current-day exogenous columns — driven by the scenario assumptions +# (the planner's posited future inputs) and by timeless attributes (the +# calendar, the product launch date). Every value is knowable at origin T. +EXOGENOUS_COLUMNS: tuple[str, ...] = ( + "price_factor", + "promo_active", + "is_holiday", + "days_since_launch", +) + + +@dataclass +class FutureFeatureFrame: + """A horizon-length feature matrix for one ``(store, product)`` series. + + Attributes: + dates: The horizon days ``T+1 … T+horizon`` (chronological). + feature_columns: Column order — matches the trained bundle exactly. 
+ matrix: Row-major ``[horizon][n_features]``; ``NaN`` is allowed and + expected (a long-lag cell whose source target lies in the future, + or ``days_since_launch`` when the product has no launch date). + """ + + dates: list[date] + feature_columns: list[str] + matrix: list[list[float]] + + +def canonical_feature_columns(lags: tuple[int, ...] = EXOGENOUS_LAGS) -> list[str]: + """Return the fixed, ordered regression feature-column list. + + This is the single source of truth for the regression feature set. The + Phase B training path persists exactly this list in the model bundle's + metadata; the future frame reproduces it column-for-column. The column + set is deliberately *fixed* (not horizon-dependent): for a long horizon + some target-lag columns are mostly ``NaN``, which the NaN-tolerant + estimator handles — far safer than a horizon-varying column set. + + Args: + lags: Target long-lag offsets (defaults to the pinned ``EXOGENOUS_LAGS``). + + Returns: + Ordered column names: target lags, then calendar, then exogenous. + """ + target_lags = [f"lag_{k}" for k in lags] + return [*target_lags, *CALENDAR_COLUMNS, *EXOGENOUS_COLUMNS] + + +def _is_month_end(point_date: date) -> bool: + """True when ``point_date`` is the last day of its month.""" + return (point_date + timedelta(days=1)).month != point_date.month + + +def build_calendar_columns(dates: list[date]) -> dict[str, list[float]]: + """Build the calendar feature columns — a pure function of each date. + + Calendar features carry zero leakage risk: they read only the date + itself, never the target series. Day-of-week and month use cyclical + (sin/cos) encoding so the estimator sees their periodic structure. + + Args: + dates: The horizon days. + + Returns: + A mapping of every name in :data:`CALENDAR_COLUMNS` to its per-day + values. 
+ """ + columns: dict[str, list[float]] = {name: [] for name in CALENDAR_COLUMNS} + for point_date in dates: + dow = point_date.weekday() # 0 = Monday … 6 = Sunday + month = point_date.month + columns["dow_sin"].append(math.sin(2.0 * math.pi * dow / 7.0)) + columns["dow_cos"].append(math.cos(2.0 * math.pi * dow / 7.0)) + columns["month_sin"].append(math.sin(2.0 * math.pi * month / 12.0)) + columns["month_cos"].append(math.cos(2.0 * math.pi * month / 12.0)) + columns["is_weekend"].append(1.0 if dow >= 5 else 0.0) + columns["is_month_end"].append(1.0 if _is_month_end(point_date) else 0.0) + return columns + + +def build_long_lag_columns( + history_tail: list[float], + horizon: int, + lags: tuple[int, ...] = EXOGENOUS_LAGS, +) -> dict[str, list[float]]: + """Build the target long-lag columns — the leakage-critical helper. + + ``history_tail`` is the observed target series ending at the forecast + origin ``T``: ``history_tail[-1] == y[T]``, ``history_tail[-2] == y[T-1]``, + and so on. The lag-``k`` column at horizon day ``T+j`` (``j`` in + ``1 … horizon``) is the observed target ``y[T+j-k]``. + + SAFETY (PRP-27 DECISIONS LOCKED #4): the source index into + ``history_tail`` is ``idx = (j - 1) - k``. The cell is populated **only + when ``idx < 0``** — i.e. the source day ``T+j-k`` lies at or before the + origin ``T`` and therefore inside ``history_tail``. When ``idx >= 0`` the + source day is a *future* horizon day with no observed target, so the cell + is ``NaN`` — never a recursive prediction, never a fabricated value. This + function structurally **cannot** read a future target: its only data + input is ``history_tail`` (entirely ``<= T``). + + Args: + history_tail: Observed target values ending at the origin ``T``. + horizon: Number of horizon days. + lags: Lag offsets (defaults to the pinned ``EXOGENOUS_LAGS``). + + Returns: + A mapping ``"lag_{k}" -> [horizon values]``; out-of-range cells are + ``NaN``. 
+ """ + tail_len = len(history_tail) + columns: dict[str, list[float]] = {} + for lag in lags: + column: list[float] = [] + for j in range(1, horizon + 1): + # Negative index from the end of history_tail. idx < 0 means the + # source day T+j-k is at/before the origin T — safe to read. + idx = (j - 1) - lag + if idx < 0 and -tail_len <= idx: + column.append(float(history_tail[idx])) + else: + column.append(math.nan) + columns[f"lag_{lag}"] = column + return columns + + +class FeatureSafety(Enum): + """Leakage classification of a feature column in a FUTURE prediction frame. + + Every canonical feature column falls into exactly one class. The + classification governs how a future-frame builder may populate the column + for a horizon day ``D`` (which has no observed target): + + * ``SAFE`` — a pure function of the date (calendar features); reading it + can never leak a future target. + * ``CONDITIONALLY_SAFE`` — a target long-lag; safe only when its source + day lies at or before the forecast origin ``T``, otherwise the cell is + ``NaN``. + * ``UNSAFE_UNLESS_SUPPLIED`` — a future price / promotion input; knowable + at ``T`` ONLY because the caller posits it (a scenario assumption). It + is never inferred from observed data. + """ + + SAFE = "safe" + CONDITIONALLY_SAFE = "conditionally_safe" + UNSAFE_UNLESS_SUPPLIED = "unsafe_unless_supplied" + + +# The executable taxonomy: every canonical column → its FeatureSafety class. +# ``is_holiday`` and ``days_since_launch`` are SAFE — a calendar holiday row is +# a timeless attribute and ``days_since_launch`` is a pure function of the date +# once the launch date is known. ``price_factor`` / ``promo_active`` are +# UNSAFE_UNLESS_SUPPLIED — only a posited scenario assumption makes them +# knowable for a future day. 
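The source-index rule in `build_long_lag_columns` is easiest to verify by hand on a tiny series. The sketch below restates the same loop as a hypothetical single-lag helper, `lag_column` (not part of `app/shared/feature_frames`), so each cell's source index `(j - 1) - k` can be traced directly:

```python
import math


def lag_column(history_tail: list[float], horizon: int, lag: int) -> list[float]:
    """Hypothetical single-lag restatement of the build_long_lag_columns loop.

    history_tail[-1] is y[T]; horizon day j (1-based) is T + j.
    """
    column: list[float] = []
    for j in range(1, horizon + 1):
        idx = (j - 1) - lag  # negative index into history_tail
        if idx < 0 and -len(history_tail) <= idx:
            column.append(float(history_tail[idx]))  # source day is at or before T
        else:
            column.append(math.nan)  # source day is a horizon day, or older than the tail
    return column


history = [10.0, 11.0, 12.0, 13.0]  # y[T-3], y[T-2], y[T-1], y[T]
lag_1 = lag_column(history, horizon=3, lag=1)  # [13.0, nan, nan]
lag_2 = lag_column(history, horizon=3, lag=2)  # [12.0, 13.0, nan]
lag_7 = lag_column(history, horizon=3, lag=7)  # [nan, nan, nan]: tail too short
```

Day T+2 of `lag_1` would need y[T+1], which is itself a horizon day, so the cell stays NaN instead of being filled by a recursive prediction. This is the `CONDITIONALLY_SAFE` class in the taxonomy.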
+FEATURE_CLASS: dict[str, FeatureSafety] = { + **{f"lag_{k}": FeatureSafety.CONDITIONALLY_SAFE for k in EXOGENOUS_LAGS}, + "dow_sin": FeatureSafety.SAFE, + "dow_cos": FeatureSafety.SAFE, + "month_sin": FeatureSafety.SAFE, + "month_cos": FeatureSafety.SAFE, + "is_weekend": FeatureSafety.SAFE, + "is_month_end": FeatureSafety.SAFE, + "price_factor": FeatureSafety.UNSAFE_UNLESS_SUPPLIED, + "promo_active": FeatureSafety.UNSAFE_UNLESS_SUPPLIED, + "is_holiday": FeatureSafety.SAFE, + "days_since_launch": FeatureSafety.SAFE, +} + + +def feature_safety(column: str) -> FeatureSafety: + """Return the leakage classification of a feature column. + + Args: + column: A feature-column name (e.g. ``"lag_7"``, ``"dow_sin"``). + + Returns: + The column's :class:`FeatureSafety` class. A ``lag_*`` column with a + custom offset not literally in :data:`FEATURE_CLASS` resolves to + ``CONDITIONALLY_SAFE`` — every target lag is conditionally safe. + + Raises: + KeyError: When ``column`` is neither a known column nor a ``lag_*`` + column — callers must classify every column they emit. + """ + if column in FEATURE_CLASS: + return FEATURE_CLASS[column] + if column.startswith("lag_"): + return FeatureSafety.CONDITIONALLY_SAFE + raise KeyError(f"Unclassified feature column: {column!r}") diff --git a/app/shared/feature_frames/rows.py b/app/shared/feature_frames/rows.py new file mode 100644 index 00000000..9aed8adc --- /dev/null +++ b/app/shared/feature_frames/rows.py @@ -0,0 +1,201 @@ +"""Shared row-matrix assemblers for feature-aware forecasting (MLZOO-B.2). + +This module joins :mod:`app.shared.feature_frames.contract` under the +cross-cutting ``app/shared/feature_frames`` package. 
``contract.py`` owns the +pinned constants, the canonical column set, the leakage-safe *column* builders +and the :class:`~app.shared.feature_frames.contract.FeatureSafety` taxonomy; +this module owns the two *row-matrix* assemblers built on top of them: + +* :func:`build_historical_feature_rows` — the historical (training) feature + matrix. Promoted verbatim from ``ForecastingService._assemble_regression_rows`` + so the ``backtesting`` slice can reuse it without a forbidden cross-slice + import (``backtesting -> forecasting`` is not allowed; ``-> app/shared`` is). +* :func:`build_future_feature_rows` — the test-window (future) feature matrix + for a backtest fold. Leakage-safe by construction (see below). + +LEAF-LEVEL: like ``contract.py`` this module may NEVER import from +``app/features/**``. Every function is pure — stdlib ``math`` / ``datetime`` +plus the contract builders only. ``tests/test_contract.py`` enforces it with +an AST walk; ``tests/test_leakage.py`` pins the leakage invariants. + +The leakage rule the future builder obeys (mirrors ``contract.py`` and the +load-bearing ``tests/test_leakage.py``): + + A future feature value for a test-window day may use ONLY information + knowable at the forecast origin ``T`` — the observed history up to and + including ``T``, the calendar (a pure function of the date), or an + exogenous input recorded for the test window (price / promotion). It may + NEVER read an observed *target* at a test-window day. 
+""" + +from __future__ import annotations + +import math +from datetime import date + +from app.shared.feature_frames.contract import ( + CALENDAR_COLUMNS, + EXOGENOUS_LAGS, + build_calendar_columns, + build_long_lag_columns, + canonical_feature_columns, +) + + +def build_historical_feature_rows( + *, + dates: list[date], + quantities: list[float], + prices: list[float], + baseline_price: float, + promo_dates: set[date], + holiday_dates: set[date], + launch_date: date | None, +) -> list[list[float]]: + """Assemble the historical regression feature matrix — pure, leakage-safe. + + Time-safe by construction: every lag column at row ``i`` reads only the + observed target at ``i - lag`` (a strictly earlier day); calendar columns + are pure functions of the date; ``price_factor`` / ``promo_active`` / + ``is_holiday`` / ``days_since_launch`` read the same-day exogenous + attributes. No row reads a future observation. + + Column order is :func:`canonical_feature_columns` exactly: the target + lags, then the calendar columns, then ``price_factor``, ``promo_active``, + ``is_holiday``, ``days_since_launch``. + + Promoted verbatim from ``ForecastingService._assemble_regression_rows`` + (which now delegates here) so the leakage invariant is unit-tested without + a database (``app/features/forecasting/tests/test_regression_features_leakage.py``) + and the ``backtesting`` slice can reuse it without a cross-slice import. + + Args: + dates: Observed days in chronological order. + quantities: Observed target values aligned with ``dates``. + prices: Observed unit prices aligned with ``dates``. + baseline_price: The typical price; ``price_factor`` is the ratio to it. + promo_dates: Days a promotion covered. + holiday_dates: Calendar holiday days. + launch_date: The product's launch date, or ``None``. 
+ + Returns: + Row-major feature matrix ``[n_observations][n_features]``; ``NaN`` marks + a lag whose source day precedes the series, and ``days_since_launch`` + when the product has no launch date. + """ + calendar_columns = build_calendar_columns(dates) + rows: list[list[float]] = [] + for index, day in enumerate(dates): + row: list[float] = [] + # Target long-lag columns — read only strictly-earlier observations. + for lag in EXOGENOUS_LAGS: + row.append(quantities[index - lag] if index >= lag else math.nan) + # Calendar columns — pure functions of the date (shared builder). + for name in CALENDAR_COLUMNS: + row.append(calendar_columns[name][index]) + # Exogenous columns — same-day observed attributes. + row.append(prices[index] / baseline_price) + row.append(1.0 if day in promo_dates else 0.0) + row.append(1.0 if day in holiday_dates else 0.0) + row.append(float((day - launch_date).days) if launch_date is not None else math.nan) + rows.append(row) + return rows + + +def build_future_feature_rows( + *, + test_dates: list[date], + history_tail: list[float], + gap: int, + test_prices: list[float], + baseline_price: float, + test_promo_dates: set[date], + test_holiday_dates: set[date], + launch_date: date | None, +) -> list[list[float]]: + """Assemble a backtest fold's test-window feature matrix — leakage-safe. + + This is the leakage-critical builder. A test-window day has no observed + target, so the matrix MUST be rebuilt here rather than sliced from the + historical matrix — a sliced historical row would read an adjacent + test-day observed target as its ``lag_1`` cell (target leakage). + + Column population by class (matches the canonical column order exactly): + + * **Target lags** (``lag_*``) — from :func:`build_long_lag_columns` over + ``history_tail``, which ends at the fold origin ``T``. A lag cell whose + source day lies in the test window is ``NaN`` — structurally enforced, + never a recursive prediction. 
+ * **Calendar columns** — pure functions of the test-window date. + * **Exogenous columns** (``price_factor`` / ``promo_active``) — the + *observed* recorded price / promotion for the test window. This reads + no target ``y`` (not target leakage); it is exogenous foresight under + the ``observed`` policy and assumes the future price/promo plan was + known at ``T``. + * ``is_holiday`` / ``days_since_launch`` — calendar / launch-date + attributes, knowable at ``T``. + + Gap handling: with ``gap > 0`` the first test day is ``T + gap + 1`` but + :func:`build_long_lag_columns` indexes its day ``m`` as ``T + m``. The lag + columns are therefore built for ``gap + len(test_dates)`` days and the + first ``gap`` rows dropped. With ``gap == 0`` the slice is a no-op. + + Args: + test_dates: The fold's test-window days (chronological). + history_tail: Observed targets ending at the fold origin ``T`` + (``history_tail[-1] == y[T]``); excludes the gap days. + gap: Gap days between train end and test start (simulated latency). + test_prices: Recorded unit prices aligned with ``test_dates``. + baseline_price: The typical price; ``price_factor`` is the ratio to it. + test_promo_dates: Test-window days a promotion covered. + test_holiday_dates: Test-window calendar holiday days. + launch_date: The product's launch date, or ``None``. + + Returns: + Row-major feature matrix ``[len(test_dates)][n_features]`` in canonical + column order; ``NaN`` marks a future-sourced lag cell and + ``days_since_launch`` when the product has no launch date. + + Raises: + ValueError: When ``gap`` is negative, ``test_prices`` does not align + with ``test_dates``, or a canonical column cannot be sourced. 
+ """ + horizon = len(test_dates) + if gap < 0: + raise ValueError(f"build_future_feature_rows: gap must be >= 0, got {gap}") + if len(test_prices) != horizon: + raise ValueError( + f"build_future_feature_rows: test_prices has {len(test_prices)} entries " + f"but test_dates has {horizon} — they must align" + ) + + # Lags: build for gap + horizon days, then drop the gap lead-in so row j + # corresponds to test day j. NaN-where-future is enforced by the builder. + lag_columns = build_long_lag_columns(history_tail, gap + horizon) + lag_columns = {name: values[gap:] for name, values in lag_columns.items()} + calendar_columns = build_calendar_columns(test_dates) + calendar_names = set(CALENDAR_COLUMNS) + columns = canonical_feature_columns() + + rows: list[list[float]] = [] + for j, day in enumerate(test_dates): + row: list[float] = [] + for column in columns: + if column.startswith("lag_"): # target lag — NaN where future + row.append(lag_columns[column][j]) + elif column in calendar_names: # pure function of the date + row.append(calendar_columns[column][j]) + elif column == "price_factor": # observed exogenous foresight + row.append(test_prices[j] / baseline_price) + elif column == "promo_active": # observed exogenous foresight + row.append(1.0 if day in test_promo_dates else 0.0) + elif column == "is_holiday": # calendar attribute + row.append(1.0 if day in test_holiday_dates else 0.0) + elif column == "days_since_launch": # pure function of the date + row.append(float((day - launch_date).days) if launch_date is not None else math.nan) + else: # loud failure — never a silent 0.0 / NaN fill + raise ValueError( + f"build_future_feature_rows: cannot source future column {column!r}" + ) + rows.append(row) + return rows diff --git a/app/shared/feature_frames/tests/__init__.py b/app/shared/feature_frames/tests/__init__.py new file mode 100644 index 00000000..d7a9bd24 --- /dev/null +++ b/app/shared/feature_frames/tests/__init__.py @@ -0,0 +1 @@ +"""Tests for the shared 
feature-frame contract package.""" diff --git a/app/shared/feature_frames/tests/test_contract.py b/app/shared/feature_frames/tests/test_contract.py new file mode 100644 index 00000000..6280e19e --- /dev/null +++ b/app/shared/feature_frames/tests/test_contract.py @@ -0,0 +1,150 @@ +"""Unit tests for the shared feature-frame contract (MLZOO-A). + +Covers the canonical column set + order, the pinned constants, the +:class:`FeatureSafety` taxonomy coverage, the :class:`FutureFeatureFrame` +shape, builder determinism, and the leaf-level architectural invariant +(``app/shared/**`` never imports ``app/features/**``). + +The leakage invariants live separately in ``test_leakage.py`` (load-bearing). +""" + +from __future__ import annotations + +import ast +from datetime import date, timedelta +from pathlib import Path + +from app.shared.feature_frames import ( + CALENDAR_COLUMNS, + EXOGENOUS_COLUMNS, + EXOGENOUS_LAGS, + HISTORY_TAIL_DAYS, + FeatureSafety, + FutureFeatureFrame, + build_calendar_columns, + canonical_feature_columns, + feature_safety, +) + +_ORIGIN = date(2026, 6, 30) +_HORIZON = 14 +_HORIZON_DATES = [_ORIGIN + timedelta(days=offset) for offset in range(1, _HORIZON + 1)] + + +# --- pinned constants --------------------------------------------------------- + + +def test_pinned_constants() -> None: + """The pinned modelling constants hold their decided values.""" + assert EXOGENOUS_LAGS == (1, 7, 14, 28) + assert HISTORY_TAIL_DAYS == 90 + + +def test_canonical_feature_columns_order() -> None: + """The canonical column list is target lags, then calendar, then exogenous.""" + columns = canonical_feature_columns() + assert columns[:4] == ["lag_1", "lag_7", "lag_14", "lag_28"] + assert columns[4 : 4 + len(CALENDAR_COLUMNS)] == list(CALENDAR_COLUMNS) + assert columns[-len(EXOGENOUS_COLUMNS) :] == list(EXOGENOUS_COLUMNS) + assert len(columns) == len(EXOGENOUS_LAGS) + len(CALENDAR_COLUMNS) + len(EXOGENOUS_COLUMNS) + + +# --- FeatureSafety taxonomy 
--------------------------------------------------- + + +def test_feature_class_covers_every_canonical_column() -> None: + """Every canonical column resolves to a FeatureSafety class — no KeyError.""" + for column in canonical_feature_columns(): + assert isinstance(feature_safety(column), FeatureSafety) + + +def test_calendar_columns_are_all_SAFE() -> None: + """Calendar columns are pure functions of the date — always SAFE.""" + for column in CALENDAR_COLUMNS: + assert feature_safety(column) is FeatureSafety.SAFE + + +def test_lag_columns_are_CONDITIONALLY_SAFE() -> None: + """Target long-lag columns — including custom offsets — are conditionally safe.""" + for lag in EXOGENOUS_LAGS: + assert feature_safety(f"lag_{lag}") is FeatureSafety.CONDITIONALLY_SAFE + # A custom lag offset not literally in FEATURE_CLASS still classifies. + assert feature_safety("lag_3") is FeatureSafety.CONDITIONALLY_SAFE + + +def test_exogenous_price_and_promo_are_unsafe_unless_supplied() -> None: + """Future price / promotion inputs are knowable only when posited.""" + assert feature_safety("price_factor") is FeatureSafety.UNSAFE_UNLESS_SUPPLIED + assert feature_safety("promo_active") is FeatureSafety.UNSAFE_UNLESS_SUPPLIED + + +def test_feature_safety_rejects_an_unclassified_column() -> None: + """A genuinely unknown column raises — callers must classify every column.""" + try: + feature_safety("mystery_feature") + except KeyError: + pass + else: + raise AssertionError("feature_safety must raise KeyError for an unknown column") + + +# --- FutureFeatureFrame dataclass --------------------------------------------- + + +def test_future_feature_frame_dataclass_shape() -> None: + """FutureFeatureFrame carries dates, feature_columns, and a row-major matrix.""" + columns = canonical_feature_columns() + frame = FutureFeatureFrame( + dates=list(_HORIZON_DATES), + feature_columns=columns, + matrix=[[0.0] * len(columns) for _ in _HORIZON_DATES], + ) + assert frame.dates == _HORIZON_DATES + assert 
frame.feature_columns == columns + assert len(frame.matrix) == _HORIZON + assert all(len(row) == len(columns) for row in frame.matrix) + + +# --- builder determinism ------------------------------------------------------ + + +def test_build_calendar_columns_is_deterministic() -> None: + """Calendar columns depend only on the dates — two calls match exactly.""" + first = build_calendar_columns(_HORIZON_DATES) + second = build_calendar_columns(list(_HORIZON_DATES)) + assert first == second + assert set(first) == set(CALENDAR_COLUMNS) + for values in first.values(): + assert len(values) == _HORIZON + + +# --- architectural invariant -------------------------------------------------- + + +def test_shared_package_imports_nothing_from_features() -> None: + """``app/shared/**`` is leaf-level — it may never import a vertical slice. + + Walks every ``.py`` file in the package and asserts no module imports a + name under ``app.features`` (AGENTS.md § Architecture). ``rows.py`` (the + MLZOO-B.2 row-matrix assemblers) is walked here too — see the explicit + assertion below that it is part of the package. + """ + pkg_dir = Path(__file__).resolve().parents[1] # app/shared/feature_frames/ + walked: set[str] = set() + for py_file in pkg_dir.rglob("*.py"): + walked.add(py_file.name) + source = py_file.read_text(encoding="utf-8") + for node in ast.walk(ast.parse(source)): + if isinstance(node, ast.ImportFrom) and node.module: + assert not node.module.startswith("app.features"), ( + f"ARCHITECTURE BREACH: {py_file} imports from {node.module}" + ) + if isinstance(node, ast.Import): + for alias in node.names: + assert not alias.name.startswith("app.features"), ( + f"ARCHITECTURE BREACH: {py_file} imports {alias.name}" + ) + # rows.py must exist and be covered by the leaf-level walk above. 
+ assert {"contract.py", "rows.py"} <= walked, ( + f"expected contract.py and rows.py in the package walk, got {sorted(walked)}" + ) diff --git a/app/shared/feature_frames/tests/test_leakage.py b/app/shared/feature_frames/tests/test_leakage.py new file mode 100644 index 00000000..f1730d64 --- /dev/null +++ b/app/shared/feature_frames/tests/test_leakage.py @@ -0,0 +1,254 @@ +"""Leakage spec for the shared feature-frame builders — LOAD-BEARING (MLZOO-A). + +This file IS the spec, mirroring ``app/features/featuresets/tests/test_leakage.py`` +and ``app/features/scenarios/tests/test_future_frame_leakage.py``: it must NEVER +be weakened to make a feature pass (AGENTS.md § Safety). + +A feature-aware model re-forecasts demand through a *future feature frame*. A +horizon day has no observed target, so the invariant the shared pure builders +(:func:`build_long_lag_columns`, :func:`build_calendar_columns`) obey is: + + A future feature value for horizon day ``D`` may use ONLY information + knowable at the forecast origin ``T``: the observed history up to and + including ``T``, or the calendar (a pure function of the date). It may + NEVER read an observed target at a horizon day ``D`` (which lies after + ``T``). + +Concretely this spec asserts: + +1. ``build_long_lag_columns`` returns only values drawn from ``history_tail`` + (entirely ``<= T``) or ``NaN`` — never a value from the future target + series. +2. A lag cell whose source day lies at or after the first horizon day is + ``NaN`` — the generator never fabricates or recursively predicts it. +3. Calendar columns are independent of the target series entirely. + +The assumption-driven exogenous columns and the assembled-frame end-to-end +checks stay in ``app/features/scenarios/tests/test_future_frame_leakage.py`` — +those builders live in the scenarios slice. 
+""" + +from __future__ import annotations + +import math +from datetime import date, timedelta + +import pytest + +from app.shared.feature_frames import ( + EXOGENOUS_LAGS, + build_calendar_columns, + build_future_feature_rows, + build_historical_feature_rows, + build_long_lag_columns, + canonical_feature_columns, +) + +# The forecast origin T is the last observed day; the horizon runs T+1 … T+H. +_ORIGIN = date(2026, 6, 30) +_HORIZON = 21 +_HORIZON_DATES = [_ORIGIN + timedelta(days=offset) for offset in range(1, _HORIZON + 1)] + +# Observed history (all <= T): 90 distinct values 1000.0 … 1089.0. +# history_tail[-1] == y[T], the origin observation. +_HISTORY_TAIL = [1000.0 + float(i) for i in range(90)] +# A DISJOINT "future target" series the generator must never be able to read. +# Any of these values appearing in a feature cell is a leak. +_FUTURE_TARGETS = {9000.0 + float(i) for i in range(_HORIZON)} + + +def test_long_lag_columns_never_emit_a_future_target() -> None: + """Every non-NaN long-lag cell is drawn from the observed history. + + ``build_long_lag_columns`` takes ONLY ``history_tail`` as data input — it + is structurally incapable of reading the future target series. This spec + pins that: no value disjoint from ``history_tail`` may ever appear. + """ + history_values = set(_HISTORY_TAIL) + columns = build_long_lag_columns(_HISTORY_TAIL, _HORIZON) + + for name, values in columns.items(): + for cell in values: + if math.isnan(cell): + continue + assert cell in history_values, ( + f"{name} emitted {cell}, which is not an observed history value" + ) + assert cell not in _FUTURE_TARGETS, f"{name} leaked a future target value {cell}" + + +def test_long_lag_source_index_is_never_at_or_after_the_horizon() -> None: + """A lag cell is populated only when its source day lies at/before ``T``. + + For lag ``k`` and horizon day ``j`` the source index into ``history_tail`` + is ``(j-1)-k``. A non-NaN cell REQUIRES that index to be negative — i.e. 
+ the source target lies at or before the origin ``T``. A non-negative index + would point at a future horizon day and MUST yield ``NaN``. + """ + columns = build_long_lag_columns(_HISTORY_TAIL, _HORIZON) + for lag in EXOGENOUS_LAGS: + column = columns[f"lag_{lag}"] + for j in range(1, _HORIZON + 1): + source_index = (j - 1) - lag + cell = column[j - 1] + if source_index >= 0: + assert math.isnan(cell), ( + f"lag_{lag} day {j}: source index {source_index} is in the " + "future but the cell is not NaN" + ) + else: + assert not math.isnan(cell), ( + f"lag_{lag} day {j}: source index {source_index} is in " + "history but the cell is NaN" + ) + + +def test_calendar_columns_are_independent_of_the_target_series() -> None: + """Calendar columns read only the dates — they cannot leak the target. + + ``build_calendar_columns`` does not accept the target series at all; this + spec pins that structural fact by asserting that two calls over the same + dates agree and that no calendar value coincides with a history or future + target value. + """ + calendar_a = build_calendar_columns(_HORIZON_DATES) + calendar_b = build_calendar_columns(_HORIZON_DATES) + assert calendar_a == calendar_b + # No calendar value coincides with a history or future target value. + history_values = set(_HISTORY_TAIL) + for values in calendar_a.values(): + for cell in values: + assert cell not in history_values + assert cell not in _FUTURE_TARGETS + + +# --- build_future_feature_rows — the backtest test-window matrix (MLZOO-B.2) -- +# +# build_future_feature_rows assembles one backtest fold's test-window feature +# matrix. It receives ONLY history_tail (entirely <= the fold origin T) — it is +# structurally incapable of reading a test-window observed target. These specs +# pin that, the NaN-where-future contract (including a gap > 0 fold), and the +# historical-vs-future asymmetry that is the reason X_future is rebuilt here +# rather than sliced from the historical matrix.
+ +_TEST_WINDOW = 14 +_TEST_PRICES = [10.0] * _TEST_WINDOW + + +def test_future_lag_cells_are_drawn_only_from_history() -> None: + """Every non-NaN future lag cell comes from ``history_tail``, never a test-window target. + + ``build_future_feature_rows`` takes only ``history_tail`` as target data; + a value disjoint from it appearing in any lag cell would be a leak. + """ + test_dates = [_ORIGIN + timedelta(days=offset) for offset in range(1, _TEST_WINDOW + 1)] + columns = canonical_feature_columns() + rows = build_future_feature_rows( + test_dates=test_dates, + history_tail=_HISTORY_TAIL, + gap=0, + test_prices=_TEST_PRICES, + baseline_price=10.0, + test_promo_dates=set(), + test_holiday_dates=set(), + launch_date=None, + ) + history_values = set(_HISTORY_TAIL) + for lag in EXOGENOUS_LAGS: + col = columns.index(f"lag_{lag}") + for j in range(_TEST_WINDOW): + cell = rows[j][col] + if math.isnan(cell): + continue + assert cell in history_values, ( + f"lag_{lag} test day {j} emitted {cell}, not an observed history value" + ) + assert cell not in _FUTURE_TARGETS, ( + f"lag_{lag} test day {j} leaked a future target value {cell}" + ) + + +@pytest.mark.parametrize("gap", [0, 3, 7]) +def test_future_lag_is_nan_exactly_where_source_is_a_test_day(gap: int) -> None: + """A future lag cell is ``NaN`` exactly when its source day lies after the origin ``T``. + + For lag ``k`` and test day ``j`` (0-indexed) the source day is + ``T + gap + j + 1 - k``. It lies after ``T`` (in the gap or the test window, + neither of which ``history_tail`` covers), and the cell MUST be ``NaN``, + exactly when ``gap + j - k >= 0``. Otherwise the source is observed history + and the cell MUST carry a value.
+ """ + test_dates = [_ORIGIN + timedelta(days=gap + offset) for offset in range(1, _TEST_WINDOW + 1)] + columns = canonical_feature_columns() + rows = build_future_feature_rows( + test_dates=test_dates, + history_tail=_HISTORY_TAIL, + gap=gap, + test_prices=_TEST_PRICES, + baseline_price=10.0, + test_promo_dates=set(), + test_holiday_dates=set(), + launch_date=None, + ) + for lag in EXOGENOUS_LAGS: + col = columns.index(f"lag_{lag}") + for j in range(_TEST_WINDOW): + cell = rows[j][col] + if gap + j - lag >= 0: + assert math.isnan(cell), ( + f"gap={gap} lag_{lag} day {j}: source is after T (gap or test day), expected NaN, got {cell}" + ) + else: + assert not math.isnan(cell), ( + f"gap={gap} lag_{lag} day {j}: source is in history — expected a value, got NaN" + ) + + +def test_historical_and_future_lag_columns_are_asymmetric() -> None: + """The crux of MLZOO-B.2: a historical lag row reads adjacent observed targets; + a future lag row does NOT — which is why ``X_future`` is rebuilt here and + never sliced from the historical matrix. + + A continuous sequential series is split at the origin ``T``. The historical + matrix row for any test-window day after the first reads the previous test + day's *observed* target as ``lag_1`` (slicing it for ``X_future`` would be + target leakage). The future matrix produces ``NaN`` there instead. + """ + series_len = 60 + train_end = 40 # origin T is index 39 (the last train day) + full = [float(i + 1) for i in range(series_len)] + history_tail = full[:train_end] + columns = canonical_feature_columns() + lag1 = columns.index("lag_1") + + historical = build_historical_feature_rows( + dates=[_ORIGIN + timedelta(days=offset) for offset in range(series_len)], + quantities=full, + prices=[10.0] * series_len, + baseline_price=10.0, + promo_dates=set(), + holiday_dates=set(), + launch_date=None, + ) + # The historical matrix row for a TEST-window day reads an observed + # test-day target as lag_1 — proof that slicing it for X_future leaks.
+ assert historical[train_end + 1][lag1] == full[train_end], ( + "historical lag_1 for a test-window row must read the adjacent observed target" + ) + + test_window = 10 + future = build_future_feature_rows( + test_dates=[_ORIGIN + timedelta(days=offset) for offset in range(1, test_window + 1)], + history_tail=history_tail, + gap=0, + test_prices=[10.0] * test_window, + baseline_price=10.0, + test_promo_dates=set(), + test_holiday_dates=set(), + launch_date=None, + ) + # The future matrix: test day 0's lag_1 is y[T] (knowable); every later + # day's lag_1 is NaN — it never reads a test-window observed target. + assert future[0][lag1] == history_tail[-1], "future lag_1 day 0 must be the origin y[T]" + for j in range(1, test_window): + assert math.isnan(future[j][lag1]), ( + f"future lag_1 test day {j} must be NaN — it must never read a test-window target" + ) diff --git a/docs/PHASE/5-BACKTESTING.md b/docs/PHASE/5-BACKTESTING.md index d06d8d2e..b465b3d1 100644 --- a/docs/PHASE/5-BACKTESTING.md +++ b/docs/PHASE/5-BACKTESTING.md @@ -373,6 +373,39 @@ $ uv run pytest app/features/backtesting/tests/ -v -m integration --- +## Feature-Aware Backtesting (MLZOO-B.2) + +Since PRP-MLZOO-B.2 the fold loop also evaluates **feature-aware** models +(`regression`, `lightgbm` — any model with `requires_features=True`), not just +the target-only baselines. + +**How it works**: +1. `run_backtest` (async) probes the model's `requires_features` flag. For a + feature-aware model it resolves the exogenous data — recorded price, + promotion windows, calendar holidays, product launch date — once, into a + pure in-memory `ExogenousFrame`. The fold loop stays sync and DB-free. +2. `_run_model_backtest` branches on the flag. The feature-aware path builds the + full historical feature matrix once, then per fold: + - **`X_train`** — a positional slice of that historical matrix. 
+ - **`X_future`** — rebuilt per fold by `build_future_feature_rows` (never + sliced — that would leak an adjacent test-day target as `lag_1`). Its + `history_tail` ends at the fold origin `T`, so future-sourced lag cells + are `NaN`; with `gap > 0` the lag columns drop the gap lead-in. +3. The result records `feature_aware: true` and `exogenous_policy: "observed"` + (the v1 policy — the recorded test-window price/promotion plan; exogenous + foresight, not target leakage). + +**Constraints**: +- `min_train_size >= 30` is required for a feature-aware backtest (each fold + must resolve its lag features) — a smaller value raises `ValueError` (400). +- The naive / seasonal baselines stay target-only — they take the unchanged + code path even when the main model is feature-aware. + +The per-fold row builders live in `app/shared/feature_frames/rows.py`; the +leakage invariants are pinned by `app/shared/feature_frames/tests/test_leakage.py`. + +--- + ## Next Phase Preparation Phase 6 (Model Registry) will use the backtesting module to: diff --git a/docs/optional-features/10-baseforecaster-feature-contract.md b/docs/optional-features/10-baseforecaster-feature-contract.md new file mode 100644 index 00000000..4d6f89a1 --- /dev/null +++ b/docs/optional-features/10-baseforecaster-feature-contract.md @@ -0,0 +1,116 @@ +# BaseForecaster Feature Contract + +## Summary + +Formalize the existing `BaseForecaster` interface as the canonical model contract for both target-only baseline models and feature-aware ML models. Add a `requires_features` class attribute or property so services can branch on model capability without `isinstance` checks or a new `FeatureAwareForecaster` subclass. + +This is a small but important foundation item for the Advanced ML Model Zoo. 
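The summary leaves open whether `requires_features` is a class attribute or a property. One practical difference is sketched below with two hypothetical classes, `WithClassVar` and `WithProperty` (illustration only, not part of the codebase): only a class attribute can be probed on the class itself, which is how the example scripts read `LightGBMForecaster.requires_features`. A property accessed on the class returns the property object, and that object is always truthy.

```python
from typing import ClassVar


class WithClassVar:
    """Hypothetical model declaring the flag as a class attribute."""

    requires_features: ClassVar[bool] = True


class WithProperty:
    """Hypothetical model declaring the flag as a read-only property."""

    @property
    def requires_features(self) -> bool:
        return False


# On an instance, both spellings yield a plain bool.
assert WithClassVar().requires_features is True
assert WithProperty().requires_features is False

# On the class, only the ClassVar yields a bool. The property spelling returns
# the property object itself, which is truthy, so a class-level probe such as
# `if Model.requires_features:` would take the feature-aware branch by mistake.
assert WithClassVar.requires_features is True
assert isinstance(WithProperty.requires_features, property)
assert bool(WithProperty.requires_features)
```

This is one argument for the `ClassVar[bool]` spelling used in the proposed design: a capability probe then works without constructing a model.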
+ +## Why It Fits ForecastLabAI + +ForecastLabAI already has a forecasting model interface where models expose: + +- `fit(y, X=None)` +- `predict(horizon, X=None)` + +Baseline models can ignore `X`; regression and future advanced models need `X`. Introducing a second base class too early would create inheritance churn without solving the harder platform problems: feature-frame contracts, future feature availability, leakage safety, and train/serve skew. + +## User Value + +- Keeps current baseline behavior stable. +- Makes feature-aware model support explicit. +- Prepares LightGBM/XGBoost/Prophet-like work without API churn. +- Avoids brittle `isinstance` checks in services. +- Reduces persistence risk for existing joblib model bundles. + +## Proposed Design + +Keep `BaseForecaster` as the single canonical model interface. + +Add a class-level capability flag: + +```python +requires_features: ClassVar[bool] = False +``` + +Baseline models: + +```python +class NaiveForecaster(BaseForecaster): + requires_features = False +``` + +Feature-aware models: + +```python +class RegressionForecaster(BaseForecaster): + requires_features = True +``` + +Service code branches on the model contract: + +```python +if model.requires_features: + ... # require and validate X / X_future +else: + ... # y-only baseline path +``` + +## Backend Design + +Likely files: + +- `app/features/forecasting/models.py` +- `app/features/forecasting/service.py` +- `app/features/forecasting/tests/test_models.py` +- `examples/models/model_interface.md` +- `docs/PHASE/4-FORECASTING.md` + +The change should document that: + +- `fit(y, X=None)` is the universal train contract. +- `predict(horizon, X=None)` is the universal predict contract. +- `requires_features = False` models may ignore `X`. +- `requires_features = True` models must receive valid feature frames. +- A `FeatureAwareForecaster` subclass should be revisited only after multiple advanced model families need shared behavior beyond the flag.
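A minimal, self-contained sketch of the flag-based branching described above, under stated assumptions: `BaseForecaster` here is a simplified stand-in for the real class, and `ToyNaive`, `ToyRegression`, and `run_forecast` are hypothetical names used only for illustration. The helper checks the flag rather than the class, and it refuses to pass `None` into a feature-aware model, which is the failure mode this note lists as a risk.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import ClassVar

Matrix = Sequence[Sequence[float]]  # row-major feature matrix


class BaseForecaster:
    """Simplified stand-in for the real BaseForecaster (illustration only)."""

    requires_features: ClassVar[bool] = False

    def fit(self, y: Sequence[float], X: Matrix | None = None) -> BaseForecaster:
        raise NotImplementedError

    def predict(self, horizon: int, X: Matrix | None = None) -> list[float]:
        raise NotImplementedError


class ToyNaive(BaseForecaster):
    """Target-only: repeats the last observation and ignores X."""

    requires_features = False

    def fit(self, y: Sequence[float], X: Matrix | None = None) -> ToyNaive:
        self.last_ = float(y[-1])
        return self

    def predict(self, horizon: int, X: Matrix | None = None) -> list[float]:
        return [self.last_] * horizon


class ToyRegression(BaseForecaster):
    """Feature-aware: least-squares slope through the origin on feature 0."""

    requires_features = True

    def fit(self, y: Sequence[float], X: Matrix | None = None) -> ToyRegression:
        assert X is not None  # run_forecast guarantees this
        xs = [row[0] for row in X]
        self.slope_ = sum(x * t for x, t in zip(xs, y)) / sum(x * x for x in xs)
        return self

    def predict(self, horizon: int, X: Matrix | None = None) -> list[float]:
        assert X is not None
        return [self.slope_ * row[0] for row in X[:horizon]]


def run_forecast(
    model: BaseForecaster,
    y: Sequence[float],
    horizon: int,
    X: Matrix | None = None,
    X_future: Matrix | None = None,
) -> list[float]:
    """Hypothetical service helper: branch on the flag, never on isinstance."""
    if model.requires_features:
        if X is None or X_future is None:
            raise ValueError(f"{type(model).__name__} requires X and X_future")
        if len(X_future) != horizon:
            raise ValueError("X_future must have exactly `horizon` rows")
        return model.fit(y, X).predict(horizon, X_future)
    return model.fit(y).predict(horizon)  # y-only baseline path: X is ignored
```

For example, `run_forecast(ToyNaive(), [1.0, 2.0, 3.0], 2)` repeats the last value, while calling `run_forecast(ToyRegression(), ...)` without `X_future` raises `ValueError` instead of silently predicting.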
+ +## MVP Scope + +- Add `requires_features` to the model interface. +- Set it explicitly on existing baseline and regression models. +- Update service branching where it currently relies on model type checks. +- Add tests proving baseline models ignore `X` and regression requires it. +- Update model interface documentation. + +## Full Version + +- Add richer capability flags if needed: + - `supports_prediction_intervals` + - `supports_feature_importance` + - `supports_exogenous_future` + - `supports_recursive_prediction` +- Introduce a `FeatureAwareForecaster` subclass only when shared advanced-model behavior justifies the abstraction. + +## Risks + +- Adding a flag without tests can become another implicit contract. +- Service code must not silently pass `None` into feature-aware models. +- Documentation must be precise so future LightGBM work does not reinterpret the contract. + +## Validation Plan + +- Unit tests for each existing model's `requires_features` value. +- Unit tests proving baseline models still fit/predict with `X=None`. +- Unit tests proving feature-aware models reject missing required features. +- Regression tests for existing forecasting service behavior. 
+- `uv run pytest -q -m "not integration"` +- `uv run ruff check app tests` + +## Documentation + +- scikit-learn estimator development guide: https://scikit-learn.org/stable/developers/develop.html +- scikit-learn Pipeline composition: https://scikit-learn.org/stable/modules/compose.html +- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html +- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html +- Pydantic documentation: https://docs.pydantic.dev/latest/ + diff --git a/docs/optional-features/11-feature-aware-predict-serving.md b/docs/optional-features/11-feature-aware-predict-serving.md new file mode 100644 index 00000000..1c745bd0 --- /dev/null +++ b/docs/optional-features/11-feature-aware-predict-serving.md @@ -0,0 +1,103 @@ +# Feature-Aware Forecasting Predict Serving + +## Summary + +Extend `POST /forecasting/predict` so feature-aware models can produce forecasts outside `/scenarios/simulate` when a leakage-safe future feature frame can be constructed or supplied. + +Today feature-aware regression models are rejected by `/forecasting/predict` because the endpoint cannot supply future `X`. That is correct for the current foundation, but it becomes a missing serving capability once MLZOO-B introduces the first advanced model. + +## Why It Fits ForecastLabAI + +ForecastLabAI is evolving from y-only baseline forecasting toward ML forecasting with `y + X`. Scenario simulation already provides a context where future assumptions can produce `X_future`. The standard forecasting endpoint needs a safe, explicit serving path for feature-aware models too, but only after the shared feature-frame contract is stable. + +## User Value + +- Advanced models become usable through the normal forecast API. +- Forecast visualization can load predictions from feature-aware jobs. +- LightGBM and future XGBoost models can serve without requiring the scenario UI. 
+- The product can distinguish baseline forecasts, scenario forecasts, and assumptions-free ML forecasts. + +## Proposed Design + +Add feature-aware predict support in a later PRP, not in the foundation-only MLZOO-A work. + +Supported future-frame modes: + +1. **Calendar-only / history-tail mode** + - Use known future calendar features. + - Use historical tail for lag and rolling seeds. + - Generate recursive target-derived features only from history and prior predictions. + - Reject unsafe feature columns that require explicit future assumptions. + +2. **Supplied future-frame mode** + - Client or service supplies validated `X_future`. + - API verifies required columns, order, dtypes, horizon length, and no target leakage. + +3. **Scenario-backed mode** + - Reuse saved scenario assumptions to construct `X_future`. + - Clearly mark the result as scenario-conditioned. + +## Backend Design + +Likely files: + +- `app/features/forecasting/routes.py` +- `app/features/forecasting/service.py` +- `app/features/forecasting/schemas.py` +- `app/shared/feature_frames/` +- `app/features/jobs/service.py` +- `frontend/src/pages/visualize/forecast.tsx` + +Possible request additions: + +- `feature_mode`: `baseline`, `history_calendar`, `supplied_frame`, `scenario` +- `future_frame`: optional structured future features +- `scenario_id`: optional saved scenario reference +- `history_tail_days`: optional bounded history window + +The endpoint must reject feature-aware predictions when required future features are unavailable. + +## MVP Scope + +- Keep current rejection behavior until the first advanced model lands. +- Add a dedicated PRP later for `history_calendar` mode. +- Support only safe known-ahead features and recursive target-derived features. +- Return metadata that states how `X_future` was built. + +## Full Version + +- Supplied future-frame mode. +- Scenario-backed mode. +- Prediction interval support where available. +- Feature availability diagnostics. 
+- UI warnings when forecasts are assumptions-free vs scenario-conditioned. + +## Risks + +- Assumptions-free future frames can be misleading if users expect promotions, inventory, or exogenous events to be included. +- Recursive lag generation can leak future targets if implemented incorrectly. +- Train/serve skew can silently degrade advanced model quality. +- API shape can become too broad if scenario, supplied-frame, and history-calendar modes are mixed without clear validation. + +## Validation Plan + +- Unit tests for future-frame validation. +- Leakage tests proving `X_future` never reads true future targets. +- API tests: + - baseline model predict still works + - feature-aware predict rejects missing future features + - feature-aware predict accepts valid history-calendar frame + - unsafe future feature requirements produce clear errors +- Job result metadata tests. +- Browser QA for forecast visualization using a feature-aware prediction job. + +## Documentation + +- FastAPI documentation: https://fastapi.tiangolo.com/ +- Pydantic documentation: https://docs.pydantic.dev/latest/ +- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html +- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html +- LightGBM Python API: https://lightgbm.readthedocs.io/en/stable/Python-API.html +- XGBoost Python package documentation: https://xgboost.readthedocs.io/en/stable/python/ +- Recharts documentation: https://recharts.org/en-US/ + diff --git a/docs/optional-features/README.md b/docs/optional-features/README.md index 6a8a309e..33c7f6b9 100644 --- a/docs/optional-features/README.md +++ b/docs/optional-features/README.md @@ -15,6 +15,8 @@ This folder contains implementation-oriented product and architecture notes for | Agent Experiment Workbench | [07-agent-experiment-workbench.md](07-agent-experiment-workbench.md) | Strategic | High | | Demand Anomaly and Data 
Quality Monitor | [08-demand-anomaly-data-quality-monitor.md](08-demand-anomaly-data-quality-monitor.md) | Medium-term | Medium | | Model Champion/Challenger Governance | [09-model-champion-challenger-governance.md](09-model-champion-challenger-governance.md) | Medium-term | High | +| BaseForecaster Feature Contract | [10-baseforecaster-feature-contract.md](10-baseforecaster-feature-contract.md) | MLZOO foundation | Low | +| Feature-Aware Forecasting Predict Serving | [11-feature-aware-predict-serving.md](11-feature-aware-predict-serving.md) | MLZOO follow-up | Medium | ## Promotion Criteria diff --git a/examples/models/advanced_lightgbm.py b/examples/models/advanced_lightgbm.py new file mode 100644 index 00000000..6a97ee13 --- /dev/null +++ b/examples/models/advanced_lightgbm.py @@ -0,0 +1,54 @@ +"""Example: Training and predicting with the LightGBM forecaster (MLZOO-B). + +``LightGBMForecaster`` is the first ADVANCED feature-aware model — it wraps +``lightgbm.LGBMRegressor`` and, unlike the baselines, REQUIRES an exogenous +feature matrix ``X`` for both ``fit`` and ``predict``. + +LightGBM is an OPTIONAL dependency. Install the extra first: + + uv sync --extra dev --extra ml-lightgbm + +Usage: + python examples/models/advanced_lightgbm.py +""" + +import numpy as np + +from app.features.forecasting.models import LightGBMForecaster +from app.shared.feature_frames import canonical_feature_columns + + +def main(): + # 1. Build a small synthetic feature matrix matching the canonical 14-column + # feature-frame contract, plus a target that genuinely depends on it. + columns = canonical_feature_columns() + n_features = len(columns) # 14 + rng = np.random.default_rng(42) + n_rows = 120 + x_train = rng.normal(size=(n_rows, n_features)) + y_train = ( + 50.0 + 5.0 * x_train[:, 0] - 3.0 * x_train[:, 1] + rng.normal(scale=0.5, size=n_rows) + ).astype(np.float64) + print(f"Training data: {n_rows} rows x {n_features} features") + print(f"Feature columns: {columns}") + + # 2. 
Create the model — deterministic given a fixed random_state. + model = LightGBMForecaster(n_estimators=100, learning_rate=0.1, max_depth=6, random_state=42) + print(f"\nrequires_features: {LightGBMForecaster.requires_features}") + + # 3. Fit on the historical feature frame (``lightgbm`` is imported lazily here). + model.fit(y_train, x_train) + print(f"Model fitted: {model.is_fitted}") + print(f"Model params: {model.get_params()}") + + # 4. Predict over a future feature frame of `horizon` rows. + horizon = 7 + x_future = rng.normal(size=(horizon, n_features)) + forecasts = model.predict(horizon, x_future) + print(f"\n{horizon}-day forecast:") + for i, f in enumerate(forecasts): + print(f" Day {i + 1}: {f:.2f}") + + +if __name__ == "__main__": + main() diff --git a/examples/models/advanced_xgboost.py b/examples/models/advanced_xgboost.py new file mode 100644 index 00000000..0b58337d --- /dev/null +++ b/examples/models/advanced_xgboost.py @@ -0,0 +1,54 @@ +"""Example: Training and predicting with the XGBoost forecaster (MLZOO-C1). + +``XGBoostForecaster`` is the second ADVANCED feature-aware tree model — it wraps +``xgboost.XGBRegressor`` and, like ``LightGBMForecaster``, REQUIRES an exogenous +feature matrix ``X`` for both ``fit`` and ``predict``. + +XGBoost is an OPTIONAL dependency. Install the extra first: + + uv sync --extra dev --extra ml-xgboost + +Usage: + python examples/models/advanced_xgboost.py +""" + +import numpy as np + +from app.features.forecasting.models import XGBoostForecaster +from app.shared.feature_frames import canonical_feature_columns + + +def main(): + # 1. Build a small synthetic feature matrix matching the canonical 14-column + # feature-frame contract, plus a target that genuinely depends on it. 
+ columns = canonical_feature_columns() + n_features = len(columns) # 14 + rng = np.random.default_rng(42) + n_rows = 120 + x_train = rng.normal(size=(n_rows, n_features)) + y_train = ( + 50.0 + 5.0 * x_train[:, 0] - 3.0 * x_train[:, 1] + rng.normal(scale=0.5, size=n_rows) + ).astype(np.float64) + print(f"Training data: {n_rows} rows x {n_features} features") + print(f"Feature columns: {columns}") + + # 2. Create the model — deterministic given a fixed random_state. + model = XGBoostForecaster(n_estimators=100, learning_rate=0.1, max_depth=6, random_state=42) + print(f"\nrequires_features: {XGBoostForecaster.requires_features}") + + # 3. Fit on the historical feature frame (``xgboost`` is imported lazily here). + model.fit(y_train, x_train) + print(f"Model fitted: {model.is_fitted}") + print(f"Model params: {model.get_params()}") + + # 4. Predict over a future feature frame of `horizon` rows. + horizon = 7 + x_future = rng.normal(size=(horizon, n_features)) + forecasts = model.predict(horizon, x_future) + print(f"\n{horizon}-day forecast:") + for i, f in enumerate(forecasts): + print(f" Day {i + 1}: {f:.2f}") + + +if __name__ == "__main__": + main() diff --git a/examples/models/feature_frame_contract.md b/examples/models/feature_frame_contract.md new file mode 100644 index 00000000..5e408fe6 --- /dev/null +++ b/examples/models/feature_frame_contract.md @@ -0,0 +1,136 @@ +# Feature-Frame Contract + +The contract a **feature-aware** forecasting model (the regression, LightGBM, +XGBoost, and Prophet-like forecasters today) +stands on. The single source of truth in code is +[`app/shared/feature_frames`](../../app/shared/feature_frames/) — the pinned +constants, the canonical column set and order, the `FutureFeatureFrame` +carrier, the leakage-safe pure builders, and the `FeatureSafety` taxonomy. 
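The intro names the contract's pieces. The sketch below is for illustration only. It is a standalone version of three of them (the pinned constants, the canonical column order, and the `FeatureSafety` map), transcribed from the column list and taxonomy table in this document. It is not the code in `app/shared/feature_frames`. The enum values, the `SUPPLIED_DEFAULTS` dict, and the `fill_supplied` helper are hypothetical, and the real `canonical_feature_columns()` / `feature_safety()` signatures may differ.

```python
from __future__ import annotations

from enum import Enum

# Hypothetical re-creation of the contract; NOT app/shared/feature_frames.


class FeatureSafety(Enum):
    SAFE = "safe"
    CONDITIONALLY_SAFE = "conditionally_safe"
    UNSAFE_UNLESS_SUPPLIED = "unsafe_unless_supplied"


EXOGENOUS_LAGS = (1, 7, 14, 28)
HISTORY_TAIL_DAYS = 90  # observed-target tail kept so the longest lag resolves

_CALENDAR = ("dow_sin", "dow_cos", "month_sin", "month_cos", "is_weekend", "is_month_end")
_EXOGENOUS = ("price_factor", "promo_active", "is_holiday", "days_since_launch")


def canonical_feature_columns() -> list[str]:
    """Fixed 14-column order: lags, then calendar, then exogenous columns."""
    return [f"lag_{k}" for k in EXOGENOUS_LAGS] + list(_CALENDAR) + list(_EXOGENOUS)


FEATURE_CLASS = {
    **{f"lag_{k}": FeatureSafety.CONDITIONALLY_SAFE for k in EXOGENOUS_LAGS},
    **{column: FeatureSafety.SAFE for column in _CALENDAR},
    "is_holiday": FeatureSafety.SAFE,
    "days_since_launch": FeatureSafety.SAFE,
    "price_factor": FeatureSafety.UNSAFE_UNLESS_SUPPLIED,
    "promo_active": FeatureSafety.UNSAFE_UNLESS_SUPPLIED,
}

# Neutral values for a future day when the caller posits no assumption:
# no price change, no promotion.
SUPPLIED_DEFAULTS = {"price_factor": 1.0, "promo_active": 0.0}


def feature_safety(column: str) -> FeatureSafety:
    return FEATURE_CLASS[column]


def fill_supplied(assumptions: dict[str, float]) -> dict[str, float]:
    """Values for the UNSAFE_UNLESS_SUPPLIED columns of one future day."""
    unknown = set(assumptions) - set(SUPPLIED_DEFAULTS)
    if unknown:
        raise ValueError(f"not caller-suppliable: {sorted(unknown)}")
    return {**SUPPLIED_DEFAULTS, **assumptions}
```

Keeping the order in one function is what lets the historical and future frames line up column-for-column. A check such as `set(FEATURE_CLASS) == set(canonical_feature_columns())` catches a column added to one without the other.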
+ +A feature-aware model consumes **two** matrices with the *same columns in the +same order*: + +| Frame | Built by | Shape | Rows | +|-------|----------|-------|------| +| Historical training frame | `ForecastingService._build_regression_features` → `_assemble_regression_rows` | `[n_observations, n_features]` | observed days in `[train_start, train_end]` | +| Future prediction frame | `app/features/scenarios/feature_frame.build_future_frame` → `assemble_future_frame` | `[horizon, n_features]` | the horizon days `T+1 … T+horizon` | + +`T` is the **forecast origin** — the last training day (`train_end`). + +## Historical training frame + +- One row per observed day in the SQL window `WHERE date >= train_start AND + date <= train_end`. **That `date <= train_end` filter IS the cutoff guard** — + no row can be assembled for a day after the origin `T`. +- `lag_k` at row `i` reads `quantity[i - k]` — a strictly earlier observation — + or `NaN` when `i < k` (no source day exists yet). +- Calendar columns are pure functions of the row's date. +- `price_factor` / `promo_active` / `is_holiday` / `days_since_launch` read the + row's **same-day** observed attributes — never a future day. +- Spec: [`test_regression_features_leakage.py`](../../app/features/forecasting/tests/test_regression_features_leakage.py) + (load-bearing — sequential targets make any leakage mathematically detectable). + +## Future prediction frame + +- One row per horizon day `T+1 … T+horizon`. +- `lag_k` at horizon day `j` reads `history_tail[(j-1) - k]` **only when + `(j-1)-k < 0`** — i.e. the source day lies at or before the origin `T` and is + therefore inside the observed history tail. When `(j-1)-k >= 0` the source day + is itself a future horizon day with no observed target, so the cell is `NaN`. + **There is no recursion in v1** — a `NaN` lag is never back-filled with a + prediction. +- Calendar columns are pure functions of the horizon date. 
+- `price_factor` / `promo_active` are knowable for a future day **only because + the caller posits them** (a scenario assumption); `is_holiday` and + `days_since_launch` are timeless date attributes. +- Spec: [`app/shared/feature_frames/tests/test_leakage.py`](../../app/shared/feature_frames/tests/test_leakage.py) + (the shared pure builders) and + [`app/features/scenarios/tests/test_future_frame_leakage.py`](../../app/features/scenarios/tests/test_future_frame_leakage.py) + (the assumption-driven columns + the assembled frame). + +## The canonical column set + +`canonical_feature_columns()` returns these **14 columns, in this order**: + +``` +lag_1, lag_7, lag_14, lag_28, +dow_sin, dow_cos, month_sin, month_cos, is_weekend, is_month_end, +price_factor, promo_active, is_holiday, days_since_launch +``` + +The set is deliberately **fixed** (not horizon-dependent): for a long horizon +some lag columns are mostly `NaN`, which a NaN-tolerant estimator handles — far +safer than a column set that changes shape with the horizon. The trained model +bundle persists exactly this list in its metadata; the future frame reproduces +it column-for-column. + +Pinned constants: `EXOGENOUS_LAGS = (1, 7, 14, 28)`, `HISTORY_TAIL_DAYS = 90` +(the observed-target tail length persisted in the bundle so the future frame +can resolve the longest lag). + +## Feature-class taxonomy + +Every column carries a `FeatureSafety` class (see `FEATURE_CLASS` / +`feature_safety()` in `app/shared/feature_frames`). This is the executable form +of the feature-family table in +[`PRPs/ai_docs/exogenous-regressor-forecasting.md`](../../PRPs/ai_docs/exogenous-regressor-forecasting.md) §2. + +| Column | Class | How to populate a future day — and the leakage trap | +|--------|-------|------------------------------------------------------| +| `lag_1`, `lag_7`, `lag_14`, `lag_28` | `CONDITIONALLY_SAFE` | Read `history_tail` only when the source day `<= T`; otherwise `NaN`. 
**Trap:** filling a future-sourced lag with a prediction (recursion) or a fabricated value. | +| `dow_sin`, `dow_cos`, `month_sin`, `month_cos`, `is_weekend`, `is_month_end` | `SAFE` | Pure function of the date — compute directly. No trap: a calendar feature cannot leak the target. | +| `is_holiday` | `SAFE` | The `calendar` table's holiday flag is a timeless date attribute; reading it for a horizon day is not leakage. | +| `days_since_launch` | `SAFE` | `(date - launch_date).days` — a pure function of the date once the launch date is known. `NaN` when the product has no launch date. | +| `price_factor` | `UNSAFE_UNLESS_SUPPLIED` | Knowable for a future day **only** if the caller posits it (a price assumption). Never inferred from observed data. Default `1.0` (no change). | +| `promo_active` | `UNSAFE_UNLESS_SUPPLIED` | Knowable for a future day **only** if the caller posits a promotion. Default `0.0` (no promotion). | + +## The NaN-as-unknown rule + +A builder emits `math.nan` for a cell whose source is **genuinely unknowable** +at origin `T` — a long-lag whose source day lies in the horizon, or +`days_since_launch` for a product with no launch date. `NaN` means *unknown*; it +is never silently replaced with a fabricated default such as `0.0`. + +`HistGradientBoostingRegressor`, `lightgbm.LGBMRegressor`, and `xgboost.XGBRegressor` +tolerate `NaN` natively. A model that is **not** NaN-tolerant must impute explicitly inside its +own `fit`/`predict` — the shared frame builders must not impute on its behalf. +The `prophet_like` model is the worked example: its `Ridge` step rejects `NaN`, +so it folds a `SimpleImputer(median)` in as the first `Pipeline` step (the +imputer learns its medians on the training `X` only — no leakage). + +## How a future advanced model plugs in + +A new feature-aware model (PRP-MLZOO-B onward): + +1. Subclasses `BaseForecaster` and sets `requires_features: ClassVar[bool] = True`.
+ `ForecastingService.train_model` / `predict` branch on this flag — no + `isinstance` check, no `model_type` string comparison. +2. Reuses the **shared** frame builders and `canonical_feature_columns()` — it + writes **zero** new contract code, so it cannot drift from the regression + contract. +3. Consumes the historical frame for `fit(y, X)`, the future frame (via + `POST /scenarios/simulate`) for `predict(horizon, X)`, and — since + PRP-MLZOO-B.2 — the per-fold backtest frames for `POST /backtesting/run`. + +## Backtesting frame (per fold) + +Since **PRP-MLZOO-B.2** a feature-aware model is evaluated by the backtesting +fold loop. Each fold builds two matrices from the **same** canonical column set: + +- **`X_train`** — a positional slice of the full historical matrix + (`build_historical_feature_rows`), built once over the whole series. + Leakage-safe *as a training row*: every lag reads a strictly-earlier observed target. +- **`X_future`** — the test-window matrix, **rebuilt per fold** by + `build_future_feature_rows` (never sliced from the historical matrix — a + sliced historical test row would read an adjacent test-day observed target as + its `lag_1` cell, which is target leakage). Its `history_tail` ends at the + fold origin `T`, so a lag cell whose source day is in the test window is + `NaN`. With `gap > 0` the lag columns are built for `gap + horizon` days and + the first `gap` rows are dropped. + +The test-window `price_factor` / `promo_active` come from the **recorded** +price/promotion for that window (the v1 `observed` exogenous policy) — that +reads no target `y`, so it is exogenous foresight, not target leakage; the +`ModelBacktestResult` records `exogenous_policy="observed"` so the metric is +read honestly. Both builders live in `app/shared/feature_frames/rows.py` and are +spec'd by the load-bearing `app/shared/feature_frames/tests/test_leakage.py`.
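The per-fold lag rule can be sketched in plain Python. `future_lag_rows` below is a hypothetical helper, not the signature of `build_future_feature_rows`. It covers only the lag columns and treats `history_tail[-1]` as the fold origin `T`. It also returns `NaN` when a lag reaches further back than the tail holds, an assumption that mirrors the historical frame's `i < k` rule.

```python
import math


def future_lag_rows(history_tail, n_days, lags=(1, 7, 14, 28), gap=0):
    """Hypothetical sketch: lag cells for one fold's test window.

    history_tail holds observed targets ending at the fold origin T
    (history_tail[-1] is day T). Returns one row per test day
    T+gap+1 .. T+gap+n_days, one cell per lag in `lags`.
    """
    rows = []
    # Build gap + n_days horizon days T+1 .. T+gap+n_days.
    for j in range(1, gap + n_days + 1):
        row = []
        for k in lags:
            idx = (j - 1) - k  # negative => source day T+j-k is at or before T
            if idx < 0 and -idx <= len(history_tail):
                row.append(float(history_tail[idx]))
            else:
                # Source day is an unobserved horizon day (or older than the
                # tail): unknown, so NaN. Never back-filled with a prediction.
                row.append(math.nan)
        rows.append(row)
    # Drop the first `gap` rows so only the test-window days remain.
    return rows[gap:]
```

With `history_tail=[10.0, 20.0, 30.0]` and lags `(1, 2)`, the second test day gets `lag_1 = NaN` (its source is the first test day, which is unobserved at `T`) and `lag_2 = 30.0`. A row sliced from the historical matrix would instead carry the observed first-test-day target in that `lag_1` cell, which is the leakage the per-fold rebuild avoids.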
diff --git a/examples/models/model_interface.md b/examples/models/model_interface.md index bf5d391a..cd6a1424 100644 --- a/examples/models/model_interface.md +++ b/examples/models/model_interface.md @@ -86,6 +86,14 @@ Check if the model has been fitted. **Returns:** - `True` if `fit()` has been called successfully +#### `requires_features: ClassVar[bool]` + +Class attribute — `True` when `fit()`/`predict()` REQUIRE a non-`None` `X` +feature frame. Baseline (target-only) models leave it `False`; feature-aware +models (e.g. the regression forecaster) override it to `True`. The forecasting +service branches on this flag instead of an `isinstance` check or a +`model_type` string comparison. + --- ## Model Configurations @@ -121,6 +129,85 @@ Each model type has a corresponding configuration schema: } ``` +### RegressionModelConfig + +```python +{ + "schema_version": "1.0", + "model_type": "regression", + "max_iter": 200, # 10-1000 (boosting iterations) + "learning_rate": 0.05, # 0.001-1.0 + "max_depth": 6 # 1-20 +} +``` + +A **feature-aware** model (`requires_features = True`): it wraps scikit-learn's +`HistGradientBoostingRegressor` and consumes a per-day exogenous feature frame. +The feature-frame contract — the canonical column set, the historical vs future +frame shapes, and the leakage taxonomy — is documented in +[`feature_frame_contract.md`](feature_frame_contract.md). + +### LightGBMModelConfig + +```python +{ + "schema_version": "1.0", + "model_type": "lightgbm", + "n_estimators": 100, # 10-1000 (boosting rounds) + "max_depth": 6, # 1-20 + "learning_rate": 0.1 # 0.001-1.0 +} +``` + +A **feature-aware** model (`requires_features = True`) wrapping +`lightgbm.LGBMRegressor` — the first *advanced* model in the MLZOO sequence +(PRP-30 / MLZOO-B). LightGBM is an **optional dependency**: install the +`ml-lightgbm` extra (`uv sync --extra dev --extra ml-lightgbm`) and enable +`forecast_enable_lightgbm=true`. 
It consumes the same canonical feature frame as +`regression` — see [`feature_frame_contract.md`](feature_frame_contract.md). + +### XGBoostModelConfig + +```python +{ + "schema_version": "1.0", + "model_type": "xgboost", + "n_estimators": 100, # 10-1000 (boosting rounds) + "max_depth": 6, # 1-20 + "learning_rate": 0.1 # 0.001-1.0 +} +``` + +A **feature-aware** model (`requires_features = True`) wrapping +`xgboost.XGBRegressor` — the second *advanced* tree model in the MLZOO sequence +(PRP-MLZOO-C1). XGBoost is an **optional dependency**: install the +`ml-xgboost` extra (`uv sync --extra dev --extra ml-xgboost`) and enable +`forecast_enable_xgboost=true`. It consumes the same canonical feature frame as +`regression` and `lightgbm` — see [`feature_frame_contract.md`](feature_frame_contract.md). + +### ProphetLikeModelConfig + +```python +{ + "schema_version": "1.0", + "model_type": "prophet_like", + "alpha": 1.0 # 0.0-10000.0 (Ridge L2 regularization strength) +} +``` + +A **feature-aware** model (`requires_features = True`) — a deterministic, +regularized **additive linear** model (MLZOO-C2). It is a scikit-learn +`Pipeline` of a `SimpleImputer(median)` + a `Ridge(solver="cholesky")` over the +same canonical 14-column feature frame as `regression`. Unlike the tree models +it ships **always-enabled**: pure scikit-learn, no optional extra, no feature +flag. It exposes a model-specific `decompose()` method that splits any forecast +into its additive trend / seasonality / holiday-regressor contributions. + +It is "Prophet-**like**", not Prophet: it approximates Prophet's additive shape +with a linear model over engineered features. It does **not** add the real +`prophet`/Stan dependency and does **not** model changepoint trend, posterior +uncertainty intervals, or automatic seasonality discovery. + --- ## Model Formulas @@ -149,6 +236,65 @@ Predicts the value from the same position in the previous seasonal cycle. 
Predicts the average of the last `window_size` observations. +### Regression Forecaster + +``` +ŷ[t+h] = HistGradientBoostingRegressor.predict(X[t+h]) +``` + +Predicts each horizon day from its exogenous feature row `X[t+h]` (target +long-lags, calendar, and posited price/promotion inputs). Unlike the baselines +it REQUIRES a feature frame — see [`feature_frame_contract.md`](feature_frame_contract.md). + +### LightGBM Forecaster + +``` +ŷ[t+h] = LGBMRegressor.predict(X[t+h]) +``` + +Same exogenous-feature contract as the regression forecaster, but the estimator +is `lightgbm.LGBMRegressor` — gradient-boosted leaf-wise trees. Feature-aware +(`requires_features = True`), deterministic (`n_jobs=1`, `deterministic=True`, +`force_col_wise=True`, fixed `random_state`), and NaN-tolerant. Optional — +behind the `ml-lightgbm` extra and the `forecast_enable_lightgbm` flag. + +### XGBoost Forecaster + +``` +ŷ[t+h] = XGBRegressor.predict(X[t+h]) +``` + +Same exogenous-feature contract as the regression and LightGBM forecasters, but +the estimator is `xgboost.XGBRegressor` — gradient-boosted trees. Feature-aware +(`requires_features = True`), deterministic (`n_jobs=1`, `tree_method="hist"`, +fixed `random_state`, no stochastic subsampling), and NaN-tolerant +(`missing=np.nan`). Optional — behind the `ml-xgboost` extra and the +`forecast_enable_xgboost` flag. 
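The tree-model formulas share one serving rule: each horizon day is predicted from its own row `X[t+h]`, and the service routes on the `requires_features` class flag rather than an `isinstance` check or a `model_type` string. The following is a minimal standalone sketch of that routing under stated assumptions. `ToyForecaster`, `ToyNaive`, `ToyRowWise`, and `serve_predict` are hypothetical names, not the repository's `BaseForecaster` or `ForecastingService`. `ToyRowWise.predict` is not a trained estimator. It only adds the training mean to each row's sum so that the row-by-row shape is visible.

```python
from __future__ import annotations

from typing import ClassVar, Optional, Sequence

Matrix = Sequence[Sequence[float]]


class ToyForecaster:
    """Hypothetical stand-in for BaseForecaster (not the repository class)."""

    requires_features: ClassVar[bool] = False

    def fit(self, y: Sequence[float], X: Optional[Matrix] = None) -> ToyForecaster:
        raise NotImplementedError

    def predict(self, horizon: int, X: Optional[Matrix] = None) -> list[float]:
        raise NotImplementedError


class ToyNaive(ToyForecaster):
    """Target-only baseline: repeats the last observation."""

    def fit(self, y, X=None):
        self.last = float(y[-1])
        return self

    def predict(self, horizon, X=None):
        return [self.last] * horizon


class ToyRowWise(ToyForecaster):
    """Feature-aware toy: y_hat[t+h] = mean(y_train) + sum(X[t+h])."""

    requires_features = True

    def fit(self, y, X=None):
        if X is None or len(X) != len(y):
            raise ValueError("X must have one row per training target")
        self.n_features = len(X[0])
        self.intercept = sum(y) / len(y)
        return self

    def predict(self, horizon, X=None):
        if X is None or len(X) != horizon:
            raise ValueError(f"expected {horizon} future rows")
        if any(len(row) != self.n_features for row in X):
            raise ValueError("future frame columns differ from the training frame")
        # Each horizon day uses only its own row: no recursion.
        return [self.intercept + sum(row) for row in X]


def serve_predict(model: ToyForecaster, horizon: int, X_future: Optional[Matrix] = None) -> list[float]:
    """Route on the class flag, not on isinstance or a model_type string."""
    if model.requires_features:
        if X_future is None:
            raise ValueError(f"{type(model).__name__} requires a future feature frame")
        return model.predict(horizon, X_future)
    # Baselines are target-only; a supplied frame is ignored.
    return model.predict(horizon)
```

Because the flag lives on the class, adding a new feature-aware model only requires setting `requires_features = True`. The routing function does not change.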
+ +### Prophet-like Forecaster + +``` +ŷ[t+h] = intercept + trend[t+h] + seasonality[t+h] + holiday_regressor[t+h] +``` + +An **additive** linear forecast: a `Ridge` fit gives `ŷ = intercept + Σ coefᵢ·xᵢ`, +and that sum is grouped into three Prophet-style components, each the partial +sum over its columns of the canonical 14-column frame: + +| Component | Canonical columns | +|-----------|-------------------| +| `trend` | `lag_1`, `lag_7`, `lag_14`, `lag_28`, `days_since_launch` | +| `seasonality` | `dow_sin`, `dow_cos`, `month_sin`, `month_cos`, `is_weekend`, `is_month_end` | +| `holiday_regressor` | `price_factor`, `promo_active`, `is_holiday` | + +The three column sets partition all 14 columns exactly, so the **additive +invariant** holds: `decompose(X)`'s four parts sum (within float tolerance) to +`predict(...)`. Feature-aware (`requires_features = True`), deterministic +(`Ridge(solver="cholesky")` closed-form, `SimpleImputer(median)`), and +NaN-tolerant via the imputer. Pure scikit-learn — always available, no extra, +no flag. The `decompose()` method (model-specific, not on `BaseForecaster`) +returns the four-way breakdown. + --- ## Persistence (ModelBundle) @@ -164,6 +310,8 @@ class ModelBundle: created_at: datetime # Save timestamp python_version: str # Python version sklearn_version: str # Scikit-learn version + lightgbm_version: str | None # LightGBM version (None if extra not installed) + xgboost_version: str | None # XGBoost version (None if extra not installed) bundle_hash: str # Deterministic hash ``` diff --git a/examples/models/prophet_like_additive.py b/examples/models/prophet_like_additive.py new file mode 100644 index 00000000..1a9378f2 --- /dev/null +++ b/examples/models/prophet_like_additive.py @@ -0,0 +1,78 @@ +"""Example: Training, predicting, and decomposing with the Prophet-like model (MLZOO-C2). 
+ +``ProphetLikeForecaster`` is a deterministic, regularized ADDITIVE linear model +— a scikit-learn ``Pipeline`` of a ``SimpleImputer`` + a ``Ridge`` regressor +over the canonical 14-column feature frame. Like the other feature-aware models +it REQUIRES an exogenous feature matrix ``X`` for both ``fit`` and ``predict``. + +It is "Prophet-LIKE", not Prophet: it approximates Prophet's additive trend + +seasonality + holiday/regressor decomposition with a linear model over already- +engineered features. It does NOT add the real ``prophet``/Stan dependency and +does NOT model changepoint trend, posterior uncertainty intervals, or automatic +seasonality discovery. + +Pure scikit-learn — no optional extra to install, always available: + + uv run python examples/models/prophet_like_additive.py +""" + +import numpy as np + +from app.features.forecasting.models import ProphetLikeForecaster +from app.shared.feature_frames import canonical_feature_columns + + +def main() -> None: + # 1. Build a small synthetic feature matrix matching the canonical 14-column + # feature-frame contract, plus a target that genuinely depends on it. + columns = canonical_feature_columns() + n_features = len(columns) # 14 + rng = np.random.default_rng(42) + n_rows = 120 + x_train = rng.normal(size=(n_rows, n_features)) + y_train = ( + 50.0 + 5.0 * x_train[:, 0] - 3.0 * x_train[:, 1] + rng.normal(scale=0.5, size=n_rows) + ).astype(np.float64) + print(f"Training data: {n_rows} rows x {n_features} features") + print(f"Feature columns: {columns}") + + # 2. Create the model — deterministic (Ridge solver="cholesky" is closed-form). + model = ProphetLikeForecaster(alpha=1.0, random_state=42) + print(f"\nrequires_features: {ProphetLikeForecaster.requires_features}") + + # 3. Fit on the historical feature frame (the SimpleImputer learns its + # per-column medians on this training X only — no leakage). 
+ model.fit(y_train, x_train) + print(f"Model fitted: {model.is_fitted}") + print(f"Model params: {model.get_params()}") + + # 4. Predict over a future feature frame of `horizon` rows. + horizon = 7 + x_future = rng.normal(size=(horizon, n_features)) + forecasts = model.predict(horizon, x_future) + print(f"\n{horizon}-day forecast:") + for i, value in enumerate(forecasts): + print(f" Day {i + 1}: {value:.2f}") + + # 5. Decompose the forecast into its additive components. The invariant is + # intercept + trend + seasonality + holiday_regressor == predict(...). + decomposition = model.decompose(x_future) + print(f"\nAdditive decomposition (intercept = {decomposition.intercept:.2f}):") + print(" Day | trend | seasonality | holiday_regressor | sum | predict") + for i in range(horizon): + component_sum = ( + decomposition.intercept + + decomposition.trend[i] + + decomposition.seasonality[i] + + decomposition.holiday_regressor[i] + ) + print( + f" {i + 1:>3} | {decomposition.trend[i]:>6.2f} | " + f"{decomposition.seasonality[i]:>11.2f} | " + f"{decomposition.holiday_regressor[i]:>17.2f} | " + f"{component_sum:>7.2f} | {forecasts[i]:>7.2f}" + ) + + +if __name__ == "__main__": + main() diff --git a/frontend/src/lib/scenario-utils.test.ts b/frontend/src/lib/scenario-utils.test.ts index b091470a..4331619b 100644 --- a/frontend/src/lib/scenario-utils.test.ts +++ b/frontend/src/lib/scenario-utils.test.ts @@ -1,5 +1,6 @@ import { describe, expect, it } from 'vitest' import { + assumptionDateErrors, buildMultiSeries, coverageLabel, coverageVariant, @@ -143,3 +144,61 @@ describe('methodLabel', () => { expect(methodLabel('model_exogenous')).toBe('Model-driven') }) }) + +describe('assumptionDateErrors', () => { + const NONE = { + priceEnabled: false, + priceStart: '', + priceEnd: '', + promoEnabled: false, + promoStart: '', + promoEnd: '', + } + + it('reports no errors when nothing is enabled', () => { + expect(assumptionDateErrors(NONE).hasErrors).toBe(false) + }) + + it('flags 
both price dates when price is enabled and blank', () => { + const e = assumptionDateErrors({ ...NONE, priceEnabled: true }) + expect(e.priceStart).toBe(true) + expect(e.priceEnd).toBe(true) + expect(e.hasErrors).toBe(true) + }) + + it('clears price errors once both dates are filled', () => { + const e = assumptionDateErrors({ + ...NONE, + priceEnabled: true, + priceStart: '2026-07-01', + priceEnd: '2026-07-14', + }) + expect(e.hasErrors).toBe(false) + }) + + it('flags only the blank promotion date', () => { + const e = assumptionDateErrors({ + ...NONE, + promoEnabled: true, + promoStart: '2026-07-01', + promoEnd: '', + }) + expect(e.promoStart).toBe(false) + expect(e.promoEnd).toBe(true) + expect(e.hasErrors).toBe(true) + }) + + it('isolates errors per assumption (price ok, promo blank)', () => { + const e = assumptionDateErrors({ + priceEnabled: true, + priceStart: '2026-07-01', + priceEnd: '2026-07-14', + promoEnabled: true, + promoStart: '', + promoEnd: '', + }) + expect(e.priceStart).toBe(false) + expect(e.promoStart).toBe(true) + expect(e.hasErrors).toBe(true) + }) +}) diff --git a/frontend/src/lib/scenario-utils.ts b/frontend/src/lib/scenario-utils.ts index bbd7b7ed..cbe7795e 100644 --- a/frontend/src/lib/scenario-utils.ts +++ b/frontend/src/lib/scenario-utils.ts @@ -135,3 +135,41 @@ export function buildMultiSeries(comparison: MultiScenarioComparison): MultiSeri export function methodLabel(method: 'heuristic' | 'model_exogenous'): string { return method === 'model_exogenous' ? 'Model-driven' : 'Heuristic' } + +/** Form state for the date-bearing assumptions (price, promotion). */ +export interface AssumptionDateState { + priceEnabled: boolean + priceStart: string + priceEnd: string + promoEnabled: boolean + promoStart: string + promoEnd: string +} + +/** Which enabled assumption date inputs are still blank. 
*/ +export interface AssumptionDateErrors { + priceStart: boolean + priceEnd: boolean + promoStart: boolean + promoEnd: boolean + hasErrors: boolean +} + +/** + * Flag every enabled Price/Promotion assumption whose From/To date is blank. + * The planner blocks Run/Save while `hasErrors` is true so the backend never + * receives an empty-string date (which fails Pydantic date validation → 422). + */ +export function assumptionDateErrors(state: AssumptionDateState): AssumptionDateErrors { + const priceStart = state.priceEnabled && !state.priceStart + const priceEnd = state.priceEnabled && !state.priceEnd + const promoStart = state.promoEnabled && !state.promoStart + const promoEnd = state.promoEnabled && !state.promoEnd + return { + priceStart, + priceEnd, + promoStart, + promoEnd, + hasErrors: priceStart || priceEnd || promoStart || promoEnd, + } +} diff --git a/frontend/src/pages/visualize/planner.tsx b/frontend/src/pages/visualize/planner.tsx index 24aa3194..077ce6e2 100644 --- a/frontend/src/pages/visualize/planner.tsx +++ b/frontend/src/pages/visualize/planner.tsx @@ -35,6 +35,7 @@ import { import { downloadCsv, toCsv } from '@/lib/csv-export' import { formatCurrency, formatNumber, getErrorMessage } from '@/lib/api' import { + assumptionDateErrors, buildMultiSeries, coverageLabel, coverageVariant, @@ -73,8 +74,11 @@ export default function WhatIfPlannerPage() { const [selectedJobId, setSelectedJobId] = useState('') const [horizon, setHorizon] = useState(14) const { data: job } = useJob(selectedJobId, !!selectedJobId) - // A predict job's params.run_id is the baseline model artifact key. - const baselineRunId = typeof job?.params?.run_id === 'string' ? job.params.run_id : null + // A completed `train` job stores result.run_id — the model-artifact key + // POST /scenarios/simulate resolves. (This is NOT a registry run id.) 
+ // A `regression` baseline routes the simulate call down the model_exogenous + // re-forecast branch; other model types fall back to the heuristic factor. + const baselineRunId = typeof job?.result?.run_id === 'string' ? job.result.run_id : null // -- Assumption form state --------------------------------------------- const [priceEnabled, setPriceEnabled] = useState(false) @@ -97,6 +101,19 @@ export default function WhatIfPlannerPage() { const [lifecycleStage, setLifecycleStage] = useState<(typeof LIFECYCLE_STAGES)[number]>('maturity') + // -- Derived validation ------------------------------------------------ + // Enabling Price/Promotion without filling both dates would submit empty + // strings — Pydantic date validation rejects those with an RFC 7807 422. + // Gate Run/Save on this so the form can never produce that request (#228). + const dateErrors = assumptionDateErrors({ + priceEnabled, + priceStart, + priceEnd, + promoEnabled, + promoStart, + promoEnd, + }) + // -- Results / persistence state --------------------------------------- const [simulated, setSimulated] = useState(null) const [planName, setPlanName] = useState('') @@ -152,7 +169,7 @@ export default function WhatIfPlannerPage() { } async function handleRun() { - if (!baselineRunId) return + if (!baselineRunId || dateErrors.hasErrors) return setRunError(null) setReloadId('') try { @@ -169,7 +186,7 @@ export default function WhatIfPlannerPage() { } async function handleSave() { - if (!baselineRunId || !planName.trim()) return + if (!baselineRunId || !planName.trim() || dateErrors.hasErrors) return setRunError(null) try { await createScenario.mutateAsync({ @@ -245,12 +262,15 @@ export default function WhatIfPlannerPage() { 1. Pick a baseline - Choose a completed prediction job — its model is the baseline this scenario adjusts. + Choose a completed training job — its model is the baseline this scenario + adjusts. 
A regression baseline is genuinely re-forecast through the model + (model-driven); naive, seasonal-naive and moving-average baselines use a + heuristic adjustment factor. {selectedJobId && !baselineRunId && (

- The selected job has no model artifact — pick a completed predict job. + The selected job has no model artifact — pick a completed train job.

)}
@@ -317,6 +337,9 @@ export default function WhatIfPlannerPage() { value={priceStart} onChange={(event) => setPriceStart(event.target.value)} /> + {dateErrors.priceStart && ( +

Required

+ )}
To @@ -326,6 +349,9 @@ export default function WhatIfPlannerPage() { value={priceEnd} onChange={(event) => setPriceEnd(event.target.value)} /> + {dateErrors.priceEnd && ( +

Required

+ )}
)} @@ -368,6 +394,9 @@ export default function WhatIfPlannerPage() { value={promoStart} onChange={(event) => setPromoStart(event.target.value)} /> + {dateErrors.promoStart && ( +

Required

+ )}
To @@ -377,6 +406,9 @@ export default function WhatIfPlannerPage() { value={promoEnd} onChange={(event) => setPromoEnd(event.target.value)} /> + {dateErrors.promoEnd && ( +

Required

+ )}
)} @@ -462,7 +494,10 @@ export default function WhatIfPlannerPage() {
-