Merged
25 commits
df612a1
Merge pull request #235 from w7-mgfcode/main
w7-mgfcode May 19, 2026
4a2e5dc
fix(jobs,ui): reach model_exogenous from the what-if planner (#229)
w7-mgfcode May 19, 2026
34104c9
fix(ui): block planner runs with empty assumption dates (#228)
w7-mgfcode May 19, 2026
f09dec0
Merge pull request #236 from w7-mgfcode/fix/planner-model-exogenous-a…
w7-mgfcode May 19, 2026
22e39aa
docs(docs): add MLZOO planning briefs and feature-contract notes (#238)
w7-mgfcode May 19, 2026
ed89d42
chore(repo): sync uv.lock version field to released v0.2.15 (#239)
w7-mgfcode May 19, 2026
97d8057
Merge pull request #240 from w7-mgfcode/docs/mlzoo-planning-briefs
w7-mgfcode May 19, 2026
b116489
feat(forecast): feature-aware forecasting foundation — shared feature…
w7-mgfcode May 19, 2026
a37abfe
Merge pull request #241 from w7-mgfcode/feat/feature-aware-forecastin…
w7-mgfcode May 19, 2026
2f1b8a5
feat(forecast): add LightGBM feature-aware forecasting model (#242)
w7-mgfcode May 19, 2026
2b44491
Merge pull request #243 from w7-mgfcode/feat/forecasting-lightgbm-fir…
w7-mgfcode May 19, 2026
12f6cdf
feat(backtest): wire feature-aware models into the backtesting fold l…
w7-mgfcode May 19, 2026
5a65e35
Merge pull request #245 from w7-mgfcode/feat/backtesting-feature-awar…
w7-mgfcode May 19, 2026
d7527a5
fix(api): allow Tailscale CGNAT origins in dev CORS allow-list (#246)
w7-mgfcode May 19, 2026
7adc045
docs(docs): add PRP-MLZOO-C1 xgboost model and split the MLZOO-C road…
w7-mgfcode May 19, 2026
0a25a75
docs(docs): add PRP-MLZOO-C2 prophet-like additive model (#248)
w7-mgfcode May 19, 2026
0d219bc
feat(forecast): add Prophet-like additive forecasting model (#248)
w7-mgfcode May 19, 2026
079e0b7
docs(docs): document the Prophet-like additive model (#248)
w7-mgfcode May 19, 2026
d61fc10
chore(repo): add ml-xgboost optional dependency extra (#247)
w7-mgfcode May 19, 2026
53d3d57
feat(forecast): add XGBoost feature-aware forecasting model (#247)
w7-mgfcode May 19, 2026
82c457e
Merge pull request #249 from w7-mgfcode/fix/api-tailscale-cors-origins
w7-mgfcode May 19, 2026
ca4dd4b
Merge branch 'dev' into feat/forecasting-xgboost-model
w7-mgfcode May 19, 2026
2091f2f
Merge pull request #251 from w7-mgfcode/feat/forecasting-xgboost-model
w7-mgfcode May 19, 2026
1ab877c
Merge branch 'dev' into feat/forecasting-prophet-like-model
w7-mgfcode May 19, 2026
7531eac
Merge pull request #250 from w7-mgfcode/feat/forecasting-prophet-like…
w7-mgfcode May 19, 2026
124 changes: 124 additions & 0 deletions PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md
@@ -0,0 +1,124 @@
# INITIAL-MLZOO-A-foundation-feature-frames.md - Feature-Aware Forecasting Foundation

## FEATURE:

Create the foundation for feature-aware forecasting in ForecastLabAI.

This is the first MLZOO PRP input and should become PRP-29. It must not implement LightGBM, XGBoost, Prophet-like models, frontend UI, explainability UI, hyperparameter search, or portfolio/global orchestration. Its job is to make the existing forecasting layer capable of supporting future advanced ML models without breaking current baseline forecasters.

Goals:

- Define a feature-aware forecasting contract that supports `fit(y, X=None)` and `predict(horizon, X=None)`.
- Preserve existing target-only baseline models: `naive`, `seasonal_naive`, and `moving_average`.
- Define historical training feature-frame requirements.
- Define future prediction feature-frame requirements.
- Add or document leakage-safe feature-frame generation rules.
- Add load-bearing leakage tests that prove future rows do not use future target values.
- Make future advanced models possible without adding their dependencies yet.
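One way the `fit(y, X=None)` / `predict(horizon, X=None)` contract from the goals above could look is sketched below. This is a minimal illustration under stated assumptions, not the project's implementation: the `BaseForecaster` shown here is a stand-in for the class in `app/features/forecasting/models.py`, the `requires_features` flag and `DayOfWeekMeanForecaster` are hypothetical, and feature frames are modelled as plain lists of dicts to keep the sketch dependency-free.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Mapping, Optional, Sequence

# Assumption: a feature frame is a sequence of rows, one row per time step.
Frame = Sequence[Mapping[str, float]]


class BaseForecaster(ABC):
    """Illustrative stand-in for the existing base class (hypothetical shape)."""

    # Hypothetical flag: target-only baselines leave this False and ignore X.
    requires_features: bool = False

    @abstractmethod
    def fit(self, y: Sequence[float], X: Optional[Frame] = None) -> "BaseForecaster":
        ...

    @abstractmethod
    def predict(self, horizon: int, X: Optional[Frame] = None) -> list[float]:
        ...


class NaiveForecaster(BaseForecaster):
    """Target-only baseline: repeats the last observed value, X is optional."""

    def fit(self, y, X=None):
        if not y:
            raise ValueError("y must contain at least one observation")
        self._last = float(y[-1])
        return self

    def predict(self, horizon, X=None):
        return [self._last] * horizon


class DayOfWeekMeanForecaster(BaseForecaster):
    """Toy feature-aware model: predicts the training mean per day_of_week."""

    requires_features = True

    def fit(self, y, X=None):
        if X is None or len(X) != len(y):
            raise ValueError("X must be supplied with one row per target value")
        grouped: dict[int, list[float]] = {}
        for target, row in zip(y, X):
            grouped.setdefault(int(row["day_of_week"]), []).append(float(target))
        self._means = {dow: sum(v) / len(v) for dow, v in grouped.items()}
        return self

    def predict(self, horizon, X=None):
        # A feature-aware model refuses to predict without a future frame.
        if X is None or len(X) != horizon:
            raise ValueError("a future frame with exactly `horizon` rows is required")
        return [self._means[int(row["day_of_week"])] for row in X]
```

Because `X` defaults to `None` in both methods, existing callers of the baseline models would not need to change, while feature-aware models can fail fast when a frame is missing.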

Expected user value:

- ForecastLabAI gains a safe foundation for serious ML forecasting.
- Future LightGBM/XGBoost/Prophet-like work can build on a tested frame contract.
- Scenario simulation and explainability can later depend on a consistent feature-frame interface.

Recommended user story:

As a forecasting engineer,
I want a leakage-safe feature-frame contract for training and prediction,
So that advanced ML models can be added without breaking baseline models or leaking future data.

Out of scope:

- LightGBM implementation.
- XGBoost implementation.
- Prophet-like implementation.
- New database migrations unless absolutely required.
- Frontend pages.
- Agent tools.
- Hyperparameter search.

## EXAMPLES:

Read these before PRP creation:

- `docs/optional-features/05-advanced-ml-model-zoo.md`
  - Full feature vision and risks.

- `PRPs/INITIAL/INITIAL-5.md`
  - Existing forecasting model brief.

- `docs/PHASE/4-FORECASTING.md`
  - Current forecasting layer documentation.

- `app/features/forecasting/models.py`
  - Existing `BaseForecaster` and baseline model implementations.

- `app/features/forecasting/schemas.py`
  - Existing model config schemas and discriminated union pattern.

- `app/features/forecasting/service.py`
  - Existing train/predict orchestration.

- `app/features/forecasting/persistence.py`
  - Existing `ModelBundle` persistence.

- `app/features/featuresets/service.py`
  - Existing time-safe feature computation.

- `app/features/featuresets/schemas.py`
  - Feature configuration schemas.

- `app/features/featuresets/tests/test_leakage.py`
  - Existing leakage tests to mirror and extend.

- `app/features/backtesting/service.py`
  - Current backtesting integration points.

Potential example artifacts:

- `examples/models/feature_frame_contract.md`
  - Describes historical and future frame shape, required columns, safe/unsafe feature classes.

## DOCUMENTATION:

- scikit-learn estimator interface conventions: https://scikit-learn.org/stable/developers/develop.html
- scikit-learn Pipeline composition: https://scikit-learn.org/stable/modules/compose.html
- scikit-learn TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
- Pandas time series documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html
- Pydantic documentation: https://docs.pydantic.dev/latest/
- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html

## OTHER CONSIDERATIONS:

This PRP is primarily about contracts and leakage safety.

Required decisions:

- How to represent feature-aware models without forcing every baseline model to require `X`.
- Whether to introduce a `FeatureAwareForecaster` protocol/base class or extend the existing base interface only.
- Where historical training frames are built.
- Where future prediction frames are built.
- Which feature classes are safe for future frames:
  - Safe: calendar features known in advance.
  - Conditionally safe: lag/rolling features generated from historical tail and prior predictions.
  - Unsafe unless explicitly supplied: future price, promotion, inventory, markdown, exogenous signals.
- How to reject missing future features instead of silently filling misleading defaults.
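The last decision above, rejecting missing future features instead of filling defaults, could be enforced by a small validation step before prediction. The sketch below is one possible shape, assuming future frames arrive as a list of dict rows; `MissingFutureFeatureError` and `validate_future_frame` are hypothetical names, not existing project code.

```python
from __future__ import annotations

import math
from typing import Mapping, Sequence


class MissingFutureFeatureError(ValueError):
    """Raised when a required future feature value was not supplied (hypothetical)."""


def validate_future_frame(
    rows: Sequence[Mapping[str, object]],
    required_columns: Sequence[str],
    horizon: int,
) -> None:
    """Fail loudly rather than imputing zeros for unknown future values."""
    if len(rows) != horizon:
        raise ValueError(f"expected {horizon} future rows, got {len(rows)}")
    for index, row in enumerate(rows):
        for column in required_columns:
            value = row.get(column)
            is_nan = isinstance(value, float) and math.isnan(value)
            if value is None or is_nan:
                raise MissingFutureFeatureError(
                    f"future row {index}: no value supplied for {column!r}"
                )
```

Raising a dedicated error type keeps the failure distinguishable from ordinary shape errors, so a caller could surface it as a request-validation problem rather than a model failure.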

Validation expectations:

- Existing baseline forecasting tests still pass.
- New feature-frame contract tests exist.
- New leakage tests prove future target values are not used.
- Backtesting remains time-safe.
- `uv run pytest -q -m "not integration"` should pass.
- `uv run ruff check app tests` should pass for touched Python code.

Important gotchas:

- Do not break current target-only baseline forecasters.
- Do not add LightGBM or other heavy ML dependencies in this PRP.
- Do not silently convert unknown future exogenous values into zeros.
- Do not let training frames include rows after the cutoff date.
- Do not let future prediction frames read true future targets.
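The cutoff and future-target gotchas above lend themselves to a perturbation-style leakage test: overwrite every target at or after a row's own time step with a sentinel and check that the row's features do not change. The sketch below illustrates the idea with a hypothetical `build_lag_features` helper; the real feature builder in `app/features/featuresets/service.py` may look quite different.

```python
from __future__ import annotations

from typing import Optional, Sequence


def build_lag_features(
    series: Sequence[float], lags: Sequence[int], cutoff: int
) -> list[dict[str, Optional[float]]]:
    """Build one feature row per index in [0, cutoff); lag_k at t reads series[t - k]."""
    if cutoff > len(series):
        raise ValueError("cutoff must not exceed the series length")
    rows = []
    for t in range(cutoff):
        rows.append(
            {f"lag_{k}": (series[t - k] if t - k >= 0 else None) for k in lags}
        )
    return rows


def check_no_future_leakage(
    series: Sequence[float], lags: Sequence[int], cutoff: int
) -> None:
    """Raise AssertionError if any row depends on targets at or after its own time."""
    baseline = build_lag_features(series, lags, cutoff)
    for t in range(cutoff):
        poisoned = list(series)
        for i in range(t, len(poisoned)):
            poisoned[i] = 1e9  # sentinel that no real target would equal
        rebuilt = build_lag_features(poisoned, lags, cutoff)
        if rebuilt[t] != baseline[t]:
            raise AssertionError(f"row {t} changed when future targets were altered")
```

With strictly positive lags the check passes; including a lag of `0` (the current target) makes it fail, which is the behaviour a load-bearing leakage test should have.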

101 changes: 101 additions & 0 deletions PRPs/INITIAL/INITIAL-MLZOO-B-lightgbm-first-model.md
@@ -0,0 +1,101 @@
# INITIAL-MLZOO-B-lightgbm-first-model.md - LightGBM First Advanced Model

## FEATURE:

Add the first advanced feature-aware model to ForecastLabAI after the MLZOO foundation is merged.

Preferred model: LightGBM.

Fallback model: sklearn `HistGradientBoostingRegressor` or another sklearn-native gradient boosting model if LightGBM creates unacceptable dependency or CI risk.

This PRP must depend on `INITIAL-MLZOO-A-foundation-feature-frames.md` being implemented first.

Goals:

- Add one advanced model config schema.
- Add one feature-aware model implementation.
- Support deterministic training.
- Integrate with forecasting train/predict.
- Integrate with backtesting.
- Persist model metadata needed for reproducibility.
- Preserve all existing baseline model behavior.

Out of scope:

- XGBoost.
- Prophet-like models.
- Hyperparameter search.
- Portfolio/global models.
- Frontend model administration.
- Explainability UI.

## EXAMPLES:

Read these before PRP creation:

- `PRPs/INITIAL/INITIAL-MLZOO-A-foundation-feature-frames.md`
  - Required prerequisite.

- `docs/optional-features/05-advanced-ml-model-zoo.md`
  - Full advanced model vision.

- `app/features/forecasting/models.py`
  - Model factory and baseline model patterns.

- `app/features/forecasting/schemas.py`
  - Model config schema patterns.

- `app/features/forecasting/service.py`
  - Training/prediction service integration.

- `app/features/forecasting/persistence.py`
  - Model bundle save/load behavior.

- `app/features/backtesting/service.py`
  - Backtesting orchestration.

- `app/features/registry/service.py`
  - Registry run metadata patterns.

Potential example artifacts:

- `examples/models/advanced_lightgbm.py`
  - Minimal training/prediction example.

## DOCUMENTATION:

- LightGBM documentation: https://lightgbm.readthedocs.io/
- LightGBM Python API: https://lightgbm.readthedocs.io/en/stable/Python-API.html
- LightGBM parameters: https://lightgbm.readthedocs.io/en/stable/Parameters.html
- scikit-learn HistGradientBoostingRegressor: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html
- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html
- Joblib persistence documentation: https://joblib.readthedocs.io/en/stable/persistence.html
- Pydantic documentation: https://docs.pydantic.dev/latest/

## OTHER CONSIDERATIONS:

Dependency strategy is the main open risk.

Required decisions:

- Whether to add LightGBM as a hard dependency, add it as an optional dependency group, or defer to the sklearn fallback.
- Exact advanced model config fields.
- How model dependency versions are captured in registry/runtime metadata.
- How prediction rejects missing future feature frames.
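Two of the decisions above, optional-dependency handling and capturing dependency versions, can be explored with the standard library alone. The sketch below is only an illustration: `resolve_backend`, `capture_dependency_versions`, and the backend label strings are hypothetical, and the registry field that would store the result is not specified by this brief.

```python
from __future__ import annotations

import importlib.util
import platform
from importlib import metadata
from typing import Optional, Sequence


def resolve_backend() -> str:
    """Pick LightGBM when it is importable, else a sklearn-native fallback label."""
    if importlib.util.find_spec("lightgbm") is not None:
        return "lightgbm"
    return "sklearn_hist_gradient_boosting"  # hypothetical label


def capture_dependency_versions(packages: Sequence[str]) -> dict[str, Optional[str]]:
    """Record installed versions; packages that are not installed map to None."""
    versions: dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

A call such as `capture_dependency_versions(["lightgbm", "scikit-learn"])` could be made at training time and stored alongside the run, so a later load can compare the recorded versions with the current environment.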

Recommended defaults:

- Use fixed `random_state` from settings.
- Start with single store/product training.
- Keep the first config conservative.
- Avoid hyperparameter search.
- Persist feature column order.
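Persisting the feature column order and the fixed `random_state` matters because tree models consume positional matrices: a reordered frame at prediction time would silently feed the wrong column to each split. A minimal sketch of the idea follows, assuming JSON-serialisable metadata; `FeatureModelMetadata` and `to_matrix` are hypothetical names and do not describe the existing `ModelBundle`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class FeatureModelMetadata:
    """Hypothetical metadata stored next to a fitted feature-aware model."""

    feature_columns: tuple[str, ...]
    random_state: int

    def to_json(self) -> str:
        return json.dumps(
            {
                "feature_columns": list(self.feature_columns),
                "random_state": self.random_state,
            }
        )

    @classmethod
    def from_json(cls, payload: str) -> "FeatureModelMetadata":
        data = json.loads(payload)
        return cls(tuple(data["feature_columns"]), int(data["random_state"]))


def to_matrix(
    rows: Sequence[Mapping[str, float]], feature_columns: Sequence[str]
) -> list[list[float]]:
    """Order each row by the persisted column order; reject missing columns."""
    matrix = []
    for index, row in enumerate(rows):
        missing = [c for c in feature_columns if c not in row]
        if missing:
            raise KeyError(f"row {index} is missing feature columns: {missing}")
        matrix.append([float(row[c]) for c in feature_columns])
    return matrix
```

At prediction time the loaded metadata, not the incoming frame, decides the column order, so rows whose keys arrive in a different order still map to the same matrix.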

Validation expectations:

- Config schema tests.
- Deterministic training tests.
- Save/load persistence tests.
- Backtesting integration test comparing baseline and advanced model path.
- Tests proving baselines still work unchanged.
