feat(forecast): add XGBoost feature-aware forecasting model (#247) #251
Adds the [project.optional-dependencies] ml-xgboost = ["xgboost>=2.1.0"] extra, mirroring ml-lightgbm. uv.lock regenerated so CI's uv sync --frozen --all-extras installs it. No core dependency change.
Implements XGBoostForecaster, the second advanced feature-aware tree
model (MLZOO-C1), mirroring the merged LightGBMForecaster byte-for-byte
with xgboost.XGBRegressor in place of lightgbm.LGBMRegressor.
- XGBoostModelConfig: conservative schema (n_estimators / max_depth /
learning_rate / feature_config_hash), added to the ModelConfig union.
- XGBoostForecaster: requires_features=True, lazy xgboost import inside
fit(), deterministic via n_jobs=1 + tree_method=hist + fixed seed.
- model_factory: xgboost branch gated on forecast_enable_xgboost; the
ModelType literal gains "xgboost".
- JobService._execute_train / _execute_backtest: xgboost branches.
- POST /forecasting/train: xgboost feature-flag gate (400 when off).
- ModelBundle.xgboost_version + registry runtime_info xgboost block,
both best-effort; compute_hash unchanged.
- Tests mirror the LightGBM suite, gated with importorskip("xgboost").
- Docs + examples/models/advanced_xgboost.py additive.
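
The lazy-import and determinism points above can be pictured with a minimal sketch. The class name `XGBoostForecasterSketch`, the default hyperparameter values, and the seed are assumptions for illustration, not the code in this PR; only `n_jobs=1`, `tree_method="hist"`, a fixed seed, no stochastic subsampling, and NaN as the missing marker come from the description.

```python
from typing import Any, Dict


class XGBoostForecasterSketch:
    """Illustrative stand-in, not the XGBoostForecaster added by this PR."""

    # Services branch on this attribute to decide whether to build features.
    requires_features = True

    def __init__(
        self,
        n_estimators: int = 100,      # assumed default
        max_depth: int = 6,           # assumed default
        learning_rate: float = 0.1,   # assumed default
        random_state: int = 42,       # assumed seed value
    ) -> None:
        self.params: Dict[str, Any] = {
            "n_estimators": n_estimators,
            "max_depth": max_depth,
            "learning_rate": learning_rate,
            "n_jobs": 1,               # single thread for reproducible results
            "tree_method": "hist",
            "random_state": random_state,
            "subsample": 1.0,          # no stochastic row subsampling
            "colsample_bytree": 1.0,   # no stochastic column subsampling
            "missing": float("nan"),   # treat NaN as the missing marker
        }
        self._model: Any = None

    def fit(self, X: Any, y: Any) -> "XGBoostForecasterSketch":
        # Imported here so that importing this module never needs the extra.
        import xgboost as xgb

        self._model = xgb.XGBRegressor(**self.params)
        self._model.fit(X, y)
        return self

    def predict(self, X: Any) -> Any:
        if self._model is None:
            raise RuntimeError("fit() must be called before predict()")
        return self._model.predict(X)
```

Constructing the object works without the `ml-xgboost` extra installed; only `fit()` triggers the import.
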
train / predict / scenarios / backtesting services are unchanged: each
branches on requires_features, so an xgboost model routes through every
path automatically. No migration, no API-contract change. XGBoost reuses
the regression historical/future feature builders, so the existing
leakage specs cover it by construction.
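
As a rough illustration of why the services need no change: a caller that branches only on `requires_features` handles any new feature-aware model without a model-specific case. The helper `build_fit_inputs` and the two model classes below are hypothetical names, not code from this repository.

```python
from typing import Any, Dict, Optional, Sequence


class NaiveModelSketch:
    """Hypothetical target-only model."""

    requires_features = False


class TreeModelSketch:
    """Hypothetical feature-aware model (LightGBM, XGBoost, ...)."""

    requires_features = True


def build_fit_inputs(
    model: Any,
    y: Sequence[float],
    X: Optional[Sequence[Sequence[float]]] = None,
) -> Dict[str, Any]:
    """Choose the inputs for a model from its requires_features attribute."""
    if getattr(model, "requires_features", False):
        if X is None:
            raise ValueError("feature-aware model needs a feature matrix")
        return {"y": y, "X": X}
    # Target-only models ignore any feature matrix that was passed in.
    return {"y": y}
```

Because the branch keys on the attribute rather than on the model type, adding another `requires_features = True` model needs no edit here.
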
## Summary

Implements `XGBoostForecaster`, the second advanced feature-aware tree model (MLZOO-C1), wrapping `xgboost.XGBRegressor`. It mirrors the merged `LightGBMForecaster` (PRP-30 / MLZOO-B) byte-for-byte, with two library swaps: `lgb.LGBMRegressor` → `xgb.XGBRegressor`, and `deterministic` / `force_col_wise` → `tree_method="hist"`.

This is C1 of two MLZOO-C review units. The sibling `PRPs/PRP-MLZOO-C2-prophet-like-additive-model.md` ships the Prophet-like additive model on a separate branch; C1 and C2 are intentionally separate, additive, and order-independent.

## What changed

- `XGBoostModelConfig`: conservative schema (`n_estimators` / `max_depth` / `learning_rate` / `feature_config_hash`), added to the `ModelConfig` union.
- `XGBoostForecaster`: `requires_features=True`; `xgboost` is lazy-imported inside `fit()` so importing `models.py` never requires the optional extra. Deterministic via `n_jobs=1` + `tree_method="hist"` + fixed `random_state` + no stochastic subsampling; NaN-tolerant (`missing=np.nan`).
- `model_factory`: new `xgboost` branch gated on `forecast_enable_xgboost`; the `ModelType` literal gains `"xgboost"`.
- `forecast_enable_xgboost` runtime flag in `app/core/config.py` (default `False`).
- `ml-xgboost` optional dependency extra (`xgboost>=2.1.0`); `uv.lock` regenerated.
- `JobService`: `xgboost` branches in `_execute_train` and `_execute_backtest`.
- `POST /forecasting/train` returns 400 for `xgboost` when the flag is off.
- `ModelBundle.xgboost_version` (best-effort save + mismatch-warn on load) and an `xgboost_version` block in registry `runtime_info`. `compute_hash` unchanged → no bundle hash shift.
- Tests mirror the LightGBM suite, gated with `pytest.importorskip("xgboost")`.
- Docs: `model_interface.md`, `feature_frame_contract.md`, `README.md`, plus `examples/models/advanced_xgboost.py`.

`ForecastingService.train_model` / `predict`, `scenarios/service.py`, and `backtesting/service.py` are unchanged: each branches on `requires_features`, so an XGBoost model routes through every path automatically. No Alembic migration, no API-contract change. XGBoost reuses the regression historical/future feature builders, so the existing leakage specs (`test_regression_features_leakage.py`, `app/shared/feature_frames/tests/test_leakage.py`) cover it by construction; no new leakage test was added.

## Validation

- `uv run ruff check .`: PASS
- `uv run ruff format --check .`: PASS
- `uv run mypy app/`: PASS (272 files; no `xgboost.*` mypy override needed, since xgboost ships `py.typed`)
- `uv run pyright app/`: PASS (0 errors, 68 pre-existing warnings)
- `uv run pytest -v -m "not integration"`: PASS (1358 passed, 247 deselected)
- `uv run pytest -m integration` on forecasting/scenarios/jobs/registry: PASS (20 passed), including `test_xgboost_baseline_returns_model_exogenous` (method == `model_exogenous`) and `test_train_xgboost_rejected_when_disabled` (400)
- Determinism (`assert_array_equal`) and `requires_features is True` confirmed; `examples/models/advanced_xgboost.py` runs end-to-end.

Closes #247
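
For readers unfamiliar with the "best-effort save + mismatch-warn on load" pattern mentioned for `ModelBundle.xgboost_version`, a minimal standard-library sketch follows. The function names `installed_version` and `warn_on_version_mismatch` are hypothetical; the actual bundle code in this PR may differ.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # Best-effort: a missing optional extra must not break saving.
        return None


def warn_on_version_mismatch(
    package: str, saved: Optional[str], current: Optional[str]
) -> bool:
    """Warn (never raise) when the saved and current versions differ."""
    if saved is None or current is None or saved == current:
        return False
    warnings.warn(
        f"{package} version mismatch: bundle saved with {saved}, "
        f"running {current}",
        RuntimeWarning,
        stacklevel=2,
    )
    return True
```

On save, a caller would record `installed_version("xgboost")`; on load, it would pass the recorded value and the current one to `warn_on_version_mismatch`. Keeping the version outside the hashed payload is consistent with the note that `compute_hash` is unchanged.
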