feat(features): implement replenishment compute method (#109)#114
Conversation
Land PRP-3.1C — the replenishment compute slice for time-safe featuresets.
Add `FeatureEngineeringService._compute_replenishment_features` plus a sidecar
events attribute, a one-line orchestrator branch after the exogenous family,
and `FeatureDataLoader.load_replenishment_events` with a SQL-side
`date <= cutoff_date` filter for time-safety before pandas sees the rows.
Replenishment features land as two columns when the config is set:
* `days_since_last_replenishment_lag{N}` (float64, NaN sentinel)
* `replenishment_count_w{W}_lag{N}` (int64, 0 for no-events)
Cross-series isolation is enforced by `merge_asof(..., by=["store_id",
"product_id"])` plus per-entity `groupby.shift(...)` — belt-and-braces.
The rolling count uses `shift(1).rolling(W).sum()` (NEVER `rolling().shift()`),
and `lag_days > 1` layers an additional shift on top of the canonical safety
boundary (PRP-3.1C §15 Decision C).
Tests:
* `TestReplenishmentLeakage` (4 cases) asserts shift(N) invariance,
shift(1).rolling(W) ordering, cross-series isolation, and the on-cutoff
inclusion semantics implied by the SQL `<=` filter.
* `TestReplenishmentFeatures` (5 cases) covers happy path, zero-event
entity, single-event entity, cutoff alignment, and dtype contracts.
No schema, route, or migration changes — additive only. The PRP-3.1A
`config_hash_unchanged_when_phase2_omitted` snapshot still passes.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Sorry @w7-mgfcode, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
Summary
Lands PRP-3.1C — the replenishment compute slice for time-safe featuresets. Rebased onto
devafter PRs #111 (lifecycle) and #112 (promotion) landed; patched to align with the PRP-3.1D sidecar pattern before pushing (see "Patch notes" below).FeatureEngineeringService._compute_replenishment_features(df, events_df).compute_features, gated onself.config.replenishment_config.setattr(service, "_replenishment_events_df", events_df)in the loader-orchestrator (or in tests). The compute method reads viagetattr(self, "_replenishment_events_df", None). No new public API.days_since_last_replenishment_lag{N}(float64; NaN sentinel for no-prior-event).replenishment_count_w{W}_lag{N}(int64; 0 for no-events).FeatureDataLoader.load_replenishment_events()with SQL-sidedate <= cutoff_datefilter (enforces time-safety BEFORE pandas sees the rows).compute_features_for_seriesto load events whenreplenishment_configis set.Time-safety
merge_asof(..., by=["store_id", "product_id"])plus per-entitygroupby(...).shift(...)is the leakage gate fordays_since_last.shift(1).rolling(W).sum()is the canonical ordering forcount_window— neverrolling(W).sum().shift(1). Forlag_days > 1, an extrashift(lag_days - 1)layers on top.TestReplenishmentLeakage(4 cases) asserts: same-day event invisible at lag-1;shift(1).rolling(W)ordering; cross-series isolation; on-cutoff inclusion via the SQL<=filter.Scope
service.py,tests/test_leakage.py,tests/test_service.py(3 files, +749/-0 vsdevafter the patch).schemas.py/routes.py/conftest.py/ migration / docs edits.config_hash_unchanged_when_phase2_omitted) still passes.Patch notes — three deviations from the original agent commit
set_replenishment_events(events_df)setter. The original commit exposed it as a new public method onFeatureEngineeringService. Replaced with the matching PRP-3.1D pattern: dynamic sidecar attribute (setattrin loader/orchestrator, direct assignment in tests). Result: no new public surface; both replenishment and promotion families share one idiom.W=7vsW=3intest_count_window_uses_shift_then_rolling— kept as-is in this PR. The PRP-3.1A schema (feat(features): pydantic configs + PRP set for phase 2 feature wiring (#109) #110) boundscount_window_daysatge=7, so the test uses W=7 as the smallest allowed. Follow-up tracked in feat(features): revisit count_window_days minimum in PRP-3.1A ReplenishmentConfig schema #113 to decide whether to lower the schema floor or document it.days_sincereturning[NaN, NaN, NaN, 0, 1, 2, 3]at the post-shift step — accepted as-is after review. This matches the canonicalgroupby.shift(1)leakage idiom: the event-day's per-row value (0) becomes invisible because shift(1) reads the previous row, which had no event yet. The post-event rows correctly see the gap as observed at the lagged row. The PRP pseudocode's[NaN, NaN, NaN, NaN, 1, 2, 3]would be over-conservative without leakage benefit.Test plan
ruff check+ruff format --checkcleanmypy --strict app/features/featuresets/clean (0 errors)pyright --strict app/features/featuresets/clean (0 errors)pytest -vonapp/features/featuresets/— 104 tests pass (4 new replenishment leakage + 5 new replenishment service cases on top of lag/rolling/calendar/lifecycle/promotion)Refs
Closes the replenishment portion of #109. Follow-up: #113 (revisit
count_window_dayslower bound). PRP-3.1E will document thedays_sincesemantic choice in the featuresets reference and exercise the loader extension end-to-end.