Context
Continuation of the seeder realism extension (.agents/plans/extend-data-seeder.md → Phase 2). Phase 0 (UI controls, #82) and Phase 1 (realism: exogenous signals, multi-seasonality, changepoints, returns, substitution; #88) have shipped. Phase 2 deepens the retail model so that a single dataset can illustrate multi-channel cannibalization, lifecycle-aware backtests, markdown analytics, and lead-time-driven stockouts.
Goal
Make the seeded data exercise the retail depth dimensions the forecasting/featuresets layers need:
- Channels: in_store, online, click_collect, wholesale, with a realistic mix + cannibalization.
- Lifecycle: intro / growth / maturity / decline / discontinued, with launch/discontinue dates and ramp curves.
- Bundles/BOGO: Promotion.kind discriminator (pct_off | bogo | bundle | markdown) + bundle member tracking.
- Markdowns: price-driven clearance events distinct from promo lifts.
- Lead-time: a replenishment_event table drives stockout clustering.
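A minimal sketch of the channel goal, assuming each daily (store, product) total is split across channels with a multinomial draw and that an online promo's extra units are partly taken from in_store. The function names, the `lift` and `cannibalization` parameters, and the mix values are hypothetical; they are not taken from the seeder code.

```python
import random

# Channel names come from the plan; the mix values below are illustrative only.
CHANNELS: tuple[str, ...] = ("in_store", "online", "click_collect", "wholesale")


def split_units(total: int, mix: dict[str, float], rng: random.Random) -> dict[str, int]:
    """Split a daily unit total across channels; the parts always sum to `total`."""
    counts = dict.fromkeys(CHANNELS, 0)
    weights = [mix.get(channel, 0.0) for channel in CHANNELS]
    # One weighted draw per unit (a multinomial split), reproducible for a fixed seed.
    for channel in rng.choices(CHANNELS, weights=weights, k=total):
        counts[channel] += 1
    return counts


def apply_online_promo(counts: dict[str, int], lift: float, cannibalization: float) -> dict[str, int]:
    """Add lift x online units; a `cannibalization` share of that gain is taken from in_store."""
    extra = round(counts["online"] * lift)
    stolen = min(counts["in_store"], round(extra * cannibalization))
    shifted = dict(counts)
    shifted["online"] += extra
    shifted["in_store"] -= stolen
    return shifted


if __name__ == "__main__":
    mix = {"in_store": 0.70, "online": 0.20, "click_collect": 0.07, "wholesale": 0.03}
    base = split_units(100, mix, random.Random(42))
    print(base, apply_online_promo(base, lift=0.5, cannibalization=0.6))
```

Because the split is integer-exact, the "channel sum equals total" integrity check holds by construction, and seeding the RNG per run gives per-channel reproducibility. Drawing once per unit is O(total), which is fine for a sketch but may be slow for very large daily volumes.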
Scope (in)
Schema (Alembic — additive only)
- SalesDaily.channel VARCHAR(20) NOT NULL DEFAULT 'in_store' + index on (date, channel).
- Product.lifecycle_stage VARCHAR(20) NULL, Product.launch_date DATE NULL, Product.discontinue_date DATE NULL, Product.pack_size INT NULL, Product.subcategory VARCHAR(100) NULL.
- Promotion.kind VARCHAR(20) NOT NULL DEFAULT 'pct_off', Promotion.bundle_member_product_ids JSONB NULL, Promotion.discount_pct FLOAT NULL (verify whether this column already exists before adding it).
- New table: replenishment_event(id, date, store_id, product_id, lead_time_days, ordered_qty, received_qty).
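One way the lead-time mechanism could produce clustered stockouts is a reorder-point policy whose orders arrive only after `lead_time_days`: once shelf stock hits zero it stays at zero until the order lands, so stockout days come in runs instead of being scattered at random. The sketch below assumes an (s, S) policy, a fixed lead time, full fills (received_qty equals ordered_qty), and that `date` records the order day. None of these choices come from the plan, and `simulate_replenishment` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplenishmentEvent:
    """Hypothetical in-memory row; mirrors replenishment_event minus id/store_id/product_id."""

    day: int  # order day index (assumption: the table's `date` is the order date)
    lead_time_days: int
    ordered_qty: int
    received_qty: int  # assumed equal to ordered_qty (no short shipments)


def simulate_replenishment(
    demand: list[int],
    on_hand: int,
    reorder_point: int,
    order_up_to: int,
    lead_time_days: int,
) -> tuple[list[ReplenishmentEvent], list[int]]:
    """(s, S) reorder policy with a fixed lead time; returns orders and stockout days."""
    if lead_time_days < 1:
        raise ValueError("lead_time_days must be >= 1")
    pipeline: dict[int, int] = {}  # arrival day -> quantity in transit
    events: list[ReplenishmentEvent] = []
    stockout_days: list[int] = []
    for day, wanted in enumerate(demand):
        on_hand += pipeline.pop(day, 0)  # receive anything arriving today
        sold = min(on_hand, wanted)
        on_hand -= sold
        if sold < wanted:
            stockout_days.append(day)  # unmet demand counts as a stockout day
        position = on_hand + sum(pipeline.values())  # on hand + on order
        if position <= reorder_point:
            qty = order_up_to - position
            arrival = day + lead_time_days
            pipeline[arrival] = pipeline.get(arrival, 0) + qty
            events.append(ReplenishmentEvent(day, lead_time_days, qty, qty))
    return events, stockout_days
```

With flat demand of 10/day, 30 units on hand, s = 10, S = 40 and a 5-day lead time, stockouts arrive in runs of three consecutive days between deliveries rather than as isolated days.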
Configs (app/shared/seeder/config.py)
- ChannelConfig, LifecycleConfig, BundleConfig, MarkdownConfig, LeadTimeConfig: each disabled by default; the defaults must not change the byte-for-byte output of existing scenarios.
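A minimal sketch of disabled-by-default configs, assuming stdlib dataclasses and one `enabled` flag per config. The field names beyond `enabled`, the `Phase2Configs` grouping, and `sample_lead_time` are hypothetical. The point it illustrates: keeping existing scenarios byte-identical likely requires that a disabled feature consumes no draws from the shared RNG, not just that its output is ignored.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field, fields


@dataclass(frozen=True)
class ChannelConfig:
    enabled: bool = False
    mix: dict[str, float] = field(
        default_factory=lambda: {"in_store": 0.70, "online": 0.20, "click_collect": 0.07, "wholesale": 0.03}
    )


@dataclass(frozen=True)
class LifecycleConfig:
    enabled: bool = False
    ramp_days: int = 60


@dataclass(frozen=True)
class BundleConfig:
    enabled: bool = False


@dataclass(frozen=True)
class MarkdownConfig:
    enabled: bool = False


@dataclass(frozen=True)
class LeadTimeConfig:
    enabled: bool = False
    mean_days: float = 7.0
    sd_days: float = 2.0


@dataclass(frozen=True)
class Phase2Configs:
    """Hypothetical grouping; the plan attaches these configs to SeederConfig."""

    channel: ChannelConfig = field(default_factory=ChannelConfig)
    lifecycle: LifecycleConfig = field(default_factory=LifecycleConfig)
    bundle: BundleConfig = field(default_factory=BundleConfig)
    markdown: MarkdownConfig = field(default_factory=MarkdownConfig)
    lead_time: LeadTimeConfig = field(default_factory=LeadTimeConfig)

    def any_enabled(self) -> bool:
        return any(getattr(self, f.name).enabled for f in fields(self))


def sample_lead_time(cfg: LeadTimeConfig, rng: random.Random) -> int | None:
    if not cfg.enabled:
        # No RNG draw when disabled, so the shared random stream (and thus the
        # output of existing scenarios) is left untouched.
        return None
    return max(1, round(rng.gauss(cfg.mean_days, cfg.sd_days)))
```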
Generators (app/shared/seeder/generators/)
- ProductGenerator (updated): lifecycle attributes.
- ProductLifecycleGenerator (new): per-(product, date) demand multiplier.
- BundleGenerator (new): BOGO + bundle promo mechanics.
- MarkdownGenerator (new): clearance pricing driven by age, lifecycle stage, and stockout risk.
- ReplenishmentGenerator (new): drives inventory + stockout clustering.
- SalesDailyGenerator: channel split + lifecycle multiplier integration.
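A minimal sketch of what a per-(product, date) lifecycle multiplier and stage label might look like, assuming a logistic ramp after launch_date and a linear fade before discontinue_date. The curve shapes, the `ramp_days`/`decline_days`/`decline_floor` parameters, and the stage cut-offs are illustrative assumptions, not the behavior of the planned ProductLifecycleGenerator.

```python
from __future__ import annotations

import math
from datetime import date


def lifecycle_multiplier(
    day: date,
    launch: date | None,
    discontinue: date | None,
    ramp_days: int = 60,
    decline_days: int = 90,
    decline_floor: float = 0.3,
) -> float:
    """Demand multiplier: 0 outside the sell window, logistic ramp up, linear decline."""
    if launch is not None and day < launch:
        return 0.0
    if discontinue is not None and day >= discontinue:
        return 0.0
    multiplier = 1.0
    if launch is not None:
        age = (day - launch).days
        # Logistic ramp: ~0.007 on launch day, 0.5 at ramp_days / 2, ~0.993 at ramp_days.
        multiplier = 1.0 / (1.0 + math.exp(-10.0 * (age / ramp_days - 0.5)))
    if discontinue is not None:
        days_left = (discontinue - day).days
        if days_left < decline_days:
            # Linear fade from 1.0 toward decline_floor as discontinue_date approaches.
            multiplier *= decline_floor + (1.0 - decline_floor) * days_left / decline_days
    return multiplier


def lifecycle_stage(
    day: date,
    launch: date | None,
    discontinue: date | None,
    ramp_days: int = 60,
    decline_days: int = 90,
) -> str | None:
    """Stage label consistent with the multiplier above; None before launch."""
    if launch is not None and day < launch:
        return None
    if discontinue is not None and day >= discontinue:
        return "discontinued"
    if discontinue is not None and (discontinue - day).days < decline_days:
        return "decline"
    if launch is not None:
        age = (day - launch).days
        if age < ramp_days // 2:
            return "intro"
        if age < ramp_days:
            return "growth"
    return "maturity"
```

SalesDailyGenerator could then multiply its baseline demand by `lifecycle_multiplier(...)` before the channel split, so discontinued products naturally produce zero sales.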
Orchestration (core.py)
- Generation order: stores → products (with lifecycle) → calendar → exogenous → replenishment → promotions (with bundles) → markdowns → inventory (replenishment-driven) → sales (per channel, lifecycle-aware) → returns.
- delete_data also deletes replenishment_event rows.
- verify_data_integrity adds checks that per-channel sales sum to the daily total, that lifecycle stages are valid, and that bundle member product IDs exist.
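The three new integrity checks can be sketched over plain in-memory rows. This assumes a hypothetical (date, store_id, product_id) grain key, that the expected daily totals are available before the channel split, and that a NULL lifecycle_stage is allowed (the column is nullable). `integrity_errors` is an illustrative name, not the existing verify_data_integrity signature.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Mapping

Key = tuple[str, int, int]  # hypothetical grain: (ISO date, store_id, product_id)
VALID_STAGES = frozenset({"intro", "growth", "maturity", "decline", "discontinued"})


def integrity_errors(
    channel_rows: Iterable[tuple[Key, str, int]],  # (key, channel, units)
    expected_totals: Mapping[Key, int],
    product_stages: Mapping[int, str | None],  # every product_id -> lifecycle_stage
    bundle_members: Mapping[int, list[int]],  # promotion_id -> bundle member product IDs
) -> list[str]:
    errors: list[str] = []
    # 1. Per-channel units must add up to the daily total for each grain key.
    sums: dict[Key, int] = defaultdict(int)
    for key, _channel, units in channel_rows:
        sums[key] += units
    for key, total in expected_totals.items():
        if sums.get(key, 0) != total:
            errors.append(f"channel sum {sums.get(key, 0)} != total {total} for {key}")
    # 2. Non-NULL lifecycle stages must be one of the five known values.
    for product_id, stage in product_stages.items():
        if stage is not None and stage not in VALID_STAGES:
            errors.append(f"product {product_id}: invalid lifecycle_stage {stage!r}")
    # 3. Every bundle member must reference an existing product.
    known = set(product_stages)
    for promo_id, members in bundle_members.items():
        missing = sorted(set(members) - known)
        if missing:
            errors.append(f"promotion {promo_id}: unknown bundle product ids {missing}")
    return errors
```

Returning a list of messages instead of raising on the first failure lets one integrity run report every problem in a generated dataset.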
API surface
- GenerateParams gains enable_multichannel, channel_mix, enable_lifecycle, enable_bundles, enable_markdowns, enable_lead_time.
- GET /seeder/channels: current channel mix computed from the generated data.
- GET /dimensions/products/{id}/lifecycle-curve: per-product lifecycle trajectory.
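A stdlib-only sketch of the new request fields and the mix computation behind GET /seeder/channels. The real GenerateParams is likely a Pydantic model; `GenerateParamsSketch` and `observed_mix` are hypothetical stand-ins. Rejecting a channel_mix when enable_multichannel is off is one possible design choice, not something the plan specifies.

```python
from __future__ import annotations

import math
from collections.abc import Mapping
from dataclasses import dataclass

VALID_CHANNELS = frozenset({"in_store", "online", "click_collect", "wholesale"})


@dataclass(frozen=True)
class GenerateParamsSketch:
    """Hypothetical stand-in for the new GenerateParams fields."""

    enable_multichannel: bool = False
    channel_mix: dict[str, float] | None = None
    enable_lifecycle: bool = False
    enable_bundles: bool = False
    enable_markdowns: bool = False
    enable_lead_time: bool = False

    def __post_init__(self) -> None:
        if self.channel_mix is None:
            return
        if not self.enable_multichannel:
            raise ValueError("channel_mix requires enable_multichannel=True")
        unknown = set(self.channel_mix) - VALID_CHANNELS
        if unknown:
            raise ValueError(f"unknown channels: {sorted(unknown)}")
        if any(share < 0 for share in self.channel_mix.values()):
            raise ValueError("channel shares must be non-negative")
        if not math.isclose(sum(self.channel_mix.values()), 1.0, abs_tol=1e-6):
            raise ValueError("channel shares must sum to 1.0")


def observed_mix(units_by_channel: Mapping[str, int]) -> dict[str, float]:
    """Channel shares of total generated units, e.g. as a GET /seeder/channels body."""
    total = sum(units_by_channel.values())
    if total == 0:
        return {channel: 0.0 for channel in sorted(units_by_channel)}
    return {channel: units / total for channel, units in sorted(units_by_channel.items())}
```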
Tests
- Per-channel reproducibility, lifecycle ramp, bundle uplift, markdown trigger, lead-time stockout clustering.
- Regression invariant: retail_standard produces identical record counts when the Phase 2 toggles are off.
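For the markdown-trigger test listed above, a deterministic rule gives the test something concrete to assert against. This sketch assumes markdowns fire on the decline stage, on aged stock, or on overstock (high days of cover, i.e. low stockout risk). The thresholds, discount depths, and the `markdown_decision` name are hypothetical, and mapping the result to a Promotion row with kind = 'markdown' is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MarkdownDecision:
    triggered: bool
    discount_pct: float  # fraction off list price (hypothetical depths below)
    reason: str


def markdown_decision(
    stage: str | None,
    age_days: int,
    on_hand: int,
    avg_daily_demand: float,
    max_age_days: int = 180,
    max_days_of_cover: float = 60.0,
) -> MarkdownDecision:
    """Clearance rule driven by lifecycle stage, product age, and overstock."""
    if stage == "decline":
        return MarkdownDecision(True, 0.40, "lifecycle_decline")
    if age_days >= max_age_days:
        return MarkdownDecision(True, 0.25, "aged_stock")
    # Days of cover: how long current stock lasts at the average sell rate.
    days_of_cover = on_hand / avg_daily_demand if avg_daily_demand > 0 else float("inf")
    if on_hand > 0 and days_of_cover > max_days_of_cover:
        return MarkdownDecision(True, 0.20, "overstock")
    return MarkdownDecision(False, 0.0, "none")
```

Checking the triggers in a fixed order keeps each test case tied to exactly one reason.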
Scope (out)
- Customer/transaction grain (separate PRP).
- Image/text assets (owned by the RAG slice).
- Real external API integration (synthesize only).
- International holidays (US-only behavior preserved).
- Frontend UI for the new toggles (lives in Phase 4 with the full SeederForm rewrite).
Acceptance
- … SeederConfig with disabled defaults.
- GenerateParams exposes the new toggles (enable_multichannel, enable_lifecycle, enable_bundles, enable_markdowns, enable_lead_time) + channel_mix.
- GET /seeder/channels and GET /dimensions/products/{id}/lifecycle-curve return correct shapes.
- ruff, mypy --strict, pyright --strict, and the unit + integration tests are all green.
- docs/DATA-SEEDER.md updated with the new options.
References
- .agents/plans/extend-data-seeder.md (Phase 2 section).