feat(data): seeder Phase 2 — retail depth (channels, lifecycle, bundles/BOGO, markdowns, lead-time) #92

@w7-mgfcode

Context

Continuation of the seeder realism extension (.agents/plans/extend-data-seeder.md → Phase 2). Phase 0 (UI controls, #82) and Phase 1 (realism: exogenous signals, multi-seasonality, changepoints, returns, substitution; #88) have shipped. Phase 2 deepens the retail model so that a single dataset can illustrate multi-channel cannibalization, lifecycle-aware backtests, markdown analytics, and lead-time-driven stockouts.

Goal

Make the seeded data exercise the retail depth dimensions the forecasting/featuresets layers need:

  • Channels: in_store, online, click_collect, wholesale with realistic mix + cannibalization.
  • Lifecycle: intro / growth / maturity / decline / discontinued with launch/discontinue dates and ramp curves.
  • Bundles/BOGO: Promotion.kind discriminator (pct_off | bogo | bundle | markdown) + bundle member tracking.
  • Markdowns: price-driven clearance events distinct from promo lifts.
  • Lead-time: replenishment_event table drives stockout clustering.

Scope (in)

Schema (Alembic — additive only)

  • SalesDaily.channel VARCHAR(20) NOT NULL DEFAULT 'in_store' + index on (date, channel).
  • Product.lifecycle_stage VARCHAR(20) NULL, Product.launch_date DATE NULL, Product.discontinue_date DATE NULL, Product.pack_size INT NULL, Product.subcategory VARCHAR(100) NULL.
  • Promotion.kind VARCHAR(20) NOT NULL DEFAULT 'pct_off', Promotion.bundle_member_product_ids JSONB NULL, Promotion.discount_pct FLOAT NULL (first verify whether discount_pct already exists; add it only if it is missing).
  • New table replenishment_event(id, date, store_id, product_id, lead_time_days, ordered_qty, received_qty).
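
To make "additive only" concrete, here is a minimal sketch of the channel column and the new table. It uses an in-memory SQLite database from the standard library as a stand-in, not the project's Alembic migration or its production database. The sales_daily table name, its two pre-existing columns, the index name, and the NOT NULL constraints on replenishment_event are assumptions for illustration; only the column names come from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-Phase-2 table; the real SalesDaily model has more columns.
conn.execute(
    "CREATE TABLE sales_daily ("
    "id INTEGER PRIMARY KEY, date TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sales_daily (date, quantity) VALUES ('2024-01-01', 5)")

# Additive change: NOT NULL plus a default lets existing rows pick up 'in_store'.
conn.execute(
    "ALTER TABLE sales_daily "
    "ADD COLUMN channel VARCHAR(20) NOT NULL DEFAULT 'in_store'"
)
conn.execute(
    "CREATE INDEX ix_sales_daily_date_channel ON sales_daily (date, channel)"
)

# New table with the columns named in the schema list (constraints assumed).
conn.execute(
    "CREATE TABLE replenishment_event ("
    "id INTEGER PRIMARY KEY, date TEXT NOT NULL, store_id INTEGER NOT NULL, "
    "product_id INTEGER NOT NULL, lead_time_days INTEGER NOT NULL, "
    "ordered_qty INTEGER NOT NULL, received_qty INTEGER NOT NULL)"
)

# The row inserted before the migration now reads back with the default channel.
legacy_channel = conn.execute("SELECT channel FROM sales_daily").fetchone()[0]
```

The point of the default is that rows written before the migration stay valid without a backfill step, which is what keeps the change additive.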

Configs (app/shared/seeder/config.py)

  • ChannelConfig, LifecycleConfig, BundleConfig, MarkdownConfig, LeadTimeConfig — each disabled by default; defaults must not change byte-output of existing scenarios.
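
A rough sketch of what "disabled by default" could look like with plain dataclasses. Only the five config class names and SeederConfig come from this issue; the field names (enabled, mix, seed, and the SeederConfig attribute names), the default mix values, and the phase2_active helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChannelConfig:
    enabled: bool = False
    # Hypothetical default mix, only consulted when enabled is True.
    mix: dict[str, float] = field(
        default_factory=lambda: {
            "in_store": 0.70,
            "online": 0.20,
            "click_collect": 0.07,
            "wholesale": 0.03,
        }
    )


@dataclass(frozen=True)
class LifecycleConfig:
    enabled: bool = False


@dataclass(frozen=True)
class BundleConfig:
    enabled: bool = False


@dataclass(frozen=True)
class MarkdownConfig:
    enabled: bool = False


@dataclass(frozen=True)
class LeadTimeConfig:
    enabled: bool = False


@dataclass(frozen=True)
class SeederConfig:
    seed: int = 42  # assumed field; the real SeederConfig has more options
    channels: ChannelConfig = field(default_factory=ChannelConfig)
    lifecycle: LifecycleConfig = field(default_factory=LifecycleConfig)
    bundles: BundleConfig = field(default_factory=BundleConfig)
    markdowns: MarkdownConfig = field(default_factory=MarkdownConfig)
    lead_time: LeadTimeConfig = field(default_factory=LeadTimeConfig)


def phase2_active(cfg: SeederConfig) -> bool:
    """Return True if any Phase 2 feature is switched on."""
    parts = (cfg.channels, cfg.lifecycle, cfg.bundles, cfg.markdowns, cfg.lead_time)
    return any(part.enabled for part in parts)
```

A guard like phase2_active gives generators one place to short-circuit, so that a default SeederConfig() follows exactly the pre-Phase-2 code path.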

Generators (app/shared/seeder/generators/)

  • ProductGenerator updates — lifecycle attrs.
  • ProductLifecycleGenerator (new) — per-(product, date) demand multiplier.
  • BundleGenerator (new) — BOGO + bundle promo mechanics.
  • MarkdownGenerator (new) — clearance pricing driven by age/lifecycle/stockout-risk.
  • ReplenishmentGenerator (new) — drives inventory + stockout clustering.
  • SalesDailyGenerator — channel split + lifecycle multiplier integration.
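
For the per-(product, date) demand multiplier, one possible shape is a piecewise-linear ramp up after launch and a linear wind-down before discontinuation. The issue does not specify the curve, so the function name, the ramp_days and decline_days parameters, and the linear form are all assumptions.

```python
from __future__ import annotations

from datetime import date


def lifecycle_multiplier(
    day: date,
    launch: date,
    discontinue: date | None,
    ramp_days: int = 60,      # assumed length of the intro/growth ramp
    decline_days: int = 90,   # assumed length of the decline phase
) -> float:
    """Return a demand multiplier in [0, 1] for one product on one day."""
    # No demand before launch or from the discontinue date onward.
    if day < launch:
        return 0.0
    if discontinue is not None and day >= discontinue:
        return 0.0

    # Linear ramp: 1/ramp_days on launch day, capped at 1.0 once mature.
    age = (day - launch).days
    ramp = min(1.0, (age + 1) / ramp_days)
    if discontinue is None:
        return ramp

    # Linear decline over the last decline_days before discontinuation.
    remaining = (discontinue - day).days
    decline = min(1.0, remaining / decline_days)
    return min(ramp, decline)
```

SalesDailyGenerator would then multiply its baseline demand by this value. When the lifecycle toggle is off, skipping the call entirely (rather than multiplying by 1.0) is the safer way to keep existing output unchanged.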

Orchestration (core.py)

  • Generation order: stores → products (with lifecycle) → calendar → exogenous → replenishment → promotions (with bundles) → markdowns → inventory (replenishment-driven) → sales (per channel, lifecycle-aware) → returns.
  • delete_data includes replenishment_event.
  • verify_data_integrity adds: channel sum equals total, lifecycle stages valid, bundle product IDs exist.
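
The three new integrity checks can be sketched as a pure function over in-memory data. The function name verify_phase2, the argument shapes, and the error message wording are hypothetical; the real verify_data_integrity presumably queries the database instead.

```python
from collections import defaultdict

VALID_STAGES = {"intro", "growth", "maturity", "decline", "discontinued"}


def verify_phase2(totals, channel_rows, stages, promos):
    """Return a list of human-readable integrity errors (empty if all pass).

    totals:       {(date, store_id, product_id): expected total quantity}
    channel_rows: iterable of ((date, store_id, product_id), channel, quantity)
    stages:       {product_id: lifecycle_stage or None}
    promos:       {promotion_id: list of bundle member product ids, or None}
    """
    errors = []

    # 1. Per-channel quantities must add up to the total for each key.
    summed = defaultdict(int)
    for key, _channel, qty in channel_rows:
        summed[key] += qty
    for key, total in totals.items():
        if summed.get(key, 0) != total:
            errors.append(f"channel sum mismatch for {key}")

    # 2. lifecycle_stage is nullable, but non-null values must be known stages.
    for product_id, stage in stages.items():
        if stage is not None and stage not in VALID_STAGES:
            errors.append(f"invalid lifecycle stage for product {product_id}: {stage}")

    # 3. Every bundle member must reference an existing product.
    for promo_id, members in promos.items():
        missing = [m for m in (members or []) if m not in stages]
        if missing:
            errors.append(f"promotion {promo_id} references unknown products {missing}")

    return errors
```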

API surface

  • GenerateParams gains enable_multichannel, channel_mix, enable_lifecycle, enable_bundles, enable_markdowns, enable_lead_time.
  • GET /seeder/channels — current mix from generated data.
  • GET /dimensions/products/{id}/lifecycle-curve — per-product lifecycle trajectory.
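
As an illustration of what GET /seeder/channels might compute, the sketch below derives the observed mix as each channel's share of total quantity. The response shape (a channel-to-share mapping) and the function name observed_channel_mix are assumptions; the issue only says the endpoint returns the current mix from generated data.

```python
from collections import Counter


def observed_channel_mix(rows: list[tuple[str, int]]) -> dict[str, float]:
    """Compute each channel's share of total quantity from (channel, qty) rows."""
    totals: Counter[str] = Counter()
    for channel, qty in rows:
        totals[channel] += qty

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No generated sales yet: report an empty mix rather than dividing by zero.
        return {}

    # Sorted keys keep the payload order stable between calls.
    return {channel: totals[channel] / grand_total for channel in sorted(totals)}
```

Reporting the observed mix, rather than echoing the configured channel_mix, lets a caller check how closely the generated data tracks the requested mix.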

Tests

  • Per-channel reproducibility, lifecycle ramp, bundle uplift, markdown trigger, lead-time stockout clustering.
  • Regression invariant: retail_standard produces identical record counts when Phase 2 toggles are off.
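
The reproducibility and regression tests both hinge on how the channel split uses the random stream. A minimal sketch, assuming a hypothetical split_by_channel helper and a seeded random.Random instance: the same seed gives the same split, the channel counts always sum to the total, and the disabled path draws nothing from the generator so downstream values are not shifted.

```python
import random


def split_by_channel(
    total: int,
    mix: dict[str, float],
    rng: random.Random,
    enabled: bool = True,
) -> dict[str, int]:
    """Split a daily total across channels according to the given mix."""
    if not enabled:
        # Legacy path: everything is in_store and the RNG is left untouched,
        # so later draws match the pre-Phase-2 sequence.
        return {"in_store": total}

    channels = list(mix)
    weights = [mix[channel] for channel in channels]
    counts = dict.fromkeys(channels, 0)
    # One weighted draw per unit sold, so the counts sum to total by construction.
    for channel in rng.choices(channels, weights=weights, k=total):
        counts[channel] += 1
    return counts


mix = {"in_store": 0.7, "online": 0.2, "click_collect": 0.07, "wholesale": 0.03}
first = split_by_channel(100, mix, random.Random(7))
second = split_by_channel(100, mix, random.Random(7))
```

Here first and second are equal because both calls start from the same seed. Leaving the RNG untouched on the disabled path matters because identical record counts can still hide shifted values if the disabled path consumes random draws.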

Scope (out)

  • Customer/transaction grain (separate PRP).
  • Image/text assets (RAG slice owns).
  • Real external API integration (synthesize only).
  • International holidays (US-only behavior preserved).
  • Frontend UI for the new toggles (lives in Phase 4 with the full SeederForm rewrite).

Acceptance

  • All schema changes listed above land in a single additive migration that roundtrips cleanly (upgrade, downgrade, upgrade).
  • Five new dataclass configs wired into SeederConfig with disabled defaults.
  • The new and extended generators listed above emit valid rows for each table.
  • GenerateParams exposes the five new enable_* toggles + channel_mix.
  • GET /seeder/channels and GET /dimensions/products/{id}/lifecycle-curve return correct shapes.
  • Regression invariant: existing scenarios produce identical record counts and byte-identical output when all Phase 2 toggles are disabled.
  • ruff, mypy --strict, pyright --strict, unit + integration tests all green.
  • docs/DATA-SEEDER.md updated with the new options.
