chore: merge dev to main (Phase 1 Data Platform) by w7-mgfcode · Pull Request #13 · w7-mgfcode/ForecastLabAI

w7-mgfcode · 2026-01-26T10:58:36Z

Summary

Merges Phase 1 (Data Platform) implementation from dev to main.

Changes included:

7 SQLAlchemy 2.0 ORM models (Store, Product, Calendar, SalesDaily, PriceHistory, Promotion, InventorySnapshotDaily)
Alembic baseline migration
Pydantic v2 validation schemas
59 tests (32 unit + 13 integration + 14 core)
Documentation updates
Example SQL queries

PRs merged:

feat(data-platform): implement PRP-2 schema and migrations #12: feat(data-platform): implement PRP-2 schema and migrations

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a retail forecasting data platform: 3 dimension tables and 4 fact tables, plus DB migrations and enforcement of grain/constraints.
Documentation
- Updated Quick Start to run migrations; added detailed schema docs, example queries, and phase/architecture notes.
Tests
- Added unit and integration tests to validate models, constraints, and migration behavior.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

* docs: add PRP-2 for data platform schema and migrations Create comprehensive Product Requirements Prompt for INITIAL-2 (Data Platform) including: - Mini-warehouse schema with dimension tables (store, product, calendar) - Fact tables (sales_daily, price_history, promotion, inventory_snapshot_daily) - SQLAlchemy 2.0 ORM model patterns with type annotations - Alembic migration guidance and constraint naming conventions - Grain protection via unique constraints for idempotent upserts - Composite indexes for time-series query optimization - Unit and integration test specifications - Example SQL queries for KPIs and exogenous feature joins Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * feat(data-platform): implement PRP-2 schema and migrations Add mini-warehouse schema for retail demand forecasting: Dimension tables: - store: locations with code, region, city, store_type - product: catalog with sku, category, brand, base_price - calendar: time dimension with day_of_week, is_holiday Fact tables: - sales_daily: grain-protected (date, store_id, product_id) - price_history: validity windows with valid_from/valid_to - promotion: discount mechanics with pct and amount - inventory_snapshot_daily: stockout detection Key features: - SQLAlchemy 2.0 patterns (Mapped[], mapped_column()) - Grain protection via UniqueConstraint for idempotent upserts - Check constraints for data quality (positive qty, valid dates) - Composite indexes for time-range + store/product queries - Pydantic v2 schemas for API validation - 32 unit tests + 11 integration tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * docs: add Phase 1 (Data Platform) documentation - Update README.md with migration step and database schema section - Update ARCHITECTURE.md section 5 to reflect implemented data platform - Update PHASE-index.md with Phase 1 status and version history - Create docs/PHASE/1-DATA_PLATFORM.md with detailed phase documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address code review feedback from Sourcery - Add `from __future__ import annotations` and remove quoted type annotations - Fix SQL discount calculation to distinguish NULL (no price) from 0% discount - Add positive-path test for Calendar insertion - Remove redundant single-column indexes covered by composite indexes - Add documentation comments explaining fixture duplication for pytest discovery Addresses: Comments 1-4 and overall comments from Sourcery review Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix(ci): resolve lint and type check failures - Format models.py with ruff - Add type: ignore comments for __table__.constraints attribute access - Consolidate mypy overrides for test modules in pyproject.toml Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Gabe@w7dev <gabor@w7-7.net> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

sourcery-ai

Sorry @w7-mgfcode, your pull request is larger than the review limit of 150000 diff characters

coderabbitai · 2026-01-26T10:59:08Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds a Phase 1 Data Platform feature: star-schema ORM models (Store, Product, Calendar, SalesDaily, PriceHistory, Promotion, InventorySnapshotDaily), Alembic migration, Pydantic schemas, tests (unit + integration), docs, and example SQL queries and fixtures.

Changes

Cohort / File(s)	Summary
PRP & Planning Docs `PRPs/PRP-2-data-platform-schema.md`, `docs/PHASE/1-DATA_PLATFORM.md`, `docs/PHASE-index.md`, `docs/ARCHITECTURE.md`	New PRP and Phase 1 documentation describing schema, goals, tasks, deliverables, and status.
ORM Models & Package `app/features/data_platform/__init__.py`, `app/features/data_platform/models.py`	Adds SQLAlchemy 2.0 models for 3 dimensions and 4 fact tables with PKs, FKs, composite unique grain constraints, check constraints, relationships, and indexes.
Alembic Migration `alembic/env.py`, `alembic/versions/e1165ebcef61_create_data_platform_tables.py`	Registers models for autogenerate and provides an upgrade/downgrade migration creating all tables, constraints, and indexes.
Pydantic Schemas `app/features/data_platform/schemas.py`	Adds Base/Create/Read Pydantic models (ORM mapping via from_attributes) and field validations for each entity.
Tests & Fixtures `app/features/data_platform/tests/__init__.py`, `app/features/data_platform/tests/conftest.py`, `app/features/data_platform/tests/test_constraints.py`, `app/features/data_platform/tests/test_models.py`, `tests/conftest.py`	New async test fixtures (db_session, sample entities) and comprehensive unit/integration tests validating constraints, relationships, and model surface. Root test conftest extended to support async DB sessions.
Examples & Schema README `examples/schema/README.md`, `examples/queries/exog_join.sql`, `examples/queries/kpi_sales.sql`	Adds schema README and multiple example queries (exogenous joins, feature engineering, KPI queries).
Project Docs & Config `README.md`, `pyproject.toml`	README updated to include migration step and schema refs; MyPy overrides merged/expanded for test modules.

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer
  participant Alembic as Alembic
  participant App as Application (models/schemas)
  participant DB as Database (Postgres)

  rect rgba(200,200,255,0.5)
    Dev->>Alembic: run "alembic upgrade head"
    Alembic->>DB: create tables, constraints, indexes
    DB-->>Alembic: schema applied
  end

  rect rgba(200,255,200,0.5)
    Dev->>App: start application / run tests
    App->>DB: connect (use ORM models)
    App->>DB: read/write using models (enforce constraints)
    DB-->>App: results / IntegrityErrors for violations
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hopping through migrations bright,

Seven tables in the night,
Constraints snug, indexes tight,
Schemas, tests — a tasty bite,
Forecasts bloom beneath moonlight.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main change: merging Phase 1 (Data Platform implementation) from dev to main, which aligns with the changeset's substantial introduction of ORM models, schemas, migrations, tests, and documentation.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/PHASE/1-DATA_PLATFORM.md`:
- Around line 60-66: The doc list uses names like StoreResponse/ProductResponse
but the code defines StoreRead/ProductRead/etc.; update the documentation to
reference the actual schema names (StoreRead, ProductRead, CalendarRead,
SalesDailyRead, PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so
the docs match the schemas, or alternatively add aliases in the schema module
(e.g., define StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so
both names are valid; pick one approach and apply it consistently across the
referenced entries.

🧹 Nitpick comments (12)

pyproject.toml (1)

112-120: Consider simplifying the module patterns.

The pattern "*.tests.*" is a glob that should already match "app.*.tests.*" and "app.features.*.tests.*". The more specific patterns may be redundant.

However, MyPy's module matching can be nuanced, so keeping explicit patterns ensures coverage across different import paths. If intentional for safety, this is acceptable.
tests/conftest.py (1)
22-53: Consider using session/module scope or transaction-based isolation for performance.

Creating and dropping all tables for each test function is expensive. For integration test suites with many tests, this can significantly slow down the test run.

Consider these alternatives:

Use scope="session" or scope="module" for the table creation/teardown, with per-test transaction rollback for isolation

Use savepoints for nested transaction isolation within a single database setup
♻️ Example using module scope with transaction isolation
`@pytest.fixture`(scope="module")
async def db_engine():
    """Create engine and tables once per module."""
    settings = get_settings()
    engine = create_async_engine(settings.database_url, echo=False)
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
    yield engine
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
    await engine.dispose()


`@pytest.fixture`
async def db_session(db_engine):
    """Create session with transaction rollback per test."""
    async_session_maker = async_sessionmaker(
        db_engine,
        class_=AsyncSession,
        expire_on_commit=False,
    )
    async with async_session_maker() as session:
        async with session.begin():
            yield session
            await session.rollback()
app/features/data_platform/tests/conftest.py (1)

20-51: Same performance consideration applies here.

This fixture has the same create/drop-all-tables-per-test pattern. The recommendation from tests/conftest.py to use scoped fixtures with transaction isolation applies here as well.
app/features/data_platform/models.py (2)
249-249: Consider adjusting Numeric(5, 4) precision for discount_pct.

With Numeric(5, 4), the maximum value is 9.9999, but the check constraint limits the range to 0-1. A precision of Numeric(5, 4) reserves 4 decimal places (e.g., 0.1500 for 15%), which is reasonable for precision but means 4 of the 5 digits are after the decimal.

This works correctly but Numeric(3, 2) (allowing 0.00 to 0.99) would be more aligned with typical percentage representations, unless sub-percentage precision (e.g., 15.25% = 0.1525) is a requirement.

294-296: Consider a check constraint to keep is_stockout consistent with on_hand_qty.

is_stockout is a denormalized boolean that could become inconsistent with on_hand_qty if updated independently. Consider adding a check constraint to enforce consistency:
CheckConstraint(
    "(is_stockout = true AND on_hand_qty = 0) OR (is_stockout = false AND on_hand_qty > 0)",
    name="ck_inventory_stockout_consistency"
)
Alternatively, if is_stockout should always be derived, consider making it a generated/computed column (PostgreSQL 12+ supports GENERATED ALWAYS AS), or removing it and computing it at query time.
PRPs/PRP-2-data-platform-schema.md (1)
193-196: Minor documentation inconsistency with actual implementation.

The gotchas section recommends wildcard import (from app.features.data_platform.models import *), but the actual implementation in alembic/env.py uses the cleaner module import pattern (from app.features.data_platform import models as data_platform_models). Consider updating this documentation to reflect the preferred pattern used in the implementation.
📝 Suggested documentation update
 # CRITICAL: Alembic import models in env.py for autogenerate to work
 # In alembic/env.py, add:
-from app.features.data_platform.models import *  # noqa: F401, F403
+from app.features.data_platform import models as data_platform_models  # noqa: F401
app/features/data_platform/schemas.py (2)
127-147: Consider adding cross-field validation for date ranges.

The schema accepts valid_to and valid_from but doesn't validate that valid_to >= valid_from. While the database check constraint will catch this, adding a Pydantic model_validator would provide earlier feedback to API consumers.
♻️ Optional: Add model validator
from pydantic import model_validator

class PriceHistoryBase(BaseModel):
    """Base schema for price history data."""

    product_id: int = Field(..., gt=0)
    store_id: int | None = Field(default=None, gt=0)
    price: Decimal = Field(..., ge=0)
    valid_from: date
    valid_to: date | None = None

    `@model_validator`(mode='after')
    def check_date_range(self) -> 'PriceHistoryBase':
        if self.valid_to is not None and self.valid_to < self.valid_from:
            raise ValueError('valid_to must be >= valid_from')
        return self
154-176: Same date range validation opportunity as PriceHistory.

Consider adding a model_validator to ensure end_date >= start_date for early API feedback, though the DB constraint will enforce this.
docs/PHASE/1-DATA_PLATFORM.md (1)
199-242: Add language specifiers to fenced code blocks.

Per static analysis hints, the validation result and directory structure code blocks should have language specifiers for proper rendering.
📝 Suggested fix
 ### Ruff (Linting + Formatting)
-```
+```text
 All checks passed!
MyPy (Static Type Checking)

- +text
Success: no issues found in 25 source files
### Pyright (Static Type Checking)
-```
+```text
0 errors, 0 warnings, 0 informations
Pytest

- +text
58 passed (14 core + 32 unit + 11 integration + 1 fixture)
## Directory Structure

-```
+```text
app/features/data_platform/
├── __init__.py
examples/schema/README.md (1)
145-158: Add language specifier for the relationships diagram.

The ASCII diagram code block should have a language specifier for consistent markdown rendering.
📝 Suggested fix
-```
+```text
 ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
 │    Store     │     │   Product    │     │   Calendar   │
examples/queries/kpi_sales.sql (2)
7-18: Parameterize the date range to keep this example evergreen.

Hard-coding January 2024 will go stale quickly. Consider using relative dates or placeholders so the query stays useful.
♻️ Proposed tweak
-WHERE s.date BETWEEN '2024-01-01' AND '2024-01-31'
+WHERE s.date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND CURRENT_DATE
55-81: Consider left‑joining prior year to include new stores.

The inner join drops stores without prior‑year data, which can hide growth for newly opened stores. A left join keeps those rows while still protecting against divide‑by‑zero.
♻️ Proposed tweak
-FROM current_year cy
-JOIN prior_year py ON cy.store_id = py.store_id
+FROM current_year cy
+LEFT JOIN prior_year py ON cy.store_id = py.store_id
 JOIN store st ON cy.store_id = st.id
-    py.revenue AS prior_year_revenue,
-    ROUND((cy.revenue - py.revenue) / NULLIF(py.revenue, 0) * 100, 2) AS yoy_growth_pct
+    py.revenue AS prior_year_revenue,
+    ROUND((cy.revenue - COALESCE(py.revenue, 0))
+          / NULLIF(COALESCE(py.revenue, 0), 0) * 100, 2) AS yoy_growth_pct

coderabbitai · 2026-01-26T11:02:46Z

+- `StoreBase`, `StoreCreate`, `StoreResponse`
+- `ProductBase`, `ProductCreate`, `ProductResponse`
+- `CalendarBase`, `CalendarCreate`, `CalendarResponse`
+- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`
+- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`
+- `PromotionBase`, `PromotionCreate`, `PromotionResponse`
+- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`


⚠️ Potential issue | 🟡 Minor

Documentation-code naming mismatch.

The documentation references StoreResponse, ProductResponse, etc., but the actual schemas in app/features/data_platform/schemas.py use StoreRead, ProductRead, etc.

📝 Suggested fix

Base and response schemas for each model: -- `StoreBase`, `StoreCreate`, `StoreResponse` -- `ProductBase`, `ProductCreate`, `ProductResponse` -- `CalendarBase`, `CalendarCreate`, `CalendarResponse` -- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse` -- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse` -- `PromotionBase`, `PromotionCreate`, `PromotionResponse` -- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse` +- `StoreBase`, `StoreCreate`, `StoreRead` +- `ProductBase`, `ProductCreate`, `ProductRead` +- `CalendarBase`, `CalendarCreate`, `CalendarRead` +- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead` +- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead` +- `PromotionBase`, `PromotionCreate`, `PromotionRead` +- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- `StoreBase`, `StoreCreate`, `StoreResponse`

- `ProductBase`, `ProductCreate`, `ProductResponse`

- `CalendarBase`, `CalendarCreate`, `CalendarResponse`

- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyResponse`

- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryResponse`

- `PromotionBase`, `PromotionCreate`, `PromotionResponse`

- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyResponse`

- `StoreBase`, `StoreCreate`, `StoreRead`

- `ProductBase`, `ProductCreate`, `ProductRead`

- `CalendarBase`, `CalendarCreate`, `CalendarRead`

- `SalesDailyBase`, `SalesDailyCreate`, `SalesDailyRead`

- `PriceHistoryBase`, `PriceHistoryCreate`, `PriceHistoryRead`

- `PromotionBase`, `PromotionCreate`, `PromotionRead`

- `InventorySnapshotDailyBase`, `InventorySnapshotDailyCreate`, `InventorySnapshotDailyRead`

🤖 Prompt for AI Agents

In `@docs/PHASE/1-DATA_PLATFORM.md` around lines 60 - 66, The doc list uses names like StoreResponse/ProductResponse but the code defines StoreRead/ProductRead/etc.; update the documentation to reference the actual schema names (StoreRead, ProductRead, CalendarRead, SalesDailyRead, PriceHistoryRead, PromotionRead, InventorySnapshotDailyRead) so the docs match the schemas, or alternatively add aliases in the schema module (e.g., define StoreResponse = StoreRead, ProductResponse = ProductRead, etc.) so both names are valid; pick one approach and apply it consistently across the referenced entries.

sourcery-ai Bot reviewed Jan 26, 2026

View reviewed changes

w7-learn approved these changes Jan 26, 2026

View reviewed changes

coderabbitai Bot reviewed Jan 26, 2026

View reviewed changes

w7-learn previously approved these changes Jan 26, 2026

View reviewed changes

Merge origin/main into dev

2dcc671

w7-mgfcode dismissed w7-learn’s stale review via 2dcc671 January 26, 2026 11:05

w7-learn approved these changes Jan 26, 2026

View reviewed changes

w7-mgfcode merged commit 4d672a6 into main Jan 26, 2026
12 of 13 checks passed

coderabbitai Bot mentioned this pull request May 12, 2026

chore(repo): merge dev to main — seeder phases 1+2 + ci/docs hardening (#92) #97

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: merge dev to main (Phase 1 Data Platform)#13

chore: merge dev to main (Phase 1 Data Platform)#13
w7-mgfcode merged 2 commits into
mainfrom
dev

w7-mgfcode commented Jan 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

coderabbitai Bot commented Jan 26, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

MyPy (Static Type Checking)

Pytest

Uh oh!

coderabbitai Bot Jan 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

w7-mgfcode commented Jan 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes included:

PRs merged:

Summary by CodeRabbit

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Jan 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

MyPy (Static Type Checking)

Pytest

Uh oh!

coderabbitai Bot Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

w7-mgfcode commented Jan 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jan 26, 2026 •

edited

Loading